Commit Graph

681 Commits (ae2eddec2d7fde6d0d01e5186b3371ff48dec552)

Author SHA1 Message Date
asivache 8f1cabd33d cmd line args changed - again; internally uses VariantType enum
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@818 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:58 +00:00
asivache 9ef1a21112 minor changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@817 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:06 +00:00
aaron d994544c47 Added back end code support for Sharding based on genomic location for reads. Changed the sharding
code to take GenomeLocSortedSet instead of a list<GenomeLoc>, and added a bunch of much simpler
and cleaner test cases.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@816 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:57:46 +00:00
asivache 4edcdffe45 refseq annotation track: should be able to provide (multiple) transcript annotations available over a given genomic position. NOT finished and NOT tested!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@815 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:07:15 +00:00
andrewk 149cc9989b spaces!!!!!!!!!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@814 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 19:40:25 +00:00
ebanks c2df35b7fe - get leftmost position of indel correct
- don't try to clean reads with mapping quality of 0
- un-deprecate


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@813 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 17:24:58 +00:00
hanna 54bb643d19 Validated Mark's assertion that GSA-27 is fixed. Also did some cleanup on the pileup walker so that it doesn't output to System.out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@812 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 15:58:21 +00:00
hanna 008d677bea Fixed ValidatingPileup to work with Andrey's new rodSAMPileup -> GenotypeList type hierarchy.
Fixed reference-ordered data validation system to validate class hierarchies instead of specific class types.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@811 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-23 20:50:28 +00:00
aaron d056f9f3e8 Changed the name to reflect the sorted nature of the set, added some fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@810 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 22:34:24 +00:00
aaron 831d430025 Added a collection for storing GenomeLocs, that also has functions for removing by genomic region (that may span multiple GenomeLoc's in the collection), and adding regions, which are then merged with any overlapping regions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@809 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:52:40 +00:00
hanna 34413362fd Bugfix: handle case where queue is empty.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@808 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:45:22 +00:00
hanna ec2e8d5726 Fixes for getting ValidatingPileup running in parallel.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@807 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:20:24 +00:00
kiran cd80e3f372 Replaced dumb training function with a version that creates a training set slightly more sensibly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@806 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:34:33 +00:00
kiran 02c0afdb85 Added the ability to specify the sorted, unaligned bam and/or the sorted, aligned bam such that broken computations can be restarted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@805 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:33:34 +00:00
kiran 454a6d1df7 Fixed an egregious error in simpleReverseComplement wherein the RC'd string would be composed entirely of the last base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@804 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:32:20 +00:00
hanna 2a5be1debe Cleanup in datasources.providers namespace. Make it easier for others writing traversal engines to use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@803 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:12:00 +00:00
asivache 02fc4f145f refactoring: a couple of general purpose (hopefully useful?) methods/classes extracted into a standalone utils class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@802 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 18:54:40 +00:00
asivache 4b718688d5 no changes, really, just synchronizing (instead of reversing) to increase the amount of entropy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@801 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:27:28 +00:00
asivache 893f1b6427 updated
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@800 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:25:50 +00:00
asivache a9dfbfb309 internal changes and some refactoring. slightly different final report. Now can take tracks that implement either Genotype or GenotypeList; takes an arg specifying what variants to look for (POINT - aka snp - or INDEL); takes an arg specifying whether default ref/ref call of one type (INDEL/POINT) should be implicitly assumed if another call (POINT/INDEL respectively) was made at the same position [this is probably most useful for indels and only (?) for sam pileups: if we have only a point mutation call at a given position, it does mean that we do have coverage, and that there was no evidence whatsoever for an indel, so we have an implicit 'no-indel' call]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@799 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:25:09 +00:00
asivache d5bb4d9ba9 Auxiliary class that can read one line from samtools pileup file. Used by rodSAMPileup to read pairs of lines as needed. NOTE: this class implements Genotype and (a trivial) GenotypeList, but it is NOT a rod!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@798 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:20:01 +00:00
asivache 732fed9aad ALERT, ALERT! rodSAMPileup is now a GenotypeList, not a Genotype! Now it can intelligently read full samtools pileup files (containing, in general, both point and indel genotypes at the same position). No need to split/synchronize pileups from different individuals anymore, hooray!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@797 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:17:59 +00:00
asivache 26633957d9 Genotype interface is extended: now it requires implementing object to be able to tell whether it isPointGenotype() or isIndelGenotype() (and the contract requires, e.g. alleles to be represented differently)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@796 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:14:46 +00:00
depristo d9fc84f1e3 actually checking in the first pass
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@795 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:13:27 +00:00
asivache 8773b3a430 a trivial wrapper interface for the objects capable of holding 'full' genotype, i.e. both point (as in ref/snp) and indel variants at the same reference position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@794 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:12:01 +00:00
depristo 7a979859a9 Intermediate checking for evaluation -- now supports transition / transversion evaluation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@793 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:05:06 +00:00
ebanks f2ea193149 For some reason the apostrophes in the comments were throwing annoying
compile-time warnings: "unmappable character for encoding UTF8"


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@792 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 14:07:07 +00:00
jmaguire 9902ce8073 properly flush the gzip output stream. this was a subtle inheritance bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@791 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 13:57:58 +00:00
asivache 63caca31bf minor update in report printout format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@790 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 13:56:09 +00:00
asivache 7afc10fd6f updated, reports more stuff now, including stats for external consistency checks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@789 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:28:18 +00:00
depristo 30c63daf89 More improvements to the duplicate quality combiner, making progress towards a clean system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@788 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:26:57 +00:00
depristo 65995887fc Releasable version of the Pileup walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@786 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:25:37 +00:00
depristo dc17a5661d Better accessors for dealing with second base prob pileups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@785 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:25:16 +00:00
depristo d261459c48 Useful function to create a string with N copies of the same char
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@784 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:23:52 +00:00
kiran 287bb52e81 Refreshes the mount points that we'll be using (so that the program will play nicely with LSF).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@783 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:36:12 +00:00
jmaguire b5ad5176f7 stick headers on the output tables
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@782 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:50 +00:00
kiran 83e1454a11 Added a method to determine the fraction of a sequence that's taken up by the most frequent base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@781 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:31 +00:00
kiran bdf772f017 Added test for determining the fraction of a sequence that's taken up by the most frequent base (quick-and-dirty homopolymer testing).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@780 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:08 +00:00
hanna d61a5261c1 Better integration of reference-ordered data into the data sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@779 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:09:32 +00:00
ebanks 0d58e4ccc9 -check original alignments for indels when computing mismatch score
-move logging to debug


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@778 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:55:42 +00:00
kiran 5f67914b08 Added loads of documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@777 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:40:47 +00:00
kiran 1a9d5cea29 Added a method to reverse-complement a String object, preserving 'N' and '.' bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@776 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:39:39 +00:00
kiran a687c6bc03 Added a method to refresh an NFS mount point (necessary to prevent NFS flakiness when running on the LSF farm).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@774 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:31:54 +00:00
kiran 324ef9cbd1 Test class for PathUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@773 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:31:22 +00:00
aaron 8515247575 Adding some functions I keep reinventing, especially for testing purposes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@772 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:30:44 +00:00
ebanks e6200fe5b5 don't ignore reads when maxReadLength isn't set
also, print out LOD score for cleaning


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@771 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:24:10 +00:00
andrewk 0219d33e10 QualityUtils: added reverse function to reverse an array of bytes (and not complement it), BaseUtils: split qualToProb into itself and qualToErrProb, CovariateCounterWalker and LogisticRecalibrationWalker: several changes including properly accounting (only partly complete) for reversing AND complementing bases that are negative strand, PrintReadsWalker: created option to output reads to a BAM file rather than just to the screen (useful for creating a downsampled BAM file)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@770 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 18:30:45 +00:00
asivache 7e77c62b49 auxiliary class, a simple struct to keep together info like numbers of covered, assessed, ref/variant bases across the sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@769 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 16:30:16 +00:00
asivache 7e5e422591 ReferenceOrderedData now inspects the ROD class using reflection. If the ROD declares a static Iterator<ROD> createIterator(String rodName, File rodFile) factory method, it is wrapped and used by the ReferenceOrderedData to read records from rodFile. If the ROD does not provide such a factory method, the old behavior is the default: ReferenceOrderedData uses its own simple default iterator to read the file line by line (assuming there is only one line per record/position).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@768 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 15:23:22 +00:00
hanna 26dd3cd50e Cleanup. Move filtering functions closer to where they're used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@767 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 21:42:48 +00:00
hanna e7a6f8cdc4 Removed evidence of a previous incarnation of data sharding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@766 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 20:48:33 +00:00
hanna 3cad580655 Catch and rethrow the walker's required argument, so that command-line arguments will be displayed when the GATK throws an argument exception.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@765 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 19:17:16 +00:00
hanna dc748d9c9c Integrate more feedback on command-line argument system. Focus on help
formatter: separate required from optional but otherwise keep ordering
the same, reorder GATK arguments by usage.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@764 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 19:01:25 +00:00
ebanks 34f9820299 update mapping quality score and edit distance attribute for reads when they are cleaned
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@763 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 17:51:31 +00:00
ebanks 57918de753 add the @Requires for this walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@762 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 17:03:12 +00:00
kiran 747521c849 Fixed the simplest of typos.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@761 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 16:00:30 +00:00
kiran e48078b476 Updated to reflect change to BasecallingReadModel constructor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@760 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 15:43:26 +00:00
kiran 505f588768 Forgot to say that the mate is unmapped too. This is necessary to prevent SAM-JDK from yelling at me about an invalid SAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@759 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 15:38:51 +00:00
hanna 96e73e496a Delete deprecated old-school traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@758 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 14:57:17 +00:00
aaron b840dd1320 Added some code to change the instrumentation for tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@756 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 05:15:27 +00:00
kiran 6c5fbb988b Now basecalls an entire read (both ends of the pair, barcode... everything) at once. Afterwards, RawRead and FourProbRead can be asked to return a specified subset (corresponding to the ranges specified for each end of the read).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@754 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 00:09:20 +00:00
kiran e293d65ede Refactored to allow the user to specify the range of cycles they wish to call. Simply specify a single range (e.g. '0-75') or two ranges ('0-75,76-151'). This allows single and paired-end read processing to coexist happily. Also implements annotation of an aligned bam file (which should hopefully fit in under two gigs now, but I'm waiting on a bug fix or a clarification from the Picard team).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@753 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 00:07:24 +00:00
kiran 08c9f4d86b Renamed to BasecallingTrainer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@752 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 00:03:46 +00:00
hanna 01a3cb27c7 @Required / @Allows flags for main arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@751 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 23:26:17 +00:00
kiran 40dbc21df7 Moved ParseException to its own file and made it public.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@750 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 14:42:44 +00:00
hanna ff798fe483 Reintroduce support for interval-based traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@749 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 22:54:18 +00:00
jmaguire 3441795d9c better handling of edge cases (zero coverage, reference mistakes, etc.)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@747 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 18:04:37 +00:00
kiran 7c615c8fb0 Some changes to the system for annotating a pre-aligned bam file. Doesn't fit within 2gigs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@746 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 17:42:08 +00:00
asivache a39c8839c8 print percentage sign!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@745 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 14:38:20 +00:00
hanna c10741e9f5 Rename TraverseLociByReference to TraverseLoci to represent its new function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@743 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 01:31:57 +00:00
hanna e6ce80c8e3 Fix for GSA-44...don't throw exception when user specifies -h.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@742 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 00:42:00 +00:00
hanna d35e20ce21 Better error checking for missing .dict file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@741 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 21:57:12 +00:00
hanna 7161b8f927 Disable support for short name values directly abutting their arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@740 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 16:09:32 +00:00
hanna d152c2b911 New GATKArgumentCollection caused a subtle bug with argument grouping and the help system. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@738 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 14:54:25 +00:00
jmaguire 94e324b844 Write N for the alt allele when we're hom-ref.
Stop EM loop when we've converged (likelihood[t-1] == likelihood[t]).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@737 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 13:58:11 +00:00
kcibul bd53bc18f9 added new required annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@736 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 12:24:06 +00:00
kiran 28bf7ec8ad Aesthetic cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@735 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 04:09:23 +00:00
kiran a0464633fd Whoops. Changed denominator from reads to bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@734 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 03:42:25 +00:00
kiran 5d60efc498 Factored out some simple stats accumulation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@733 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 03:37:57 +00:00
ebanks 81fac73c01 LOD checks for normal and brute force versions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@732 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 02:56:03 +00:00
jmaguire 527df6e57b Massive speed-up, clean-up and tabular output.
This program is going to rule.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@731 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-16 16:52:40 +00:00
jmaguire 3b57a35009 don't be tricked by multiple read groups with the same sample id!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@730 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-16 15:28:55 +00:00
jmaguire 947bac5cdc vast speedup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@729 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-16 15:27:58 +00:00
kiran 6f1559bd77 Cleaned up a bit. Added some documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@728 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 21:22:24 +00:00
hanna 2c4de7b5c5 Switch TraverseByLoci over to new sharding system, and clean up some code in passing read files along
the pathway from command line to traversal engine.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@727 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 21:02:12 +00:00
ebanks f33f3c0434 added LOD threshold for determining when to clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@725 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:23:59 +00:00
aaron 99d4ebc26d Added functionality to return the final accumulator of a traversal, so external tools can get the result of a walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@724 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:20:27 +00:00
kiran dae77bf14a Fixed a typo in a comment.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@723 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:07:31 +00:00
kiran bfc40f54f0 Nicer output when training off of perfect reads. Not that that works yet...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@722 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:07:08 +00:00
kcibul d1f3000afa bed-style output for IGV
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@721 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 17:58:44 +00:00
kiran 36db44620b Improved output. Can optionally limit the number of reads actually called.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@720 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 00:07:57 +00:00
depristo 7834b969b4 Better interface to the tabular ROD, now makes writing files easier. Also has corresponding test files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@719 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 23:20:11 +00:00
aaron 50f32b7f61 Added a shard strategy for the reduce-by-interval traversals. Also fixed bugs that I found along the way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@718 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:20:18 +00:00
depristo 0f8e6061b6 Simple interface improvements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@717 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:08:09 +00:00
depristo 8e9e2f4502 Revised ROD system. Split the system into a Basic type and an interface. Enabled more control over rod accessing, including an initialize() function to fetch headers and other options from the file. Added general tabular rod, which has named columns and supports a map<String,String> interface. Comes with shiny new JUnit system for RODs. Also, added simple python script for accessing picard data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@716 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:06:28 +00:00
hanna 67293168e7 Support periods in sequence names.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@715 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 20:17:57 +00:00
jmaguire 641afc4e76 fix a crash in the event that the input file has no read groups!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@714 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 19:27:41 +00:00
aaron d8c1b010f1 Fixing the naming of the function I checked in earlier.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@713 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 19:27:10 +00:00
kiran 5858f20902 Documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@712 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 18:58:43 +00:00
kiran 68c9455c0f Moved the base complement method to BaseUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@711 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 18:57:48 +00:00