43 Commits (ee2f022c71e2efabf19ed2b899a052f2fb5fcfa0)

Author SHA1 Message Date
hanna ee2f022c71 Make new TraverseByLociByReference the default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@532 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:50:11 +00:00
hanna e50ae97fe1 Introduce new index-based fasta reader. Clean up MicroManager code, pushing necessary code back into TraversalEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@531 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:40:21 +00:00
ebanks 13d4692d2e 1. Added a by-interval traversal.
2. Added a shell for the indel cleaner walker (it's currently being used to test the interval traversal).
3. Fixed small bug in downsampling (make sure to downsample the offsets too)
4. GenomeAnalysisTK.execute => anyone object to my change to "instanceof" instead of trying to catch a ClassCastException (yuck)?



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@524 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 04:33:35 +00:00
aaron 3dc2afd7ab Added the ability to get a merged header in a LociByReference traversal
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@514 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:34:52 +00:00
depristo ee5ab9536f trivial checking / flagging issues to enable testing of merging iterator performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@460 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 03:11:59 +00:00
depristo dbf2344cef Fixes for including duplicate reads in the locus traversal; now checks that the ref arg is provided when needed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@459 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 01:27:36 +00:00
hanna 165e504d1c Turning on the new TraverseLociByReference is now dependent only on the -et flag. REGION_STR does not matter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@454 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 19:45:47 +00:00
aaron 180ff13290 Added a bunch of changes to support the new MicroManager code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@431 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 18:29:38 +00:00
depristo 72a3d84ed2 General purpose pileup code -- you can use these features to obtain detailed pileup data from reads and offsets. Useful for all pileup-based walkers. Expanded support for rodSAMPileup to enable the new ValidatingPileupWalker, which takes a samtools pileup output and checks that GATK gives the same output as samtools for both the per-base and per-qual pileups. It's going to be a very useful validation tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@418 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:13:10 +00:00
hanna 0629f79049 Moved fasta support files into their own package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@408 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 18:13:23 +00:00
asivache 8e6093d5a5 remove mom/dad/kid cmd line arguments that were needed for mendelian walker; now we can use generic track binding!!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@389 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:45:34 +00:00
kiran 756e6c61d8 Strictness args are presented as lowercase in the help, but only accepted if uppercase. Changed help to list the valid arguments in uppercase.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@376 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:50:19 +00:00
kcibul c7777d46d6 * re-enabled setting of sequence dictionary information on GenomeLoc
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@366 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:44:14 +00:00
depristo 17b3d5b554 New ROD accessing system, including a generalized interface for binding ROD on the command line that doesn't require you to change GenomeAnalysisTK.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@355 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 22:04:59 +00:00
hanna 0d825ccfc1 Oops. Fixed duplicate reference to the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@353 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:27:57 +00:00
hanna 8a1207e4db Bringing up scaffolding for integration of locus traversals by reference with Aaron's data source code.
Reverts to original TraverseByLociByReference behavior unless a special combination of command-line flags is used.

Lightly tested at best, and major flaws include:
- MicroManager is not doing MicroScheduling right now; it's driving the traversals.
- New database-ish data providers imply by their interface that they're stateless, but they're highly stateful.
- Using static objects to circumvent encapsulation.
- Code duplication is rampant.
- Plus more!


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@346 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:28:17 +00:00
depristo b49f713336 Enabled multiple argument for GATK driver; first step towards generalized -rods <name> <type> <file> argument structure
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@325 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 01:52:13 +00:00
asivache 1ade22121b cruel hack: new toolkit-wide optional cmdline arguments added to allow for loading trio genotyping tracks; to be moved back to walker when walkers can register their data needs with the toolkit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@324 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:33:26 +00:00
andrewk 86fc18e9fc Fixed merge bug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@288 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 17:41:58 +00:00
andrewk bef475778f - Updated --hapmap switch to --hapmap-chip to reflect the data being chip data for an individual rather than population allele frequency data in Hapmap
- Corrected some bugs to get metrics logging working
- Added a switch --force_1base_probs to ignore 4-base probabilities if they exist


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@287 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 17:32:31 +00:00
depristo edc44807af RODs now have names; use getName() to access a ROD's name. The next step is a better interface for accessing RODs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@286 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 16:41:33 +00:00
depristo f031d882c6 ByReference traversals!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@281 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 13:23:18 +00:00
asivache c6ab60ee04 change variable type to Boolean from boolean to make cmdline parser happy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@279 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:35:30 +00:00
asivache 16aa979e34 make -A a true flag not an argument that asks for 'true/false' value!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@278 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:23:46 +00:00
jmaguire b7a67da775 Expose the underlying SAM reader to the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@270 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 21:38:00 +00:00
asivache 5d9b068b8b generic declarations added here and there to eliminate a few annoying warnings; no consequential changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@268 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 20:53:01 +00:00
kcibul c192a95998 changes in three files to make the HapMap RODs work:
- HapMapAlleleFrequenciesROD.java - the referenceOrderedDatum implementation
- PrepareROD.java - has a static block that loads the known ROD classes, had to add the above
- GenomeAnalysisTK.java - when supplied a hapmap argument... loads the ROD

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@265 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 19:55:19 +00:00
jmaguire 25ace306b9 GenomeAnalysisTK: better documentation of validation option.
AlleleFrequencyWalker: output the last reference interval if it's left hanging open.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@258 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:11:20 +00:00
depristo d952790258 GFF now parses attributes correctly and efficiently. Slightly better interface to Utils.join
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@253 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 22:54:38 +00:00
ebanks 6cc2fa24d5 Added ability to downsample to a particular coverage
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@250 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 20:27:06 +00:00
ebanks 3af4290a49 Added iterator to randomly downsample to a given fraction of the reads.
Also, updated sort iterator to allow user to input max sorts.
Put in placeholder for downsampling to given coverage.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@243 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 02:11:13 +00:00
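
The fractional downsampling described in commit 3af4290a49 can be pictured as a filtering iterator that keeps each read independently with a fixed probability. The following is only a minimal sketch of that idea, not the code from the commit: the function name `downsample_by_fraction`, the `seed` parameter, and the use of Python (the project itself is Java) are all assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def downsample_by_fraction(
    reads: Iterable[T], fraction: float, seed: Optional[int] = None
) -> Iterator[T]:
    """Lazily yield each read with probability `fraction` (hypothetical helper).

    The relative order of the surviving reads is preserved, so the result
    can still be consumed by code that expects sorted input.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)  # private generator, so a seed gives repeatable runs
    for read in reads:
        # rng.random() is in [0.0, 1.0): fraction 1.0 keeps everything,
        # fraction 0.0 keeps nothing.
        if rng.random() < fraction:
            yield read
```

Because each read is kept or dropped independently, the number of reads returned only approximates `fraction` times the input size; downsampling to a fixed coverage, which the commit leaves as a placeholder, would need a different strategy.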
depristo d7c0bcc223 Reorganized GenomeLoc code to more clearly and better use the picard SequenceDictionary information.
All GenomeLoc[] are now ArrayList<GenomeLoc> for clarity and consistency
Parsing now recursively merges contiguous elements chr1:1-10;chr1:11-20 => chr1:1-20
Added support for TraversingByLoci over all reference positions specified by the provided location array.  System dynamically determines which traversal system to use.
Pileup now marks, very clearly, reference positions without covered reads.
Made changes around the codebase to deal with new GenomeLoc structure.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@218 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-28 20:37:27 +00:00
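
The merging rule quoted in commit d7c0bcc223 (chr1:1-10;chr1:11-20 => chr1:1-20) can be illustrated with a short sketch. This is not the GenomeLoc code: the helper names below are hypothetical, the sketch is in Python rather than the project's Java, it merges iteratively rather than recursively, and it assumes 1-based inclusive coordinates with input already sorted by start position within each contig.

```python
from typing import List, Tuple

# (contig, start, stop), 1-based and inclusive; an assumed representation.
Interval = Tuple[str, int, int]


def parse_intervals(spec: str) -> List[Interval]:
    """Parse a string such as 'chr1:1-10;chr1:11-20' into interval tuples."""
    intervals = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        contig, _, span = part.rpartition(":")
        start, _, stop = span.partition("-")
        intervals.append((contig, int(start), int(stop)))
    return intervals


def merge_contiguous(intervals: List[Interval]) -> List[Interval]:
    """Merge neighbours on the same contig that overlap or directly abut."""
    merged: List[Interval] = []
    for contig, start, stop in intervals:
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            prev_contig, prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_contig, prev_start, max(prev_stop, stop))
        else:
            merged.append((contig, start, stop))
    return merged


def format_intervals(intervals: List[Interval]) -> str:
    """Render intervals back into the 'contig:start-stop;...' form."""
    return ";".join(f"{c}:{s}-{e}" for c, s, e in intervals)
```

With these helpers, `format_intervals(merge_contiguous(parse_intervals("chr1:1-10;chr1:11-20")))` yields `chr1:1-20`, while intervals separated by a gap or lying on different contigs are left unmerged.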
depristo cfee59e0e6 New type hierarchy for Traversals. There's a new package to hold them (traversals) and an easy system to create new ones. We are now one step closer to supporting the execution manager (a totally non-functional version is included here) that actually executes walkers in parallel using N threads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@214 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 15:40:45 +00:00
hanna 4a6be896b9 Provide out and err PrintStreams to the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@213 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 15:03:32 +00:00
aaron d115209e86 moved a bunch of files over to the logging system. In some cases I ballparked the severity level of an error, so if you see something wrong feel free to make changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@209 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 13:27:04 +00:00
depristo 3abaaa3cc3 Tried to add a poor man's version of seeing all reference sites in an interval, and failed. However, I did add the command line argument and a few pieces of useful code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@206 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 00:12:35 +00:00
hanna 53fe9acf65 Make command-line arguments available in walker constructor, provide back door from
walker into GATK itself, do some cleanup of output messages, and add some bug fixes.
Command-line arguments in walkers are now feature-complete, but still a bit messy.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@203 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 20:45:27 +00:00
depristo d457778283 Unified byLoci and byLociByInterval traversals. It now figures out what to do for you based on the presence of an index and set of required locations to process.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@191 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 16:01:58 +00:00
hanna 9e2a373184 Prototype, buggy implementation of walker command-line arguments. Doesn't
(yet) deal elegantly with even simple cases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@180 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 00:12:00 +00:00
depristo 919a86e876 Cleaned up code for by interval traversals for Jared. Initialization code refactored and made clear. by loci and by loci by interval use the same underlying code now. Everyone uses the same initialization code to set things up. It's a party in the TraversalEngine and everyone's invited...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@179 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 22:32:45 +00:00
depristo 6df19ab793 Support for byInterval traversals for Jared. Do not use them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@175 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 20:55:34 +00:00
andrewk 9dee9ab51c Added Hapmap data track (using rodGFF class for GFF file format) to toolkit as a command line option, Hapmap metrics to AlleleFrequencyMetricsWalker, and a python Geli2GFF file converter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@163 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 03:58:03 +00:00
hanna 63cd1fe201 Push core / playground lower into the tree.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@160 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-23 23:19:54 +00:00