Commit Graph

247 Commits (30c63daf8958396f14d35c6b09b71f78e382a742)

Author SHA1 Message Date
depristo 30c63daf89 More improvements to the duplicate quality combiner, making progress towards a clean system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@788 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:26:57 +00:00
depristo 65995887fc Releasable version of the Pileup walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@786 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:25:37 +00:00
hanna d61a5261c1 Better integration of reference-ordered data into the data sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@779 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:09:32 +00:00
andrewk 0219d33e10 QualityUtils: added reverse function to reverse an array of bytes (and not complement it); BaseUtils: split qualToProb into itself and qualToErrProb; CovariateCounterWalker and LogisticRecalibrationWalker: several changes, including proper accounting (only partly complete) for reversing AND complementing bases on the negative strand; PrintReadsWalker: created option to output reads to a BAM file rather than just to the screen (useful for creating a downsampled BAM file)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@770 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 18:30:45 +00:00
asivache 7e5e422591 ReferenceOrderedData now inspects the ROD class using reflection. If the ROD declares a static Iterator<ROD> createIterator(String rodName, File rodFile) factory method, it is wrapped and used by the ReferenceOrderedData to read records from rodFile. If the ROD does not provide such a factory method, the old behavior is the default: ReferenceOrderedData uses its own simple default iterator to read the file line by line (assuming there is only one line per record/position).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@768 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 15:23:22 +00:00
hanna 26dd3cd50e Cleanup. Move filtering functions closer to where they're used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@767 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 21:42:48 +00:00
hanna e7a6f8cdc4 Removed evidence of a previous incarnation of data sharding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@766 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 20:48:33 +00:00
hanna 3cad580655 Catch and rethrow the walker's required argument, so that command-line arguments will be displayed when the GATK throws an argument exception.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@765 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 19:17:16 +00:00
hanna dc748d9c9c Integrate more feedback on command-line argument system. Focus on help
formatter: separate required from optional but otherwise keep ordering
the same, reorder GATK arguments by usage.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@764 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 19:01:25 +00:00
ebanks 57918de753 add the @Requires for this walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@762 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 17:03:12 +00:00
hanna 96e73e496a Delete deprecated old-school traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@758 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 14:57:17 +00:00
hanna 01a3cb27c7 @Required / @Allows flags for main arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@751 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 23:26:17 +00:00
hanna ff798fe483 Reintroduce support for interval-based traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@749 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 22:54:18 +00:00
hanna c10741e9f5 Rename TraverseLociByReference to TraverseLoci to represent its new function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@743 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 01:31:57 +00:00
hanna 2c4de7b5c5 Switch TraverseByLoci over to new sharding system, and clean up some code in passing read files along
the pathway from command line to traversal engine.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@727 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 21:02:12 +00:00
aaron 99d4ebc26d Added functionality to return the final accumulator of a traversal, so external tools can get the result of a walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@724 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:20:27 +00:00
depristo 7834b969b4 Better interface to the tabular ROD, now makes writing files easier. Also has corresponding test files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@719 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 23:20:11 +00:00
aaron 50f32b7f61 Added a shard strategy for the reduce-by-interval traversals. Also fixed bugs that I found along the way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@718 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:20:18 +00:00
depristo 0f8e6061b6 Simple interface improvements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@717 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:08:09 +00:00
depristo 8e9e2f4502 Revised ROD system. Split the system into Basic type and interface. Enabled more control over ROD accessing, including an initialize() function to fetch headers and other options from the file. Added general tabular ROD, which has named columns and supports a map<String,String> interface. Comes with shiny new JUnit system for RODs. Also added simple Python script for accessing Picard data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@716 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:06:28 +00:00
aaron d8c1b010f1 Fixing the naming of the function I checked in earlier.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@713 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 19:27:10 +00:00
ebanks b62bddee42 The header was never being set.
Added this hack for now and will alert the authorities ASAP... 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@708 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:18:51 +00:00
aaron 7aa90757ac Moved the iterators over to the StingSAMIterator interface. This will help us ensure that iterators that need to be closed get closed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@702 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:52:18 +00:00
aaron c3b2c66911 The GATK doesn't need the rest
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@698 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:20:45 +00:00
aaron 0215905bb6 Added an adapter class that will adapt plain iterators and closeable iterators of SAMRecords into StingSAMIterators. Also unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@697 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 15:17:32 +00:00
hanna 80c13f7127 Added a getter for command-line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@695 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 13:55:52 +00:00
hanna 307c6e4ecf Oops. Forgot to add new file to svn.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@694 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 00:52:30 +00:00
hanna d14cab0be7 Added IterableLocusContextQueue and test. Cleaned up tests, adding BaseTest where it didn't exist. Enhanced test runner to run only classes ending in ...Test.java, so that utility classes can sit alongside the tests but won't be run by JUnit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@693 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 21:32:05 +00:00
hanna 12ae3a22b6 Break locus context data access providers into modular components in preparation for traverse by loci.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@689 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 18:51:16 +00:00
aaron 6e69193e3c Deprecated calls to getSamReader on both the GenomeAnalysisEngine and the TraversalEngine. This call fails in the new-style traversals, but it won't disappear until the cut-over to the new traversals is complete.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@671 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 18:52:42 +00:00
ebanks 630066cc0a 1. Merge LocusWindows whose reads overlap.
2. Fix bug (we weren't clearing the "to emit" list)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@670 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 17:33:23 +00:00
aaron 9f942fdfa0 Added code to correct the violation of the parsing interface. Now the analysis type resides in the command line arg, but is stored into the argument collection before it's passed to the genomeAnalysisEngine.
Also fixed a bug where we'd exception-out if we didn't provide an interval region.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@669 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:33:55 +00:00
hanna ee9077fc69 LocusIterator iterated through LocusContexts, which was fine until now when we need something
that iterates through loci (GenomeLocs).  Rename LocusIterator to LocusContextIterator.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@662 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:54:57 +00:00
hanna 608948210c Check for a reference before extraction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@661 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:29:44 +00:00
hanna 32696b13f5 Fixed method override issue with old-style traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@660 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:22:18 +00:00
hanna 862b8a6787 intervals_file + genome_loc => intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@659 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:04:18 +00:00
hanna 23e9e29964 Changed reads traversals from providing a LocusContext from which the reference sequence
could be extracted to a char[] containing the reference bases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@657 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:45:11 +00:00
hanna 052819bed5 Switched dependencies of GenomeAnalysisTK to depend on GenomeAnalysisEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@656 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:33:00 +00:00
aaron ff1b92acc4 Switch over to the GenomeAnalysisEngine/CommandLineGATK system from the GenomeAnalysisTK code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@655 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:05:58 +00:00
ebanks 009e71fcd9 We need to sort cleaned reads ourselves (instead of letting SAMFileWriter
do it) because the SAM headers are often screwed up and claim to be
"unsorted".  While here, I broke off the module from the SortSamIterator
in case someone else wants to use it.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@654 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 15:43:42 +00:00
aaron c735e1f627 small javadoc cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@653 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 03:44:21 +00:00
aaron e8b8ab5985 Added code to extend Matt's getReferenceBases out to the read walkers, so they can see the corresponding reference for each read.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@652 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 03:42:38 +00:00
aaron 898f65547e Added code to split GenomeAnalysisTK.java into an object concerned with loading command line args, and one that runs the engines. This will allow us to run the GATK from other tools (like Matlab). Also some cleanup to separate out the legacy traversals and the new-style traversals. This is not live yet, and any modifications you need should be made to GenomeAnalysisTK.java for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@650 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 02:07:20 +00:00
aaron 8d43ec3d7e A fix for a situation where a chromosome on the reference file contains no reads, and doesn't align to the BAM file. This came up using reference 18, which has chromosomes like chr1_random that aren't in all BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@649 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 01:39:25 +00:00
hanna 55c1b688bd Fix mediocre javadoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@646 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 22:31:16 +00:00
hanna 522f8b58be Added second method for getting large sequences of the reference for use in reads traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@645 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 22:18:04 +00:00
aaron 517f27f331 Added sharding strategy code that picks the right kind of shard, based on the traversal engine
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@644 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:55:10 +00:00
hanna 6e394490cb Cleanup in preparation for ByLoci traversal. Also did some work minimizing unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@643 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:27:54 +00:00
hanna ee777c89de Change the default mechanism for adding ROD bindings to the new system. TODO: create a new object type for these triplets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@642 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 18:43:00 +00:00
ebanks 3aabc144c6 Added functionality to allow for a contract between LocusWindowTraversalEngine and LocusWindowWalker which allows the Walker to act upon reads outside of the provided intervals.
(Really, all we want to do is spit out all reads, but this allows the Walker to do other things with the reads if it wants)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@641 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 17:28:16 +00:00