Commit Graph

43 Commits (86bd55408e03ede30661c5200d85f9a6f348bebb)

Author SHA1 Message Date
aaron 0f29f2ae3f fixes for the Tree index, and some small clean-up in the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:50 +00:00
hanna 4995950d04 IndexedFastaSequenceFile is now in Picard; transitioning to that implementation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3701 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 04:40:31 +00:00
hanna 3d055e3d16 Fail fast if users try to parallelize a read walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3484 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 18:14:33 +00:00
hanna 7389077b3b A few misc usability fixes:
- Clarify the message emitted when -XL is supplied so I don't spend another half day chasing a bug that doesn't exist.  
- Crash with a helpful message when running -nt with non-TreeReducible walkers.
- Crash with a helpful message when running -nt with reduceByInterval walkers.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3405 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 19:02:02 +00:00
ebanks 1e8b3ca6ba Fare thee well, oh LocusWindowTraversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3089 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:17:26 +00:00
hanna b4b4e8d672 For Sarah Calvo: initial implementation of read pair traversal, for BAM files
sorted by read name.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3052 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:22:25 +00:00
hanna 199b43fcf2 Reduce by interval alterations to interface with new sharding system. This checkin will be followed by a
simplification of some of the locus traversal code.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 00:16:50 +00:00
hanna b19bb19f3d First successful test of new sharding system prototype. Can traverse over reads from a single
BAM file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2587 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 03:35:55 +00:00
hanna ccdb4a0313 General-purpose management of output streams.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-23 00:56:02 +00:00
hanna 5429b4d4a8 A bit of reorganization to help with more flexible output streams. Pushed construction of data
sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler
to just microschedule.  


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1336 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 23:00:15 +00:00
hanna 7a13647c35 Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
asivache a361e7b342 SAMDataSource is now exposed by GATK engine; SamFileHeaderMerger is exposed from Resources all the way up to SAMDataSource, so now we can see underlying individual readers should we need them; GATK engine has new methods getSamplesByReaders(), getLibrariesByReaders(), and getMergedReadGroupsByReaders(): each of these methods returns a list of sets, with each element (set) holding, respectively, samples, libraries, or (merged) read groups coming from an individual input bam file (so now when using multiple -I options we can still find out which of the input bams each read comes from)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1315 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:59:49 +00:00
hanna 99f9cd84ed Warning for possibly mismatched reads / reference was very aggressive. Relax
the criteria a bit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1234 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:21:22 +00:00
hanna 0f6bfaaf73 Skip validation in case of no reads aligning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1230 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 02:03:36 +00:00
hanna b18caa2052 Fix for GSA-90: System isn't failing with an error when you use the wrong reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1225 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 20:42:12 +00:00
aaron d86717db93 Refactoring of the traversal engine base class, I removed a lot of old code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1209 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 21:57:00 +00:00
hanna 5e26770634 Hack the MicroScheduler to be tolerant of RefWalkers. We need to implement a longer-term solution to make it easier for datasources to report problems they've encountered along the way (GSA-103).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1205 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 17:26:59 +00:00
hanna 5735c87581 Basic infrastructure for filtering malformed reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1178 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:50:22 +00:00
hanna 491ed70b44 TraverseByLocusWindow -- asstd bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1109 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:51:38 +00:00
aaron 5b1c23a7f2 changes to fix and test the interval based traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1095 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:54:15 +00:00
aaron 8b4d0412ca Changed the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1072 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:11:18 +00:00
aaron bcb64d92e9 Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, separating the creation of a genome loc (with the reference setup) to a parser class. GenomeLoc now just represents the actual genomic position. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
aaron 63b5c12cbd Changed dataSources to datasources, to be consistent with the rest of our package names. Also, this makes me champion in the largest check-in contest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@985 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:13:22 +00:00
aaron a8a2d0eab9 added support for the -M option in traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@935 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 15:12:24 +00:00
aaron 109bef6c08 We're no longer in the read-dropping business.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@901 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:37:51 +00:00
hanna fc7320133c Cleaned up error when fasta index is missing. Code still throws an exception, but the message is more direct (no more 'error while micromanaging') and tells the user to run 'samtools faidx' to fix the issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@867 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 15:34:38 +00:00
hanna 5e8c08ee63 Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
aaron 3c3cd5bb64 Moving some of the data sharding around. A new shard category now exists, INTERVAL. This saved a lot of code that was mirroring the same approach in both the read and locus shard strategies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@840 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:24:31 +00:00
aaron d994544c47 Added back end code support for Sharding based on genomic location for reads. Changed the sharding
code to take GenomeLocSortedSet instead of a list<GenomeLoc>, and added a bunch of much simpler
and cleaner test cases.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@816 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:57:46 +00:00
hanna d61a5261c1 Better integration of reference-ordered data into the data sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@779 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:09:32 +00:00
hanna 01a3cb27c7 @Required / @Allows flags for main arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@751 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 23:26:17 +00:00
hanna ff798fe483 Reintroduce support for interval-based traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@749 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 22:54:18 +00:00
hanna c10741e9f5 Rename TraverseLociByReference to TraverseLoci to represent its new function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@743 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 01:31:57 +00:00
hanna 2c4de7b5c5 Switch TraverseByLoci over to new sharding system, and cleanup some code in passing read files along
the pathway from command line to traversal engine.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@727 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 21:02:12 +00:00
aaron 99d4ebc26d Added functionality to return the final accumulator of a traversal, so external tools can get the result of a walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@724 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:20:27 +00:00
aaron 517f27f331 Added sharding strat. code that picks the right kind of shard, based on the traversal engine
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@644 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:55:10 +00:00
hanna 6e394490cb Cleanup in preparation for ByLoci traversal. Also did some work minimizing unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@643 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:27:54 +00:00
hanna 4c269b8496 Cleanup LinearMicroScheduler in preparation for TraverseByLoci inclusion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@634 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 00:58:37 +00:00
hanna dc944ec69b First stage of ROD plumbing for MicroScheduler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@614 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 23:26:21 +00:00
aaron 5136724884 Added code to the schedulers, one step closer to turning on the new reads traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@613 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 22:36:25 +00:00
hanna 9a8902571c Placeholder for parallel MicroManager.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@542 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 23:08:12 +00:00
hanna c9e9731495 More cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@539 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 17:46:52 +00:00
hanna 4036f24909 Documentation and cleanup work in preparation for parallelism.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@538 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 17:42:00 +00:00