Commit Graph

40 Commits (2d4bcb60a1fb030e1645b3b9218ceb74abba6a23)

Author SHA1 Message Date
depristo 9b1b8d46aa Performance tracking of GenomeLocProcessingTrackers, as well as a marker for where to put tracker in HierarchicalMicroScheduler
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5051 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 22:24:42 +00:00
depristo 85553cf5cb V2 cleaner, easily testing, shared memory and distributed GATK job management. Serious unit testing. Very much cleaner processing. Some code cleanup remains in removing now unused classes but the system is ready for general testing. Confirmed that one can run the UG 100 ways parallel without error, but edge cases may remain.
See documentation at:

http://www.broadinstitute.org/gsa/wiki/index.php/Parallelism_and_the_GATK#Distributed_Parallelism_.28Experimental.29

for examples on how to run this, or the testing Scala script

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5032 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 12:58:13 +00:00
depristo f8ba76d87c Incremental commit for distributed computation. Appears to work but has potential deadlock situation not yet debugged. Do not use yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5010 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-17 21:23:09 +00:00
hanna e0092bb160 Experimental feature: change the rate at which log messages appear on-the-fly
and enable/disable performance logs from outside the JVM process.  Making this
available for the moment; we'll see whether it ends up being useful.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4983 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-13 04:20:53 +00:00
bthomas 374c0deba2 Updating the core LocusWalker tools to include the Sample infrastructure that I added last month. This commit touches a lot of files, but only significantly changes a few: LocusIteratorByState and ReadBackedPileup and associated classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4711 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-19 19:59:05 +00:00
hanna 8e36a07bea Convert GenomeLocParser into an instance variable. This change is required
for anything that needs to be simultaneously aware of multiple references, eg
Queue's interval sharding code, liftover support, distributed GATK etc.  

GenomeLocParser instances must now be used to create/parse GenomeLocs.
GenomeLocParser instances are available in walkers by calling either

-getToolkit().getGenomeLocParser()
or
-refContext.getGenomeLocParser()

This is an intermediate change; GenomeLocParser will eventually be merged
with the reference, but we're not clear exactly how to do that yet.  This
will become clearer when contig aliasing is implemented.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 17:59:50 +00:00
depristo f9541b78d3 Timing of traversal now starts at the start of the traversal, so the rate is reasonable right off the bat. For example, we now see: INFO 22:45:02,476 TraversalEngine - [TRAVERSAL STARTING]; INFO 22:45:32,484 TraversalEngine - [PROGRESS] Traversed to 2:50850686, processing 18,646 sites in 30.05 secs (1611.50 secs per 1M sites)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4527 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 02:47:34 +00:00
hanna 41d57b7139 Massive cleanup of read filtering.
- Eliminate reduncancy of filter application.
- Track filter metrics per-shard to facitate per merging.
- Flatten counting iterator hierarchy for easier debugging.
- Rename Reads class to ReadProperties and track it outside of the Sting iterators.
Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics
classes are managed by the SAMDataSource when they should be managed by something more general.  For now, we're hacking
the reads data source to manage the metrics; in the future, something more general should manage the metrics classes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:17:11 +00:00
aaron 0f29f2ae3f fixes for the Tree index, and some small clean-up in the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:50 +00:00
hanna 4995950d04 IndexedFastaSequenceFile is now in Picard; transitioning to that implementation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3701 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 04:40:31 +00:00
hanna 612c3fdd9d First pass at eliminating the old sharding system. Classes required for the original sharding system
are gone where I could identify them, but hierarchies that split to support two sharding systems have
not yet been taken apart.
@Eric: ~4k lines.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3580 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 20:17:31 +00:00
depristo 2b02324587 Support for detecting and automatically excluding reads reading into the adaptor sequence and, if desired, also only showing the first pair when two reads overlap in the fragment. Not enabled, an intermediate check in before updating and verifying the impact on locus walkers everywhere.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3465 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-30 18:00:12 +00:00
depristo dfc36c1e95 Restructuring of the mandatory read filters for traversals. Now everything uses ReadFilters, even for the required filters like being mapped for LocusWalkers. Statistics now tracked for each read filter used during the traversal and info emitted in INFO at the end.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3445 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 22:12:25 +00:00
hanna a7ba88e649 Rework the way the MicroScheduler handles locus shards to handle intervals that span shards
with less memory consumption.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2981 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 18:40:31 +00:00
hanna 1b572b192a Stopgap fix for temporary problems sharding when indexless. A more compelling solution will come later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2908 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 02:59:14 +00:00
hanna 75a541b479 Fix nasty issue where shard boundaries aren't properly clipped during locus traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2907 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 23:31:58 +00:00
hanna 30eb28886b Basic functionality for intervaled reads in new sharding system. Not currently filtering out cruft, so
the mode of operation is currently queryOverlapping rather than queryContained.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2899 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:41:55 +00:00
hanna 199b43fcf2 Reduce by interval alterations to interface with new sharding system. This checkin with be followed by a
simplification of some of the locus traversal code.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 00:16:50 +00:00
hanna a1e8a532ad Support for initialize() and onTraversalDone() output from parallelized walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1911 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:31 +00:00
hanna b69eb208a6 Always create output files, even if no output was written to them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1627 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 17:58:14 +00:00
hanna ccdb4a0313 General-purpose management of output streams.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-23 00:56:02 +00:00
hanna 5429b4d4a8 A bit of reorganization to help with more flexible output streams. Pushed construction of data
sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler
to just microschedule.  


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1336 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 23:00:15 +00:00
hanna 7a13647c35 Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
hanna 5735c87581 Basic infrastructure for filtering malformed reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1178 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:50:22 +00:00
aaron 63b5c12cbd Changed dataSources to datasources, to be consistant with the rest of our package names. Also, this makes me champion in the largest check-in contest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@985 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:13:22 +00:00
aaron a8a2d0eab9 added support for the -M option in traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@935 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 15:12:24 +00:00
aaron d994544c47 Added back end code support for Sharding based on genomic location for reads. Changed the sharding
code to take GenomeLocSortedSet instead of a list<GenomeLoc>, and added a bunch of much simplier 
and cleaner test cases.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@816 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:57:46 +00:00
hanna ff798fe483 Reintroduce support for interval-based traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@749 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 22:54:18 +00:00
hanna 2c4de7b5c5 Switch TraverseByLoci over to new sharding system, and cleanup some code in passing read files along
the pathway from command line to traversal engine.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@727 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 21:02:12 +00:00
aaron 99d4ebc26d Added functionality to return the final accumulator of a traversal, so external tools can get the result of a walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@724 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:20:27 +00:00
hanna 6e394490cb Cleanup in preparation for ByLoci traversal. Also did some work minimizing unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@643 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:27:54 +00:00
hanna 483a58627b More cleanup -- pushing shared functions down into the traversal engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@639 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 14:12:45 +00:00
hanna 7a9cfe1f75 Push reduceInit down a level so that the walker can call into it without weird casts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@638 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 13:46:28 +00:00
hanna 4c269b8496 Cleanup LinearMicroScheduler in preparation for TraverseByLoci inclusion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@634 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 00:58:37 +00:00
hanna dc944ec69b First stage of ROD plumbing for MicroScheduler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@614 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 23:26:21 +00:00
aaron 5136724884 Added code to the schedulers, one step closer to turning on the new reads traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@613 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 22:36:25 +00:00
aaron 0aba688e6f Added a interface that all our SAMRecord iterators should try to code to. This is in the effort to keep our code generic
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@609 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:40:41 +00:00
aaron 63403d32cd Changes to the interface to the simple data source rippled out to a bunch of files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@572 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 20:35:56 +00:00
hanna 9a8902571c Placeholder for parallel MicroManager.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@542 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 23:08:12 +00:00
hanna c9e9731495 More cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@539 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 17:46:52 +00:00