Commit Graph

16 Commits (c103623cf69b272706da0b3748c3be6a2425c4fb)

Author SHA1 Message Date
Mauricio Carneiro 2a4ccfe6fd Updated all JAVA file licenses accordingly
GSATDG-5
2013-01-10 17:06:41 -05:00
Mark DePristo 38cc496de8 Move SomaticIndelDetector and associated tools and libraries into private/andrey package
-- Intermediate commit on the way to archiving SomaticIndelDetector and other tools.
-- SomaticIndelDetector, PairMaker and RemapAlignments tools have been refactored into the private andrey package.  All utility classes refactored into here as well.  At this point, the SomaticIndelDetector builds in this version of the GATK.
-- Subsequent commit will put this code into the archive so it no longer builds in the GATK
2012-12-29 14:34:08 -05:00
Mark DePristo 0f04485c24 NanoScheduler optimization: don't use a PriorityBlockingQueue for the MapResultsQueue
-- Created a separate, limited interface MapResultsQueue object that previously was set to the PriorityBlockingQueue.
-- The MapResultsQueue is now backed by a synchronized ExpandingArrayList, since job ids are integers incrementing from 0 to N.  This means we avoid the n log n sort in the priority queue which was generating a lot of cost in the reduce step
-- Had to update ReducerUnitTest because the test itself was brittle, and broken when I changed the underlying code.
-- A few bits of minor code cleanup through the system (removing unused constructors, local variables, etc)
-- ExpandingArrayList called ensureCapacity so that we increase the size of the arraylist once to accommodate the upcoming size needs
2012-12-24 13:35:56 -05:00
Mark DePristo 1de2f527b9 Optimization of recalibrateRead
-- Refactor calculation so that upfront constant values are pre-computed, and cached, and their values just looked up during application
-- Trivial comment on how we might use BAQ better in BaseRecalibrator
2012-12-17 16:47:27 -05:00
Mark DePristo 5ec25797b3 Optimizations for BaseRecalibrator
-- No longer computes at each update the overall read group table.  Now computes this derived table only at the end of the computation, using the ByQual table as input.  Reduces BQSR runtime by 1/3 in my test
2012-12-17 16:47:27 -05:00
David Roazen 884d031e72 NestedIntegerArray: Pre-allocate only the first two dimensions
It turns out that pre-allocating the entire tree was too expensive in
terms of memory when using large values for the -mcs and -ics parameters.

Pre-allocating the first two dimensions prevents us from ever locking the
root node during a put(). Contention between threads over lower levels
of the tree should be minimal given that puts are rare compared to gets.

Also output dimensions and pre-allocation info at startup. If pre-allocation
takes longer than usual this gives the user a sense of what is causing the
delay.
2012-10-25 15:17:42 -04:00
David Roazen d9aa9855f8 Better comments in NestedIntegerArray 2012-10-24 15:29:13 -04:00
David Roazen 991658acf4 BQSR: use more granular locking for concurrency control
-With this change, BQSR performance scales properly by thread rather
 than gaining nothing from additional threads.
-Benefits are seen when using either -nt (HierarchicalMicroScheduler) or -nct
 (NanoScheduler)
-Removes high-level locks in the recalibration engines and NestedIntegerArray
 in favor of maximally-granular locks on and around manipulation of the leaf
 nodes of the NestedIntegerArray.
-NestedIntegerArray now creates all interior nodes upfront rather than on
 the fly to avoid the need for locking during tree traversals. This uses
 more memory in the initial part of BQSR runs, but the BQSR would eventually
 converge to use this memory anyway over the course of a typical run.

IMPORTANT NOTE: This does not mean it's safe to run the old BaseRecalibrator
walker with multiple threads. The BaseRecalibrator walker is and will never be
thread-safe, as it's a LocusWalker that uses read attributes to track
state information. ONLY the newer DelocalizedBaseRecalibrator can be made
thread-safe (and will hopefully be made so in my subsequent commits). This
commit addresses performance, not correctness.
2012-10-24 15:22:50 -04:00
David Roazen b30e2a5b7d BQSR: tool to profile the effects of more-granular locking on scalability by # of threads 2012-10-16 14:43:16 -04:00
David Roazen ac87ed47bb BQSR: allow logging recal table updates to a file
For testing/debugging purposes only
2012-10-01 14:18:34 -04:00
Eric Banks a4670113bd Refactored/renamed the nested integer array; cleaned up code a bit. 2012-07-03 00:12:33 -04:00
Eric Banks cac72bce91 Initial version of int indexed mapping for BQSR. Will be cleaned up in a bit. 2012-07-02 14:33:33 -04:00
Eric Banks 1fafd9f6c8 NestedHashMap-based implementation of BQSRv2 along with a few minor optimizations. Not a huge runtime upgrade over the long bitset approach, but it allows us to implement further optimizations going forward. Integration test change because the original version had a bug in the quantized qual table creation. 2012-06-27 16:55:49 -04:00
Ryan Poplin 0ad7d5fbc1 Standalone common Pair HMM utility class with associated unit tests. 2012-03-01 22:41:13 -05:00
Mark DePristo 9992c373be Optimize imports run on the whole project, public and private. I just got too tired of all of the unused imports floating around. Confirmed that the system builds after the changes. 2011-07-17 20:29:58 -04:00
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00