gatk-3.8/public/java/test/org/broadinstitute/sting/gatk/datasources
Mark DePristo 39e4396de0 New ActiveRegionShardBalancer allows efficient NanoScheduling
-- Previously we used the LocusShardBalancer for the haplotype caller, which meant that TraverseActiveRegions saw its shards grouped in chunks of 16kb bits on the genome.  These locus shards are useful when you want to use the HierarchicalMicroScheduler, as they provide fine-grained accessed to the underlying BAM, but they have two major drawbacks (1) we have to fairly frequently reset our state in TAR to handle moving between shard boundaries and (2) with the nano scheduled TAR we end up blocking at the end of each shard while our threads all finish processing.
-- This commit changes the system over to using an ActiveRegionShardBalancers, that combines all of the shard data for a single contig into a single combined shard.  This ensures that TAR, and by extensions the HaplotypeCaller, gets all of the data on a single contig together so the the NanoSchedule runs efficiently instead of blocking over and over at shard boundaries.  This simple change allows us to scale efficiently to around 8 threads in the nano scheduler:
  -- See https://www.dropbox.com/s/k7f280pd2zt0lyh/hc_nano_linear_scale.pdf
  -- See https://www.dropbox.com/s/fflpnan802m2906/hc_nano_log_scale.pdf
-- Misc. changes throughout the codebase so we Use the ActiveRegionShardBalancer where appropriate.
-- Added unit tests for ActiveRegionShardBalancer to confirm it does the merging as expected.
-- Fix bad toString in FilePointer
2013-05-13 11:09:02 -04:00
..
providers Detect stuck lock-acquisition calls, and disable file locking for tests 2013-04-24 22:49:02 -04:00
reads New ActiveRegionShardBalancer allows efficient NanoScheduling 2013-05-13 11:09:02 -04:00
reference Remove auto-creation of fai/dict files for fasta references 2013-04-02 18:34:08 -04:00
rmd Detect stuck lock-acquisition calls, and disable file locking for tests 2013-04-24 22:49:02 -04:00