gatk-3.8/public
Mark DePristo 39e4396de0 New ActiveRegionShardBalancer allows efficient NanoScheduling
-- Previously we used the LocusShardBalancer for the haplotype caller, which meant that TraverseActiveRegions saw its shards grouped in chunks of 16kb bits on the genome.  These locus shards are useful when you want to use the HierarchicalMicroScheduler, as they provide fine-grained accessed to the underlying BAM, but they have two major drawbacks (1) we have to fairly frequently reset our state in TAR to handle moving between shard boundaries and (2) with the nano scheduled TAR we end up blocking at the end of each shard while our threads all finish processing.
-- This commit changes the system over to using an ActiveRegionShardBalancers, that combines all of the shard data for a single contig into a single combined shard.  This ensures that TAR, and by extensions the HaplotypeCaller, gets all of the data on a single contig together so the the NanoSchedule runs efficiently instead of blocking over and over at shard boundaries.  This simple change allows us to scale efficiently to around 8 threads in the nano scheduler:
  -- See https://www.dropbox.com/s/k7f280pd2zt0lyh/hc_nano_linear_scale.pdf
  -- See https://www.dropbox.com/s/fflpnan802m2906/hc_nano_log_scale.pdf
-- Misc. changes throughout the codebase so we Use the ActiveRegionShardBalancer where appropriate.
-- Added unit tests for ActiveRegionShardBalancer to confirm it does the merging as expected.
-- Fix bad toString in FilePointer
2013-05-13 11:09:02 -04:00
..
R Improve queue script jobreport visualization script 2013-05-07 12:11:46 -04:00
c At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings. 2012-01-17 14:47:53 -05:00
chainFiles
doc Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) 2013-03-12 10:57:14 -04:00
java New ActiveRegionShardBalancer allows efficient NanoScheduling 2013-05-13 11:09:02 -04:00
keys Public-key authorization scheme to restrict use of NO_ET 2012-03-06 00:09:43 -05:00
packages ValidatingPileup was renamed to CheckPileup 2013-02-15 11:56:19 -05:00
perl Split out contig names from Reference .fai file on white space (to support the GATK resource bundle's file human_g1k_v37.fasta.fai.gz, which does not use tab delimiters) 2012-06-07 16:56:32 -04:00
scala Enable convenient display of diff engine output in Bamboo, plus misc. minor test-related improvements 2013-05-10 19:00:33 -04:00
testdata Revert changes to exampleFASTA.fasta.fai for now to get tests passing again 2013-05-02 12:59:20 -04:00