gatk-3.8/public/java/test/org/broadinstitute/sting/gatk
David Roazen 95b5f99feb Exclude reduced reads from elimination during downsampling
Problem:
-Downsamplers were treating reduced reads the same as normal reads,
 with occasionally catastrophic results on variant calling when an
 entire reduced read happened to get eliminated.

Solution:
-Since reduced reads lack the information we need to do position-based
 downsampling on them, the best available option for now is to simply
 exempt all reduced reads from elimination during downsampling.
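
A minimal Java sketch of the exemption rule, under stated assumptions: only the method name `doNotDiscardItem()` comes from this commit message. The `Read` type, the `Downsampler` base class, `submit()`, `consumeFinalizedItems()` and `ReservoirDownsampler` below are hypothetical simplifications, not the actual GATK classes or signatures. The default exemption check keeps every reduced read, and the reservoir sampler sends exempt items around sampling entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReducedReadExemptionSketch {

    // Hypothetical, simplified read type (not the real GATK read class).
    static final class Read {
        final String name;
        final int start;
        final boolean reduced; // stands in for "is this a reduced read?"

        Read(String name, int start, boolean reduced) {
            this.name = name;
            this.start = start;
            this.reduced = reduced;
        }
    }

    // Hypothetical abstract base: subclasses choose how to sample, but all
    // of them inherit the rule that exempt items are never discarded.
    abstract static class Downsampler<T> {
        // Default exemption rule: never discard reduced reads.
        protected boolean doNotDiscardItem(T item) {
            return item instanceof Read && ((Read) item).reduced;
        }

        abstract void submit(T item);

        abstract List<T> consumeFinalizedItems();
    }

    // Reservoir sampling (Algorithm R) applied only to non-exempt items.
    static final class ReservoirDownsampler<T> extends Downsampler<T> {
        private final int targetSampleSize;
        private final Random rng;
        private final List<T> reservoir = new ArrayList<>();
        private final List<T> exempt = new ArrayList<>();
        private int eligibleSeen = 0;

        ReservoirDownsampler(int targetSampleSize, Random rng) {
            this.targetSampleSize = targetSampleSize;
            this.rng = rng;
        }

        @Override
        void submit(T item) {
            if (doNotDiscardItem(item)) {
                exempt.add(item); // bypasses sampling, so it always survives
                return;
            }
            eligibleSeen++;
            if (reservoir.size() < targetSampleSize) {
                reservoir.add(item);
            } else {
                // Keep the new item with probability targetSampleSize / eligibleSeen.
                int slot = rng.nextInt(eligibleSeen);
                if (slot < targetSampleSize) {
                    reservoir.set(slot, item);
                }
            }
        }

        @Override
        List<T> consumeFinalizedItems() {
            List<T> out = new ArrayList<>(exempt);
            out.addAll(reservoir);
            exempt.clear();
            reservoir.clear();
            eligibleSeen = 0;
            return out;
        }
    }

    public static void main(String[] args) {
        ReservoirDownsampler<Read> ds = new ReservoirDownsampler<>(2, new Random(7));
        for (int i = 0; i < 10; i++) {
            ds.submit(new Read("r" + i, 100 + i, i % 4 == 0)); // r0, r4, r8 are reduced
        }
        // 3 reduced reads always survive, plus 2 sampled normal reads.
        System.out.println(ds.consumeFinalizedItems().size()); // prints 5
    }
}
```

With a target of 2, the sketch returns 5 items because the 3 reduced reads are never eligible for elimination. This is one way exemptions can push actual coverage above the requested target.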

Details:
-Add generic capability of exempting items from elimination to
 the Downsampler interface via new doNotDiscardItem() method.
 Default inherited version of this method exempts all reduced reads
 (or objects encapsulating reduced reads) from elimination.

-Switch from interfaces to abstract classes to facilitate this change,
 and do some minor refactoring of the Downsampler interface (push
 implementation of some methods into the abstract classes, improve
 names of the confusing clear() and reset() methods).

-Rewrite TAROrderedReadCache. This class was incorrectly relying
 on the ReservoirDownsampler to preserve the relative ordering of
 items in some circumstances. That behavior was not guaranteed by
 the API and only happened to work because of implementation details
 that no longer apply. Restructure this class around the assumption
 that the ReservoirDownsampler will not preserve relative ordering
 at all.

-Add disclaimer to description of -dcov argument explaining that
 coverage targets are approximate goals that will not always be
 precisely met.

-Add unit tests for all individual downsamplers to verify that reduced
 reads are exempted from elimination.
2013-06-11 16:16:26 -04:00
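
The TAROrderedReadCache rewrite rests on one observation: a reservoir sampler can overwrite any slot when it admits a new item, so its output order carries no information about input order. Below is a minimal sketch of a cache built on that assumption. The names `OrderedReadCacheSketch`, `Read`, `add()` and `popCurrentReads()` are hypothetical and do not reflect the actual TAROrderedReadCache API. The cache samples freely and then sorts explicitly by alignment start before returning reads.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class OrderedReadCacheSketch {

    // Hypothetical minimal read: only the alignment start matters here.
    static final class Read {
        private final String name;
        private final int start;

        Read(String name, int start) {
            this.name = name;
            this.start = start;
        }

        int getStart() {
            return start;
        }

        @Override
        public String toString() {
            return name + "@" + start;
        }
    }

    private final int maxCapacity;
    private final Random rng;
    private final List<Read> reservoir = new ArrayList<>();
    private int seen = 0;

    OrderedReadCacheSketch(int maxCapacity, Random rng) {
        this.maxCapacity = maxCapacity;
        this.rng = rng;
    }

    // Reservoir sampling: an admitted read overwrites a random slot, so the
    // reservoir's internal order does not track the input order.
    void add(Read read) {
        seen++;
        if (reservoir.size() < maxCapacity) {
            reservoir.add(read);
        } else {
            int slot = rng.nextInt(seen);
            if (slot < maxCapacity) {
                reservoir.set(slot, read);
            }
        }
    }

    // Make no ordering assumption about the sampler: sort explicitly by
    // alignment start before handing reads to the caller.
    List<Read> popCurrentReads() {
        List<Read> out = new ArrayList<>(reservoir);
        out.sort(Comparator.comparingInt(Read::getStart));
        reservoir.clear();
        seen = 0;
        return out;
    }

    public static void main(String[] args) {
        OrderedReadCacheSketch cache = new OrderedReadCacheSketch(5, new Random(3));
        for (int i = 1; i <= 20; i++) {
            cache.add(new Read("r" + i, i * 10));
        }
        System.out.println(cache.popCurrentReads()); // five reads, sorted by start
    }
}
```

Sorting on output costs O(k log k) per batch. The benefit is that correctness no longer depends on how a particular sampler happens to arrange its buffer.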
datasources Implement ActiveRegionTraversal RefMetaDataTracker for map call; HaplotypeCaller now annotates ID from dbSNP 2013-06-10 16:20:31 -04:00
downsampling Exclude reduced reads from elimination during downsampling 2013-06-11 16:16:26 -04:00
executive Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
filters This commit addresses JIRA issue GSA-948: Prevent users from doing the wrong thing with RNA-Seq data and the GATK. 2013-06-10 10:44:42 -04:00
iterators Refactor LIBS into utils.locusiterator before refactoring 2013-01-11 15:17:16 -05:00
refdata Detect stuck lock-acquisition calls, and disable file locking for tests 2013-04-24 22:49:02 -04:00
report Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
samples Trivial update to ceutrio.ped file to make it really the CEU trio sample names 2013-05-14 17:08:13 -04:00
traversals Exclude reduced reads from elimination during downsampling 2013-06-11 16:16:26 -04:00
walkers Fixes to get accurate read counts for Read traversals 2013-05-21 15:24:07 -04:00
CommandLineGATKUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
EngineFeaturesIntegrationTest.java This commit addresses JIRA issue GSA-948: Prevent users from doing the wrong thing with RNA-Seq data and the GATK. 2013-06-10 10:44:42 -04:00
GenomeAnalysisEngineUnitTest.java Added the functionality to impose a relative ordering on ReadTransformers in the GATK engine. 2013-03-06 12:38:59 -05:00
MaxRuntimeIntegrationTest.java Subshard timeouts in the GATK 2013-05-15 07:00:39 -04:00
ReadMetricsUnitTest.java Implement ActiveRegionTraversal RefMetaDataTracker for map call; HaplotypeCaller now annotates ID from dbSNP 2013-06-10 16:20:31 -04:00
WalkerManagerUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00