gatk-3.8/public/java/test/org/broadinstitute/sting/utils
Mark DePristo e3e8631ff5 Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences
-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was succesfully built but didn't have variation, or truly failed in construction.  Fixing an annoying bug where you'd prefectly assembly the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because it null was also used to indicate assembly failure
--
-- Output format looks like:
20      10026072        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,120
20      10026073        .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,119
20      10026074        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,121
20      10026075        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,119
20      10026076        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,120
20      10026077        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:3,0:3:9:0,9,120
20      10026078        .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:5,0:5:15:0,15,217
20      10026079        .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:6,0:6:18:0,18,240
20      10026080        .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:6,0:6:18:0,18,268
20      10026081        .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:7,0:7:21:0,21,267

We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values.  Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but doesn't yet account properly for alignment uncertainty.
-- Can we enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning the each read to its most likley haplotype, and then evaluting the resulting pileups over the active region interval.  The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures.  Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as its no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line.  Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
2013-07-02 15:46:38 -04:00
..
R Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
activeregion Create a new normalDistributionLog10 function that is unit tested for use in the VQSR. 2013-05-30 16:00:08 -04:00
baq Fixing BQSR/BAQ bug: 2013-01-31 11:03:17 -05:00
classloader Enable convenient display of diff engine output in Bamboo, plus misc. minor test-related improvements 2013-05-10 19:00:33 -04:00
clipping Bugfix for HaplotypeCaller error: Only one of refStart or refStop must be < 0, not both 2013-06-04 10:33:46 -04:00
codecs/hapmap Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
collections Fixing license on Yossi's file 2013-02-05 11:14:43 -05:00
crypt Update expected test output for Java 7 2013-05-01 16:18:01 -04:00
fasta Move BaseUtils back to the GATK by request, along with associated utility methods 2013-01-30 13:09:44 -05:00
file Detect stuck lock-acquisition calls, and disable file locking for tests 2013-04-24 22:49:02 -04:00
fragments Critical bugfix for adapter clipping in HaplotypeCaller 2013-07-02 15:46:36 -04:00
haplotype Major improvements to HC that trims down active regions before genotyping 2013-04-08 12:47:49 -04:00
interval Intervals: fix bug where we could fail to find the intersection of unsorted/missorted interval lists 2013-04-02 14:01:52 -04:00
io Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
locusiterator Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences 2013-07-02 15:46:38 -04:00
nanoScheduler Further tweaking of test timeouts 2013-03-15 14:49:21 -04:00
pileup Last manual license update (hopefully) 2013-01-18 16:13:07 -05:00
progressmeter Subshard timeouts in the GATK 2013-05-15 07:00:39 -04:00
recalibration Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
report Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
runtime Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
sam Critical bugfix for adapter clipping in HaplotypeCaller 2013-07-02 15:46:36 -04:00
smithwaterman New faster Smith-Waterman implementation that is edge greedy and assumes that ref and haplotype have same global start/end points. 2013-05-13 09:36:39 -04:00
text Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
threading Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
variant Better GATK version and command line output 2013-06-20 11:19:13 -04:00
AutoFormattingTimeUnitTest.java AutoFormattingTimeUnitTest should be in utils 2013-01-30 09:47:47 -05:00
BaseUtilsUnitTest.java More aggressive checking of AWS key quality upon startup in the GATK 2013-01-31 09:08:38 -05:00
BitSetUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
GenomeLocParserBenchmark.java Optimize GenomeLocParser.createGenomeLoc 2013-01-30 09:47:47 -05:00
GenomeLocParserUnitTest.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
GenomeLocSortedSetUnitTest.java Fixed the add functionality of GenomeLocSortedSet. 2013-02-28 23:31:00 -05:00
GenomeLocUnitTest.java Added distance across contigs calculation to GenomeLocs 2013-02-07 16:31:41 -05:00
MRUCachingSAMSequencingDictionaryUnitTest.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
MWUnitTest.java Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
MathUtilsUnitTest.java Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences 2013-07-02 15:46:38 -04:00
MedianUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
NGSPlatformUnitTest.java Expand NGSPlatform to meet SAM 1.4 spec, with full unit tests 2013-02-09 11:16:21 -05:00
PathUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
QualityUtilsUnitTest.java Final edge case bug fixes to QualityUtil routines 2013-02-16 07:31:38 -08:00
SequenceDictionaryUtilsUnitTest.java Sequence dictionary validation: detect problematic contig indexing differences 2013-02-25 11:14:22 -05:00
SimpleTimerUnitTest.java Fix tests that were consistently or intermittently failing when run in parallel on the farm 2013-03-06 13:56:54 -05:00
UtilsUnitTest.java New faster Smith-Waterman implementation that is edge greedy and assumes that ref and haplotype have same global start/end points. 2013-05-13 09:36:39 -04:00