gatk-3.8/public/java/test/org/broadinstitute/sting/utils
Mark DePristo 50cdffc61f Slightly improved Smith-Waterman parameter values for HaplotypeCaller Path comparisons
Key improvement
---------------
-- The haplotype caller was producing unstable calls when comparing the following two haplotypes:

ref:               ACAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA
alt: TGTGTGTGTGTGTGACAGAGAGAGAGAGAGAGAGAGAGAGAGAGA

in which the alt and ref haplotypes differ by an indel at both the start and the end of the bubble.  The previous parameter values used in the Path algorithm were set so that such haplotype comparisons would result in either the above alignment or the following alignment, depending on exactly how many GA units were present in the bubble.

ref: ACAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA
alt: TGTGTGTGTGTGTGACAGAGAGAGAGAGAGAGAGAGAGAGAGAGA

The number of elements could vary depending on how the graph was built, which resulted in real differences between the BWA mem and BWA-SW calls.  I added a few unit tests for this case and found a set of SW parameter values with lower gap-extension penalties that significantly favor the first alignment.  That is the right thing to do, as we would much rather have large indels in the haplotypes than lots of mismatches.

-- Expanded the unit tests in both SW and KBestPaths to look at complex events like this, and to check somewhat systematically that we are finding many types of expected mutational events.
-- Verified that this change doesn't alter our calls on 20:10,000,000-11,000,000 at all
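
To make the gap-extension point concrete, here is a minimal sketch that scores two fixed candidate alignments of a toy bubble under affine gap penalties. It does not run Smith-Waterman itself. The haplotypes are shortened stand-ins for the ones above, the class and method names are hypothetical, and the weights are made up for illustration; they are not the values used in the GATK. The assumed convention is that the first base of a gap costs gapOpen and each further base costs gapExtend.

```java
// Sketch only: scores two fixed candidate alignments; it does not run Smith-Waterman.
// All names and parameter values are illustrative assumptions, not GATK's.
final class AffineGapScoreSketch {

    /** Hypothetical bundle of the four scoring weights. */
    static final class Params {
        final int match, mismatch, gapOpen, gapExtend;

        Params(int match, int mismatch, int gapOpen, int gapExtend) {
            this.match = match;
            this.mismatch = mismatch;
            this.gapOpen = gapOpen;
            this.gapExtend = gapExtend;
        }
    }

    /** Returns a string made of n copies of c. */
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) sb.append(c);
        return sb.toString();
    }

    /**
     * Scores two equal-length alignment rows in which '-' marks a gap.
     * Assumed convention: the first base of a gap costs gapOpen and each
     * additional base of the same gap costs gapExtend.
     */
    static int score(String refRow, String altRow, Params p) {
        if (refRow.length() != altRow.length())
            throw new IllegalArgumentException("alignment rows must have equal length");
        int total = 0;
        int prevGap = 0; // 0 = no gap, 1 = gap in ref row, 2 = gap in alt row
        for (int i = 0; i < refRow.length(); i++) {
            char r = refRow.charAt(i), a = altRow.charAt(i);
            int gap = (r == '-') ? 1 : (a == '-') ? 2 : 0;
            if (gap != 0) {
                total += (gap == prevGap) ? p.gapExtend : p.gapOpen;
            } else {
                total += (r == a) ? p.match : p.mismatch;
            }
            prevGap = gap;
        }
        return total;
    }

    public static void main(String[] args) {
        String shared = "ACAGAGAGAGA";     // segment common to both toy haplotypes
        String ref = shared + "GAGAGAGA";  // toy ref: extra GA units at the end
        String alt = "TGTGTGTG" + shared;  // toy alt: extra TG units at the start

        // Candidate 1: one indel at each end of the bubble.
        String gappedRef = repeat('-', 8) + ref;
        String gappedAlt = alt + repeat('-', 8);
        // Candidate 2: ref and alt laid side by side with no gaps.

        Params highExt = new Params(10, -20, -30, -10); // made-up values
        Params lowExt = new Params(10, -20, -30, -1);   // same, but cheaper extension

        for (Params p : new Params[] {highExt, lowExt}) {
            System.out.println("gapExtend=" + p.gapExtend
                    + " twoIndels=" + score(gappedRef, gappedAlt, p)
                    + " ungapped=" + score(ref, alt, p));
        }
    }
}
```

With these made-up weights the ungapped candidate scores 10 in both cases, while the two-indel candidate goes from -90 at gapExtend=-10 to 36 at gapExtend=-1. Lowering only the extension penalty is enough to flip the preference toward the indel alignment.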

General code cleanup
--------------------
-- Move Smith-Waterman to its own package in utils
-- Refactored out SWParameters class in SWPairwiseAlignment, and made constructors take either a named parameter set or a Parameter object directly.  Deprecated the old call that takes inline constants.  This makes it easier to group all of the SW parameters into a single object for callers
-- Update users of SW code to use new Parameter class
-- Also moved haplotype bam writers to protected so they can use the Path SW parameter, which is protected
-- Removed the storage of the SW scoring matrix in SWPairwiseAligner by default.  Only the SWPairwiseAlignmentMain test program needs this, so added a gross protected static variable that enables its storage
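
The constructor refactoring described in the list above follows a common pattern, which might look roughly like the sketch below. Every class, enum constant, and value here is hypothetical and chosen for illustration; this is not the actual SWPairwiseAlignment API.

```java
import java.util.Objects;

// Sketch of the "named set or explicit object" constructor pattern.
// Every name and value here is hypothetical; none is the real GATK API.
final class SWParametersSketch {

    /** Immutable bundle of the four scoring weights. */
    static final class Parameters {
        final int match, mismatch, gapOpen, gapExtend;

        Parameters(int match, int mismatch, int gapOpen, int gapExtend) {
            this.match = match;
            this.mismatch = mismatch;
            this.gapOpen = gapOpen;
            this.gapExtend = gapExtend;
        }
    }

    /** Named presets so callers do not repeat raw constants. */
    enum NamedSet {
        EXAMPLE_DEFAULT(new Parameters(10, -20, -30, -10)),
        EXAMPLE_LOW_EXTENSION(new Parameters(10, -20, -30, -1));

        private final Parameters parameters;

        NamedSet(Parameters parameters) {
            this.parameters = parameters;
        }

        Parameters parameters() {
            return parameters;
        }
    }

    static final class Aligner {
        private final Parameters parameters;

        /** Preferred: pick a named preset. */
        Aligner(NamedSet named) {
            this(named.parameters());
        }

        /** Also supported: pass an explicit parameter object. */
        Aligner(Parameters parameters) {
            this.parameters = Objects.requireNonNull(parameters, "parameters");
        }

        /** Legacy form with inline constants, kept only for old callers. */
        @Deprecated
        Aligner(int match, int mismatch, int gapOpen, int gapExtend) {
            this(new Parameters(match, mismatch, gapOpen, gapExtend));
        }

        Parameters parameters() {
            return parameters;
        }
    }

    public static void main(String[] args) {
        Aligner byName = new Aligner(NamedSet.EXAMPLE_LOW_EXTENSION);
        Aligner byObject = new Aligner(new Parameters(10, -20, -30, -1));
        System.out.println(byName.parameters().gapExtend == byObject.parameters().gapExtend);
    }
}
```

Funnelling every constructor through the one that takes a parameter object keeps the weights in a single place, so a caller such as the Path code can share one preset instead of copying four integers.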
2013-04-11 18:22:55 -04:00
..
R Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
activeregion Major improvements to HC that trims down active regions before genotyping 2013-04-08 12:47:49 -04:00
baq Fixing BQSR/BAQ bug: 2013-01-31 11:03:17 -05:00
clipping Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
codecs/hapmap Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
collections Fixing license on Yossi's file 2013-02-05 11:14:43 -05:00
crypt Update MD5s to reflect version number change in the BAM header 2013-02-01 13:51:31 -05:00
fasta Move BaseUtils back to the GATK by request, along with associated utility methods 2013-01-30 13:09:44 -05:00
fragments New GATKSAMRecord concept of a strandless read, update to FS 2013-03-13 11:16:36 -04:00
haplotype Major improvements to HC that trims down active regions before genotyping 2013-04-08 12:47:49 -04:00
interval Intervals: fix bug where we could fail to find the intersection of unsorted/missorted interval lists 2013-04-02 14:01:52 -04:00
io Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
locusiterator LIBS unit test debugging should be false 2013-04-08 12:47:47 -04:00
nanoScheduler Further tweaking of test timeouts 2013-03-15 14:49:21 -04:00
pileup Last manual license update (hopefully) 2013-01-18 16:13:07 -05:00
progressmeter Resolves Genome Sequence Analysis GSA-750 Don't print an endless series of starting messages from the ProgressMeter 2013-02-04 15:47:30 -05:00
recalibration Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
report Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
runtime Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
sam Critical bugfix to ReduceRead functionality of the GATKSAMRecord 2013-04-08 12:47:50 -04:00
text Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
threading Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
variant AssessNA12878 bugfixes 2013-03-18 15:48:08 -04:00
AutoFormattingTimeUnitTest.java AutoFormattingTimeUnitTest should be in utils 2013-01-30 09:47:47 -05:00
BaseUtilsUnitTest.java More aggressive checking of AWS key quality upon startup in the GATK 2013-01-31 09:08:38 -05:00
BitSetUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
GenomeLocParserBenchmark.java Optimize GenomeLocParser.createGenomeLoc 2013-01-30 09:47:47 -05:00
GenomeLocParserUnitTest.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
GenomeLocSortedSetUnitTest.java Fixed the add functionality of GenomeLocSortedSet. 2013-02-28 23:31:00 -05:00
GenomeLocUnitTest.java Added distance across contigs calculation to GenomeLocs 2013-02-07 16:31:41 -05:00
MRUCachingSAMSequencingDictionaryUnitTest.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
MWUnitTest.java Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
MathUtilsUnitTest.java Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils 2013-03-28 23:25:28 -04:00
MedianUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
NGSPlatformUnitTest.java Expand NGSPlatform to meet SAM 1.4 spec, with full unit tests 2013-02-09 11:16:21 -05:00
PathUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
QualityUtilsUnitTest.java Final edge case bug fixes to QualityUtil routines 2013-02-16 07:31:38 -08:00
SequenceDictionaryUtilsUnitTest.java Sequence dictionary validation: detect problematic contig indexing differences 2013-02-25 11:14:22 -05:00
SimpleTimerUnitTest.java Fix tests that were consistently or intermittently failing when run in parallel on the farm 2013-03-06 13:56:54 -05:00
UtilsUnitTest.java Cleanup of FragmentUtils 2013-03-13 07:36:20 -04:00