gatk-3.8/public/java/test/org/broadinstitute/sting/utils
Chris Hartl 73d1c319bf Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils
Currently, the multi-allelic test covers the following case:

Eval   A   T,C
Comp   A   C

Reciprocate this so that the reverse case is also covered:

Eval   A   C
Comp   A   T,C

Furthermore, modify ConcordanceMetrics to handle more correctly the situation where multiple alternate alleles are available in the comp. It was possible for an eval C/C sample to match a comp T/T sample, so long as the C allele was also present in at least one other comp sample.

This comes from the fact that "truth" reference alleles can be paired with *any* allele also present in the truth VCF record, while truth het/hom-var sites are restricted to matching only the alleles present in the genotype. The reason truth ref alleles are a special case is as follows. Imagine:

Eval:   A  G,T      0/0   2/0   2/2   1/1
Comp:   A  C,T      0/0   1/0   0/0   0/0

Even though the alt allele of the comp is a C, the assessment of genotypes should be as follows:

Sample1: ref called ref
Sample2: alleles don't match (the alt allele of the comp was not assessed in eval)
Sample3: ref called hom-var
Sample4: alleles don't match (the alt allele of the eval was not assessed in comp)

Before this change, Sample2 was evaluated as "het called het" (because the T allele in eval happens to also be in the comp record, just not in the comp sample). Thus: apply the current logic to comp hom-refs, and the more restrictive logic ("you have to match an allele in the comp genotype") when the comp is not reference.
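The matching rule above can be sketched in Java against the example table. This is a minimal illustration under stated assumptions, not the actual ConcordanceMetrics code: the class `ConcordanceAlleleSketch` and method `allelesMatch` are hypothetical, alleles are plain strings rather than GATK `Allele` objects, and we assume the reference allele is always acceptable in an eval genotype (so an eval hom-ref against a comp het is still compared rather than flagged as an allele mismatch).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the comp-dependent allele-matching rule; not the GATK implementation.
class ConcordanceAlleleSketch {

    /**
     * Returns true if every allele in the eval genotype may be compared against the comp genotype.
     * Comp hom-ref: eval alleles may be any allele in the comp record (ref + all alts).
     * Comp het/hom-var: eval alleles must appear in the comp genotype itself.
     * Assumption: the reference allele is always allowed.
     */
    static boolean allelesMatch(String ref, List<String> compRecordAlleles,
                                List<String> compGenotype, List<String> evalGenotype) {
        Set<String> allowed = new HashSet<>();
        allowed.add(ref);
        boolean compIsHomRef = true;
        for (String allele : compGenotype) {
            if (!allele.equals(ref)) {
                compIsHomRef = false;
            }
        }
        if (compIsHomRef) {
            allowed.addAll(compRecordAlleles); // lenient rule for comp hom-refs
        } else {
            allowed.addAll(compGenotype);      // restrictive rule otherwise
        }
        return allowed.containsAll(evalGenotype);
    }

    public static void main(String[] args) {
        String ref = "A";
        List<String> compRecord = Arrays.asList("A", "C", "T"); // comp record: A  C,T
        // Sample1: eval 0/0 (A/A) vs comp 0/0 (A/A) -> true
        System.out.println(allelesMatch(ref, compRecord, Arrays.asList("A", "A"), Arrays.asList("A", "A")));
        // Sample2: eval 2/0 (T/A) vs comp 1/0 (C/A) -> false
        System.out.println(allelesMatch(ref, compRecord, Arrays.asList("C", "A"), Arrays.asList("T", "A")));
        // Sample3: eval 2/2 (T/T) vs comp 0/0 (A/A) -> true (ref called hom-var)
        System.out.println(allelesMatch(ref, compRecord, Arrays.asList("A", "A"), Arrays.asList("T", "T")));
        // Sample4: eval 1/1 (G/G) vs comp 0/0 (A/A) -> false
        System.out.println(allelesMatch(ref, compRecord, Arrays.asList("A", "A"), Arrays.asList("G", "G")));
    }
}
```

The same rule also rejects the eval C/C versus comp T/T case from the start of this message: the comp is not hom-ref, so only A and T are allowed.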

Also in this commit: major refactoring and testing for MathUtils. A large number of methods were not used anywhere in the codebase; these methods were removed:
 - dotProduct (several types); logDotProduct is used extensively, but the real-space version is not.
 - vectorSum
 - array shuffle, random subset
 - countOccurances (general forms; the char form is still used in the codebase)
 - getNMaxElements
 - array permutation
 - sorted array permutation
 - compare floats
 - sum() (for integer arrays and lists).

The final keyword was added extensively throughout MathUtils.

The ratio() and percentage() methods were revised to error out on non-positive denominators, except in the case of 0/0, which returns 0.0 for ratio() and 0.0% for percentage(). Random sampling code was updated to use the cleaner permutation-generating implementations in MathUtils, which allowed the array permutation code to be retired.
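The revised denominator handling can be sketched as follows. This is a minimal illustration of the rule as described, not the MathUtils source: the class name `RatioSketch` is hypothetical, the `int` signatures are assumptions, and throwing `IllegalArgumentException` is our choice of how to "error out".

```java
// Hypothetical sketch of the revised ratio()/percentage() semantics; signatures are assumptions.
class RatioSketch {

    /** num/denom; 0/0 is defined as 0.0, and any other non-positive denominator is an error. */
    static double ratio(final int num, final int denom) {
        if (denom > 0) {
            return ((double) num) / denom;
        }
        if (num == 0 && denom == 0) {
            return 0.0;
        }
        throw new IllegalArgumentException("Invalid denominator " + denom + " for numerator " + num);
    }

    /** Same rules as ratio(), scaled to a percentage (0/0 -> 0.0%). */
    static double percentage(final int num, final int denom) {
        return 100.0 * ratio(num, denom);
    }

    public static void main(String[] args) {
        System.out.println(ratio(1, 4));      // 0.25
        System.out.println(percentage(1, 4)); // 25.0
        System.out.println(ratio(0, 0));      // 0.0
        try {
            ratio(3, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

Keeping 0/0 as a special case means an empty tally (for example, zero sites evaluated) reports 0.0 instead of failing, while a genuinely invalid denominator is still caught.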

The PaperGenotyper still used one of these array methods; since it was the only walker that did, the method was migrated into the genotyper itself.

In addition, more extensive tests were added for:
 - logBinomialCoefficient (Newton's identity should always hold)
 - logFactorial
 - log10sumlog10 and its approximation

All unit tests pass
2013-03-28 23:25:28 -04:00
R Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
activeregion Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils 2013-03-28 23:25:28 -04:00
baq Fixing BQSR/BAQ bug: 2013-01-31 11:03:17 -05:00
clipping Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
codecs/hapmap Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
collections Fixing license on Yossi's file 2013-02-05 11:14:43 -05:00
crypt Update MD5s to reflect version number change in the BAM header 2013-02-01 13:51:31 -05:00
fasta Move BaseUtils back to the GATK by request, along with associated utility methods 2013-01-30 13:09:44 -05:00
fragments New GATKSAMRecord concept of a strandless read, update to FS 2013-03-13 11:16:36 -04:00
haplotypeBAMWriter Expanded functionality for writing BAMs from HaplotypeCaller 2013-03-03 12:07:29 -05:00
interval Update MD5s to reflect version number change in the BAM header 2013-02-01 13:51:31 -05:00
io Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
locusiterator Cleanup and unit tests for QualityUtils 2013-02-16 07:31:37 -08:00
nanoScheduler Further tweaking of test timeouts 2013-03-15 14:49:21 -04:00
pileup Last manual license update (hopefully) 2013-01-18 16:13:07 -05:00
progressmeter Resolves Genome Sequence Analysis GSA-750 Don't print an endless series of starting messages from the ProgressMeter 2013-02-04 15:47:30 -05:00
recalibration Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
report Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
runtime Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
sam Refactored the het (polyploid) consensus creation in ReduceReads. 2013-03-25 09:34:54 -04:00
text Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
threading Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
variant AssessNA12878 bugfixes 2013-03-18 15:48:08 -04:00
AutoFormattingTimeUnitTest.java AutoFormattingTimeUnitTest should be in utils 2013-01-30 09:47:47 -05:00
BaseUtilsUnitTest.java More aggressive checking of AWS key quality upon startup in the GATK 2013-01-31 09:08:38 -05:00
BitSetUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
GenomeLocParserBenchmark.java Optimize GenomeLocParser.createGenomeLoc 2013-01-30 09:47:47 -05:00
GenomeLocParserUnitTest.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
GenomeLocSortedSetUnitTest.java Fixed the add functionality of GenomeLocSortedSet. 2013-02-28 23:31:00 -05:00
GenomeLocUnitTest.java Added distance across contigs calculation to GenomeLocs 2013-02-07 16:31:41 -05:00
HaplotypeUnitTest.java Expanded functionality for writing BAMs from HaplotypeCaller 2013-03-03 12:07:29 -05:00
MRUCachingSAMSequencingDictionaryUnitTest.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
MWUnitTest.java Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
MathUtilsUnitTest.java Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils 2013-03-28 23:25:28 -04:00
MedianUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
NGSPlatformUnitTest.java Expand NGSPlatform to meet SAM 1.4 spec, with full unit tests 2013-02-09 11:16:21 -05:00
PathUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
QualityUtilsUnitTest.java Final edge case bug fixes to QualityUtil routines 2013-02-16 07:31:38 -08:00
SequenceDictionaryUtilsUnitTest.java Sequence dictionary validation: detect problematic contig indexing differences 2013-02-25 11:14:22 -05:00
SimpleTimerUnitTest.java Fix tests that were consistently or intermittently failing when run in parallel on the farm 2013-03-06 13:56:54 -05:00
UtilsUnitTest.java Cleanup of FragmentUtils 2013-03-13 07:36:20 -04:00