2010-04-20 07:00:08 +08:00
/ *
2013-01-11 06:04:08 +08:00
* Copyright ( c ) 2012 The Broad Institute
*
* Permission is hereby granted , free of charge , to any person
* obtaining a copy of this software and associated documentation
* files ( the "Software" ) , to deal in the Software without
* restriction , including without limitation the rights to use ,
* copy , modify , merge , publish , distribute , sublicense , and / or sell
* copies of the Software , and to permit persons to whom the
* Software is furnished to do so , subject to the following
* conditions :
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software .
*
* THE SOFTWARE IS PROVIDED "AS IS" , WITHOUT WARRANTY OF ANY KIND ,
* EXPRESS OR IMPLIED , INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY , FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT . IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM , DAMAGES OR OTHER LIABILITY ,
* WHETHER IN AN ACTION OF CONTRACT , TORT OR OTHERWISE , ARISING
* FROM , OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE .
* /
2010-04-20 07:00:08 +08:00
2009-04-24 11:31:02 +08:00
package org.broadinstitute.sting.utils ;
2013-05-30 04:17:43 +08:00
import cern.jet.random.Normal ;
Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences
-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was succesfully built but didn't have variation, or truly failed in construction. Fixing an annoying bug where you'd prefectly assembly the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because it null was also used to indicate assembly failure
--
-- Output format looks like:
20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026073 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026074 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,121
20 10026075 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026076 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026077 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026078 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,217
20 10026079 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,240
20 10026080 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,268
20 10026081 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,267
We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values. Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but doesn't yet account properly for alignment uncertainty.
-- Can we enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning the each read to its most likley haplotype, and then evaluting the resulting pileups over the active region interval. The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures. Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as its no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line. Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
2013-06-20 03:07:35 +08:00
import org.apache.commons.lang.ArrayUtils ;
2011-12-29 13:41:59 +08:00
import org.broadinstitute.sting.BaseTest ;
2010-11-02 05:31:44 +08:00
import org.testng.Assert ;
import org.testng.annotations.BeforeClass ;
Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences
-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was succesfully built but didn't have variation, or truly failed in construction. Fixing an annoying bug where you'd prefectly assembly the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because it null was also used to indicate assembly failure
--
-- Output format looks like:
20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026073 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026074 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,121
20 10026075 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026076 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026077 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026078 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,217
20 10026079 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,240
20 10026080 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,268
20 10026081 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,267
We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values. Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but doesn't yet account properly for alignment uncertainty.
-- Can we enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning the each read to its most likley haplotype, and then evaluting the resulting pileups over the active region interval. The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures. Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as its no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line. Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
2013-06-20 03:07:35 +08:00
import org.testng.annotations.DataProvider ;
2010-11-02 05:31:44 +08:00
import org.testng.annotations.Test ;
2009-04-24 11:31:02 +08:00
2011-12-29 13:41:59 +08:00
import java.util.* ;
2009-04-24 11:31:02 +08:00
/ * *
* Basic unit test for MathUtils
* /
2010-04-08 14:14:15 +08:00
public class MathUtilsUnitTest extends BaseTest {
2009-04-24 11:31:02 +08:00
@BeforeClass
2011-12-29 13:41:59 +08:00
public void init ( ) {
}
2009-04-24 11:31:02 +08:00
2013-06-04 22:08:24 +08:00
/ * *
* Tests that we get unqiue values for the valid ( non - null - producing ) input space for { @link MathUtils # fastGenerateUniqueHashFromThreeIntegers ( int , int , int ) } .
* /
@Test
public void testGenerateUniqueHashFromThreePositiveIntegers ( ) {
logger . warn ( "Executing testGenerateUniqueHashFromThreePositiveIntegers" ) ;
Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences
-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was succesfully built but didn't have variation, or truly failed in construction. Fixing an annoying bug where you'd prefectly assembly the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because it null was also used to indicate assembly failure
--
-- Output format looks like:
20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026073 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026074 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,121
20 10026075 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026076 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026077 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026078 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,217
20 10026079 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,240
20 10026080 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,268
20 10026081 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,267
We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values. Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but doesn't yet account properly for alignment uncertainty.
-- Can we enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning the each read to its most likley haplotype, and then evaluting the resulting pileups over the active region interval. The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures. Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as its no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line. Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
2013-06-20 03:07:35 +08:00
2013-06-04 22:08:24 +08:00
final Set < Long > observedLongs = new HashSet < Long > ( ) ;
for ( short i = 0 ; i < Byte . MAX_VALUE ; i + + ) {
for ( short j = 0 ; j < Byte . MAX_VALUE ; j + + ) {
for ( short k = 0 ; k < Byte . MAX_VALUE ; k + + ) {
final Long aLong = MathUtils . fastGenerateUniqueHashFromThreeIntegers ( i , j , k ) ;
//System.out.println(String.format("%s, %s, %s: %s", i, j, k, aLong));
Assert . assertTrue ( observedLongs . add ( aLong ) ) ;
}
}
}
for ( short i = Byte . MAX_VALUE ; i < = Short . MAX_VALUE & & i > 0 ; i + = 128 ) {
for ( short j = Byte . MAX_VALUE ; j < = Short . MAX_VALUE & & j > 0 ; j + = 128 ) {
for ( short k = Byte . MAX_VALUE ; k < = Short . MAX_VALUE & & k > 0 ; k + = 128 ) {
final Long aLong = MathUtils . fastGenerateUniqueHashFromThreeIntegers ( i , j , k ) ;
// System.out.println(String.format("%s, %s, %s: %s", i, j, k, aLong));
Assert . assertTrue ( observedLongs . add ( aLong ) ) ;
}
}
}
}
2009-04-24 11:31:02 +08:00
/ * *
* Tests that we get the right values from the binomial distribution
* /
@Test
public void testBinomialProbability ( ) {
logger . warn ( "Executing testBinomialProbability" ) ;
2011-06-23 06:55:20 +08:00
Assert . assertEquals ( MathUtils . binomialProbability ( 3 , 2 , 0.5 ) , 0.375 , 0.0001 ) ;
Assert . assertEquals ( MathUtils . binomialProbability ( 100 , 10 , 0.5 ) , 1.365543e-17 , 1e-18 ) ;
Assert . assertEquals ( MathUtils . binomialProbability ( 217 , 73 , 0.02 ) , 4.521904e-67 , 1e-68 ) ;
Assert . assertEquals ( MathUtils . binomialProbability ( 300 , 100 , 0.02 ) , 9.27097e-91 , 1e-92 ) ;
Assert . assertEquals ( MathUtils . binomialProbability ( 300 , 150 , 0.98 ) , 6.462892e-168 , 1e-169 ) ;
Assert . assertEquals ( MathUtils . binomialProbability ( 300 , 120 , 0.98 ) , 3.090054e-221 , 1e-222 ) ;
Assert . assertEquals ( MathUtils . binomialProbability ( 300 , 112 , 0.98 ) , 2.34763e-236 , 1e-237 ) ;
2009-04-24 11:31:02 +08:00
}
2009-04-27 22:59:11 +08:00
2013-04-10 03:37:08 +08:00
/ * *
* Tests that we get the right values from the binomial distribution
* /
@Test
public void testCumulativeBinomialProbability ( ) {
logger . warn ( "Executing testCumulativeBinomialProbability" ) ;
2013-06-04 22:08:24 +08:00
for ( int j = 0 ; j < 2 ; j + + ) { // Test memoizing functionality, as well.
final int numTrials = 10 ;
for ( int i = 0 ; i < numTrials ; i + + )
Assert . assertEquals ( MathUtils . binomialCumulativeProbability ( numTrials , i , i ) , MathUtils . binomialProbability ( numTrials , i ) , 1e-10 , String . format ( "k=%d, n=%d" , i , numTrials ) ) ;
Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences
-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was succesfully built but didn't have variation, or truly failed in construction. Fixing an annoying bug where you'd prefectly assembly the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because it null was also used to indicate assembly failure
--
-- Output format looks like:
20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026073 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026074 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,121
20 10026075 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026076 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026077 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026078 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,217
20 10026079 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,240
20 10026080 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,268
20 10026081 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,267
We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values. Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but doesn't yet account properly for alignment uncertainty.
-- Can we enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning the each read to its most likley haplotype, and then evaluting the resulting pileups over the active region interval. The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures. Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as its no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line. Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
2013-06-20 03:07:35 +08:00
2013-06-04 22:08:24 +08:00
Assert . assertEquals ( MathUtils . binomialCumulativeProbability ( 10 , 0 , 2 ) , 0.05468750 , 1e-7 ) ;
Assert . assertEquals ( MathUtils . binomialCumulativeProbability ( 10 , 0 , 5 ) , 0.62304687 , 1e-7 ) ;
Assert . assertEquals ( MathUtils . binomialCumulativeProbability ( 10 , 0 , 10 ) , 1.0 , 1e-7 ) ;
}
2013-04-10 03:37:08 +08:00
}
2009-04-27 22:59:11 +08:00
/ * *
* Tests that we get the right values from the multinomial distribution
* /
@Test
public void testMultinomialProbability ( ) {
logger . warn ( "Executing testMultinomialProbability" ) ;
2011-12-29 13:41:59 +08:00
int [ ] counts0 = { 2 , 0 , 1 } ;
double [ ] probs0 = { 0.33 , 0.33 , 0.34 } ;
2011-06-23 06:55:20 +08:00
Assert . assertEquals ( MathUtils . multinomialProbability ( counts0 , probs0 ) , 0.111078 , 1e-6 ) ;
2009-05-01 14:27:05 +08:00
2011-12-29 13:41:59 +08:00
int [ ] counts1 = { 10 , 20 , 30 } ;
double [ ] probs1 = { 0.25 , 0.25 , 0.50 } ;
2011-06-23 06:55:20 +08:00
Assert . assertEquals ( MathUtils . multinomialProbability ( counts1 , probs1 ) , 0.002870301 , 1e-9 ) ;
2009-05-01 14:27:05 +08:00
2011-12-29 13:41:59 +08:00
int [ ] counts2 = { 38 , 82 , 50 , 36 } ;
double [ ] probs2 = { 0.25 , 0.25 , 0.25 , 0.25 } ;
2011-06-23 06:55:20 +08:00
Assert . assertEquals ( MathUtils . multinomialProbability ( counts2 , probs2 ) , 1.88221e-09 , 1e-10 ) ;
2009-05-01 14:27:05 +08:00
2011-12-29 13:41:59 +08:00
int [ ] counts3 = { 1 , 600 , 1 } ;
double [ ] probs3 = { 0.33 , 0.33 , 0.34 } ;
2011-06-23 06:55:20 +08:00
Assert . assertEquals ( MathUtils . multinomialProbability ( counts3 , probs3 ) , 5.20988e-285 , 1e-286 ) ;
2009-04-27 22:59:11 +08:00
}
2010-04-20 07:00:08 +08:00
/ * *
* Tests that the random index selection is working correctly
* /
@Test
public void testRandomIndicesWithReplacement ( ) {
logger . warn ( "Executing testRandomIndicesWithReplacement" ) ;
// Check that the size of the list returned is correct
Assert . assertTrue ( MathUtils . sampleIndicesWithReplacement ( 5 , 0 ) . size ( ) = = 0 ) ;
Assert . assertTrue ( MathUtils . sampleIndicesWithReplacement ( 5 , 1 ) . size ( ) = = 1 ) ;
Assert . assertTrue ( MathUtils . sampleIndicesWithReplacement ( 5 , 5 ) . size ( ) = = 5 ) ;
Assert . assertTrue ( MathUtils . sampleIndicesWithReplacement ( 5 , 1000 ) . size ( ) = = 1000 ) ;
// Check that the list contains only the k element range that as asked for - no more, no less
List < Integer > Five = new ArrayList < Integer > ( ) ;
Collections . addAll ( Five , 0 , 1 , 2 , 3 , 4 ) ;
List < Integer > BigFive = MathUtils . sampleIndicesWithReplacement ( 5 , 10000 ) ;
Assert . assertTrue ( BigFive . containsAll ( Five ) ) ;
Assert . assertTrue ( Five . containsAll ( BigFive ) ) ;
}
/ * *
* Tests that we get the right values from the multinomial distribution
* /
@Test
public void testSliceListByIndices ( ) {
logger . warn ( "Executing testSliceListByIndices" ) ;
// Check that the list contains only the k element range that as asked for - no more, no less but now
// use the index list to pull elements from another list using sliceListByIndices
List < Integer > Five = new ArrayList < Integer > ( ) ;
Collections . addAll ( Five , 0 , 1 , 2 , 3 , 4 ) ;
List < Character > FiveAlpha = new ArrayList < Character > ( ) ;
Collections . addAll ( FiveAlpha , 'a' , 'b' , 'c' , 'd' , 'e' ) ;
List < Integer > BigFive = MathUtils . sampleIndicesWithReplacement ( 5 , 10000 ) ;
List < Character > BigFiveAlpha = MathUtils . sliceListByIndices ( BigFive , FiveAlpha ) ;
Assert . assertTrue ( BigFiveAlpha . containsAll ( FiveAlpha ) ) ;
Assert . assertTrue ( FiveAlpha . containsAll ( BigFiveAlpha ) ) ;
}
2010-08-05 05:39:02 +08:00
2011-12-29 13:41:59 +08:00
/ * *
* Tests that we correctly compute mean and standard deviation from a stream of numbers
* /
2010-08-05 05:39:02 +08:00
@Test
public void testRunningAverage ( ) {
logger . warn ( "Executing testRunningAverage" ) ;
2011-12-29 13:41:59 +08:00
int [ ] numbers = { 1 , 2 , 4 , 5 , 3 , 128 , 25678 , - 24 } ;
2010-08-05 05:39:02 +08:00
MathUtils . RunningAverage r = new MathUtils . RunningAverage ( ) ;
2012-03-06 06:37:34 +08:00
for ( int i = 0 ; i < numbers . length ; i + + )
r . add ( ( double ) numbers [ i ] ) ;
2010-08-05 05:39:02 +08:00
2011-12-29 13:41:59 +08:00
Assert . assertEquals ( ( long ) numbers . length , r . observationCount ( ) ) ;
Assert . assertTrue ( r . mean ( ) - 3224.625 < 2e-10 ) ;
Assert . assertTrue ( r . stddev ( ) - 9072.6515881128 < 2e-10 ) ;
2010-08-05 05:39:02 +08:00
}
2011-06-23 06:55:09 +08:00
@Test
public void testLog10Gamma ( ) {
logger . warn ( "Executing testLog10Gamma" ) ;
2011-06-23 06:55:20 +08:00
Assert . assertEquals ( MathUtils . log10Gamma ( 4.0 ) , 0.7781513 , 1e-6 ) ;
Assert . assertEquals ( MathUtils . log10Gamma ( 10 ) , 5.559763 , 1e-6 ) ;
Assert . assertEquals ( MathUtils . log10Gamma ( 10654 ) , 38280.53 , 1e-2 ) ;
2011-06-23 06:55:09 +08:00
}
@Test
public void testLog10BinomialCoefficient ( ) {
logger . warn ( "Executing testLog10BinomialCoefficient" ) ;
Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils
Currently, the multi-allelic test is covering the following case:
Eval A T,C
Comp A C
reciprocate this so that the reverse can be covered.
Eval A C
Comp A T,C
And furthermore, modify ConcordanceMetrics to more properly handle the situation where multiple alternate alleles are available in the comp. It was possible for an eval C/C sample to match a comp T/T sample, so long as the C allele were also present in at least one other comp sample.
This comes from the fact that "truth" reference alleles can be paired with *any* allele also present in the truth VCF, while truth het/hom var sites are restricted to having to match only the alleles present in the genotype. The reason that truth ref alleles are special case is as follows, imagine:
Eval: A G,T 0/0 2/0 2/2 1/1
Comp: A C,T 0/0 1/0 0/0 0/0
Even though the alt allele of the comp is a C, the assessment of genotypes should be as follows:
Sample1: ref called ref
Sample2: alleles don't match (the alt allele of the comp was not assessed in eval)
Sample3: ref called hom-var
Sample4: alleles don't match (the alt allele of the eval was not assessed in comp)
Before this change, Sample2 was evaluated as "het called het" (as the T allele in eval happens to also be in the comp record, just not in the comp sample). Thus: apply current
logic to comp hom-refs, and the more restrictive logic ("you have to match an allele in the comp genotype") when the comp is not reference.
Also in this commit,major refactoring and testing for MathUtils. A large number of methods were not used at all in the codebase, these methods were removed:
- dotProduct(several types). logDotProduct is used extensively, but not the real-space version.
- vectorSum
- array shuffle, random subset
- countOccurances (general forms, the char form is used in the codebase)
- getNMaxElements
- array permutation
- sorted array permutation
- compare floats
- sum() (for integer arrays and lists).
Final keyword was extensively added to MathUtils.
The ratio() and percentage() methods were revised to error out with non-positive denominators, except in the case of 0/0 (which returns 0.0 (ratio), or 0.0% (percentage)). Random sampling code was updated to make use of the cleaner implementations of generating permutations in MathUtils (allowing the array permutation code to be retired).
The PaperGenotyper still made use of one of these array methods, since it was the only walker it was migrated into the genotyper itself.
In addition, more extensive tests were added for
- logBinomialCoefficient (Newton's identity should always hold)
- logFactorial
- log10sumlog10 and its approximation
All unit tests pass
2013-03-29 11:25:28 +08:00
// note that we can test the binomial coefficient calculation indirectly via Newton's identity
// (1+z)^m = sum (m choose k)z^k
double [ ] z_vals = new double [ ] { 0.999 , 0.9 , 0.8 , 0.5 , 0.2 , 0.01 , 0.0001 } ;
int [ ] exponent = new int [ ] { 5 , 15 , 25 , 50 , 100 } ;
for ( double z : z_vals ) {
double logz = Math . log10 ( z ) ;
for ( int exp : exponent ) {
double expected_log = exp * Math . log10 ( 1 + z ) ;
double [ ] newtonArray_log = new double [ 1 + exp ] ;
for ( int k = 0 ; k < = exp ; k + + ) {
newtonArray_log [ k ] = MathUtils . log10BinomialCoefficient ( exp , k ) + k * logz ;
}
Assert . assertEquals ( MathUtils . log10sumLog10 ( newtonArray_log ) , expected_log , 1e-6 ) ;
}
}
2011-06-23 06:55:09 +08:00
2011-06-23 06:55:20 +08:00
Assert . assertEquals ( MathUtils . log10BinomialCoefficient ( 4 , 2 ) , 0.7781513 , 1e-6 ) ;
Assert . assertEquals ( MathUtils . log10BinomialCoefficient ( 10 , 3 ) , 2.079181 , 1e-6 ) ;
Assert . assertEquals ( MathUtils . log10BinomialCoefficient ( 103928 , 119 ) , 400.2156 , 1e-4 ) ;
2011-06-23 06:55:09 +08:00
}
2011-06-23 06:55:20 +08:00
@Test
public void testFactorial ( ) {
logger . warn ( "Executing testFactorial" ) ;
Assert . assertEquals ( ( int ) MathUtils . factorial ( 4 ) , 24 ) ;
Assert . assertEquals ( ( int ) MathUtils . factorial ( 10 ) , 3628800 ) ;
Assert . assertEquals ( ( int ) MathUtils . factorial ( 12 ) , 479001600 ) ;
}
@Test
public void testLog10Factorial ( ) {
logger . warn ( "Executing testLog10Factorial" ) ;
Assert . assertEquals ( MathUtils . log10Factorial ( 4 ) , 1.380211 , 1e-6 ) ;
Assert . assertEquals ( MathUtils . log10Factorial ( 10 ) , 6.559763 , 1e-6 ) ;
Assert . assertEquals ( MathUtils . log10Factorial ( 12 ) , 8.680337 , 1e-6 ) ;
Assert . assertEquals ( MathUtils . log10Factorial ( 200 ) , 374.8969 , 1e-3 ) ;
Assert . assertEquals ( MathUtils . log10Factorial ( 12342 ) , 45138.26 , 1e-1 ) ;
Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils
Currently, the multi-allelic test is covering the following case:
Eval A T,C
Comp A C
reciprocate this so that the reverse can be covered.
Eval A C
Comp A T,C
And furthermore, modify ConcordanceMetrics to more properly handle the situation where multiple alternate alleles are available in the comp. It was possible for an eval C/C sample to match a comp T/T sample, so long as the C allele were also present in at least one other comp sample.
This comes from the fact that "truth" reference alleles can be paired with *any* allele also present in the truth VCF, while truth het/hom var sites are restricted to having to match only the alleles present in the genotype. The reason that truth ref alleles are special case is as follows, imagine:
Eval: A G,T 0/0 2/0 2/2 1/1
Comp: A C,T 0/0 1/0 0/0 0/0
Even though the alt allele of the comp is a C, the assessment of genotypes should be as follows:
Sample1: ref called ref
Sample2: alleles don't match (the alt allele of the comp was not assessed in eval)
Sample3: ref called hom-var
Sample4: alleles don't match (the alt allele of the eval was not assessed in comp)
Before this change, Sample2 was evaluated as "het called het" (as the T allele in eval happens to also be in the comp record, just not in the comp sample). Thus: apply current
logic to comp hom-refs, and the more restrictive logic ("you have to match an allele in the comp genotype") when the comp is not reference.
Also in this commit,major refactoring and testing for MathUtils. A large number of methods were not used at all in the codebase, these methods were removed:
- dotProduct(several types). logDotProduct is used extensively, but not the real-space version.
- vectorSum
- array shuffle, random subset
- countOccurances (general forms, the char form is used in the codebase)
- getNMaxElements
- array permutation
- sorted array permutation
- compare floats
- sum() (for integer arrays and lists).
Final keyword was extensively added to MathUtils.
The ratio() and percentage() methods were revised to error out with non-positive denominators, except in the case of 0/0 (which returns 0.0 (ratio), or 0.0% (percentage)). Random sampling code was updated to make use of the cleaner implementations of generating permutations in MathUtils (allowing the array permutation code to be retired).
The PaperGenotyper still made use of one of these array methods, since it was the only walker it was migrated into the genotyper itself.
In addition, more extensive tests were added for
- logBinomialCoefficient (Newton's identity should always hold)
- logFactorial
- log10sumlog10 and its approximation
All unit tests pass
2013-03-29 11:25:28 +08:00
double log10factorial_small = 0 ;
double log10factorial_middle = 374.8969 ;
double log10factorial_large = 45138.26 ;
int small_start = 1 ;
int med_start = 200 ;
int large_start = 12342 ;
for ( int i = 1 ; i < 1000 ; i + + ) {
log10factorial_small + = Math . log10 ( i + small_start ) ;
log10factorial_middle + = Math . log10 ( i + med_start ) ;
log10factorial_large + = Math . log10 ( i + large_start ) ;
Assert . assertEquals ( MathUtils . log10Factorial ( small_start + i ) , log10factorial_small , 1e-6 ) ;
Assert . assertEquals ( MathUtils . log10Factorial ( med_start + i ) , log10factorial_middle , 1e-3 ) ;
Assert . assertEquals ( MathUtils . log10Factorial ( large_start + i ) , log10factorial_large , 1e-1 ) ;
2011-12-29 13:41:59 +08:00
}
}
2012-03-06 01:28:32 +08:00
/ * *
* Private functions used by testArrayShuffle ( )
* /
private boolean hasUniqueElements ( Object [ ] x ) {
for ( int i = 0 ; i < x . length ; i + + )
for ( int j = i + 1 ; j < x . length ; j + + )
if ( x [ i ] . equals ( x [ j ] ) | | x [ i ] = = x [ j ] )
return false ;
return true ;
}
private boolean hasAllElements ( final Object [ ] expected , final Object [ ] actual ) {
HashSet < Object > set = new HashSet < Object > ( ) ;
set . addAll ( Arrays . asList ( expected ) ) ;
set . removeAll ( Arrays . asList ( actual ) ) ;
return set . isEmpty ( ) ;
}
@Test
public void testApproximateLog10SumLog10 ( ) {
Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences
-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was succesfully built but didn't have variation, or truly failed in construction. Fixing an annoying bug where you'd prefectly assembly the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because it null was also used to indicate assembly failure
--
-- Output format looks like:
20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026073 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026074 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,121
20 10026075 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026076 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026077 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026078 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,217
20 10026079 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,240
20 10026080 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,268
20 10026081 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,267
We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values. Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but doesn't yet account properly for alignment uncertainty.
-- Can we enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning the each read to its most likley haplotype, and then evaluting the resulting pileups over the active region interval. The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures. Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as its no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line. Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
2013-06-20 03:07:35 +08:00
2012-10-16 01:24:32 +08:00
final double requiredPrecision = 1E-4 ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 0.0 } ) , 0.0 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 5.15 } ) , - 5.15 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 130.0 } ) , 130.0 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 0.145 } ) , - 0.145 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( 0.0 , 0.0 ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 1.0 , 0.0 ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( 0.0 , - 1.0 ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , - 1.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 2.2 , - 3.5 ) , Math . log10 ( Math . pow ( 10.0 , - 2.2 ) + Math . pow ( 10.0 , - 3.5 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 1.0 , - 7.1 ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , - 7.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( 5.0 , 6.2 ) , Math . log10 ( Math . pow ( 10.0 , 5.0 ) + Math . pow ( 10.0 , 6.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( 38.1 , 16.2 ) , Math . log10 ( Math . pow ( 10.0 , 38.1 ) + Math . pow ( 10.0 , 16.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 38.1 , 6.2 ) , Math . log10 ( Math . pow ( 10.0 , - 38.1 ) + Math . pow ( 10.0 , 6.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 19.1 , - 37.1 ) , Math . log10 ( Math . pow ( 10.0 , - 19.1 ) + Math . pow ( 10.0 , - 37.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 29.1 , - 27.6 ) , Math . log10 ( Math . pow ( 10.0 , - 29.1 ) + Math . pow ( 10.0 , - 27.6 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 0.12345 , - 0.23456 ) , Math . log10 ( Math . pow ( 10.0 , - 0.12345 ) + Math . pow ( 10.0 , - 0.23456 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 15.7654 , - 17.0101 ) , Math . log10 ( Math . pow ( 10.0 , - 15.7654 ) + Math . pow ( 10.0 , - 17.0101 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 0.12345 , Double . NEGATIVE_INFINITY ) , - 0.12345 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 15.7654 , Double . NEGATIVE_INFINITY ) , - 15.7654 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 0.0 , 0.0 } ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 1.0 , 0.0 } ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 0.0 , - 1.0 } ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , - 1.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 2.2 , - 3.5 } ) , Math . log10 ( Math . pow ( 10.0 , - 2.2 ) + Math . pow ( 10.0 , - 3.5 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 1.0 , - 7.1 } ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , - 7.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 5.0 , 6.2 } ) , Math . log10 ( Math . pow ( 10.0 , 5.0 ) + Math . pow ( 10.0 , 6.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 38.1 , 16.2 } ) , Math . log10 ( Math . pow ( 10.0 , 38.1 ) + Math . pow ( 10.0 , 16.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 38.1 , 6.2 } ) , Math . log10 ( Math . pow ( 10.0 , - 38.1 ) + Math . pow ( 10.0 , 6.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 19.1 , - 37.1 } ) , Math . log10 ( Math . pow ( 10.0 , - 19.1 ) + Math . pow ( 10.0 , - 37.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 29.1 , - 27.6 } ) , Math . log10 ( Math . pow ( 10.0 , - 29.1 ) + Math . pow ( 10.0 , - 27.6 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 0.12345 , - 0.23456 } ) , Math . log10 ( Math . pow ( 10.0 , - 0.12345 ) + Math . pow ( 10.0 , - 0.23456 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 15.7654 , - 17.0101 } ) , Math . log10 ( Math . pow ( 10.0 , - 15.7654 ) + Math . pow ( 10.0 , - 17.0101 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 0.0 , 0.0 , 0.0 } ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 1.0 , 0.0 , 0.0 } ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 0.0 , - 1.0 , - 2.5 } ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , - 2.5 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 2.2 , - 3.5 , - 1.1 } ) , Math . log10 ( Math . pow ( 10.0 , - 2.2 ) + Math . pow ( 10.0 , - 3.5 ) + Math . pow ( 10.0 , - 1.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 1.0 , - 7.1 , 0.5 } ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , - 7.1 ) + Math . pow ( 10.0 , 0.5 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 5.0 , 6.2 , 1.3 } ) , Math . log10 ( Math . pow ( 10.0 , 5.0 ) + Math . pow ( 10.0 , 6.2 ) + Math . pow ( 10.0 , 1.3 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { 38.1 , 16.2 , 18.1 } ) , Math . log10 ( Math . pow ( 10.0 , 38.1 ) + Math . pow ( 10.0 , 16.2 ) + Math . pow ( 10.0 , 18.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 38.1 , 6.2 , 26.6 } ) , Math . log10 ( Math . pow ( 10.0 , - 38.1 ) + Math . pow ( 10.0 , 6.2 ) + Math . pow ( 10.0 , 26.6 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 19.1 , - 37.1 , - 45.1 } ) , Math . log10 ( Math . pow ( 10.0 , - 19.1 ) + Math . pow ( 10.0 , - 37.1 ) + Math . pow ( 10.0 , - 45.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 29.1 , - 27.6 , - 26.2 } ) , Math . log10 ( Math . pow ( 10.0 , - 29.1 ) + Math . pow ( 10.0 , - 27.6 ) + Math . pow ( 10.0 , - 26.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 0.12345 , - 0.23456 , - 0.34567 } ) , Math . log10 ( Math . pow ( 10.0 , - 0.12345 ) + Math . pow ( 10.0 , - 0.23456 ) + Math . pow ( 10.0 , - 0.34567 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( new double [ ] { - 15.7654 , - 17.0101 , - 17.9341 } ) , Math . log10 ( Math . pow ( 10.0 , - 15.7654 ) + Math . pow ( 10.0 , - 17.0101 ) + Math . pow ( 10.0 , - 17.9341 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( 0.0 , 0.0 , 0.0 ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 1.0 , 0.0 , 0.0 ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( 0.0 , - 1.0 , - 2.5 ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , - 2.5 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 2.2 , - 3.5 , - 1.1 ) , Math . log10 ( Math . pow ( 10.0 , - 2.2 ) + Math . pow ( 10.0 , - 3.5 ) + Math . pow ( 10.0 , - 1.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 1.0 , - 7.1 , 0.5 ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , - 7.1 ) + Math . pow ( 10.0 , 0.5 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( 5.0 , 6.2 , 1.3 ) , Math . log10 ( Math . pow ( 10.0 , 5.0 ) + Math . pow ( 10.0 , 6.2 ) + Math . pow ( 10.0 , 1.3 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( 38.1 , 16.2 , 18.1 ) , Math . log10 ( Math . pow ( 10.0 , 38.1 ) + Math . pow ( 10.0 , 16.2 ) + Math . pow ( 10.0 , 18.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 38.1 , 6.2 , 26.6 ) , Math . log10 ( Math . pow ( 10.0 , - 38.1 ) + Math . pow ( 10.0 , 6.2 ) + Math . pow ( 10.0 , 26.6 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 19.1 , - 37.1 , - 45.1 ) , Math . log10 ( Math . pow ( 10.0 , - 19.1 ) + Math . pow ( 10.0 , - 37.1 ) + Math . pow ( 10.0 , - 45.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 29.1 , - 27.6 , - 26.2 ) , Math . log10 ( Math . pow ( 10.0 , - 29.1 ) + Math . pow ( 10.0 , - 27.6 ) + Math . pow ( 10.0 , - 26.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 0.12345 , - 0.23456 , - 0.34567 ) , Math . log10 ( Math . pow ( 10.0 , - 0.12345 ) + Math . pow ( 10.0 , - 0.23456 ) + Math . pow ( 10.0 , - 0.34567 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( - 15.7654 , - 17.0101 , - 17.9341 ) , Math . log10 ( Math . pow ( 10.0 , - 15.7654 ) + Math . pow ( 10.0 , - 17.0101 ) + Math . pow ( 10.0 , - 17.9341 ) ) , requiredPrecision ) ;
2011-12-29 13:41:59 +08:00
Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils
Currently, the multi-allelic test is covering the following case:
Eval A T,C
Comp A C
reciprocate this so that the reverse can be covered.
Eval A C
Comp A T,C
And furthermore, modify ConcordanceMetrics to more properly handle the situation where multiple alternate alleles are available in the comp. It was possible for an eval C/C sample to match a comp T/T sample, so long as the C allele were also present in at least one other comp sample.
This comes from the fact that "truth" reference alleles can be paired with *any* allele also present in the truth VCF, while truth het/hom var sites are restricted to having to match only the alleles present in the genotype. The reason that truth ref alleles are special case is as follows, imagine:
Eval: A G,T 0/0 2/0 2/2 1/1
Comp: A C,T 0/0 1/0 0/0 0/0
Even though the alt allele of the comp is a C, the assessment of genotypes should be as follows:
Sample1: ref called ref
Sample2: alleles don't match (the alt allele of the comp was not assessed in eval)
Sample3: ref called hom-var
Sample4: alleles don't match (the alt allele of the eval was not assessed in comp)
Before this change, Sample2 was evaluated as "het called het" (as the T allele in eval happens to also be in the comp record, just not in the comp sample). Thus: apply current
logic to comp hom-refs, and the more restrictive logic ("you have to match an allele in the comp genotype") when the comp is not reference.
Also in this commit,major refactoring and testing for MathUtils. A large number of methods were not used at all in the codebase, these methods were removed:
- dotProduct(several types). logDotProduct is used extensively, but not the real-space version.
- vectorSum
- array shuffle, random subset
- countOccurances (general forms, the char form is used in the codebase)
- getNMaxElements
- array permutation
- sorted array permutation
- compare floats
- sum() (for integer arrays and lists).
Final keyword was extensively added to MathUtils.
The ratio() and percentage() methods were revised to error out with non-positive denominators, except in the case of 0/0 (which returns 0.0 (ratio), or 0.0% (percentage)). Random sampling code was updated to make use of the cleaner implementations of generating permutations in MathUtils (allowing the array permutation code to be retired).
The PaperGenotyper still made use of one of these array methods, since it was the only walker it was migrated into the genotyper itself.
In addition, more extensive tests were added for
- logBinomialCoefficient (Newton's identity should always hold)
- logFactorial
- log10sumlog10 and its approximation
All unit tests pass
2013-03-29 11:25:28 +08:00
// magnitude of the sum doesn't matter, so we can combinatorially test this via partitions of unity
double [ ] mult_partitionFactor = new double [ ] { 0.999 , 0.98 , 0.95 , 0.90 , 0.8 , 0.5 , 0.3 , 0.1 , 0.05 , 0.001 } ;
int [ ] n_partitions = new int [ ] { 2 , 4 , 8 , 16 , 32 , 64 , 128 , 256 , 512 , 1028 } ;
for ( double alpha : mult_partitionFactor ) {
double log_alpha = Math . log10 ( alpha ) ;
double log_oneMinusAlpha = Math . log10 ( 1 - alpha ) ;
for ( int npart : n_partitions ) {
double [ ] multiplicative = new double [ npart ] ;
double [ ] equal = new double [ npart ] ;
double remaining_log = 0.0 ; // realspace = 1
for ( int i = 0 ; i < npart - 1 ; i + + ) {
equal [ i ] = - Math . log10 ( npart ) ;
double piece = remaining_log + log_alpha ; // take a*remaining, leaving remaining-a*remaining = (1-a)*remaining
multiplicative [ i ] = piece ;
remaining_log = remaining_log + log_oneMinusAlpha ;
}
equal [ npart - 1 ] = - Math . log10 ( npart ) ;
multiplicative [ npart - 1 ] = remaining_log ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( equal ) , 0.0 , requiredPrecision , String . format ( "Did not sum to one: k=%d equal partitions." , npart ) ) ;
Assert . assertEquals ( MathUtils . approximateLog10SumLog10 ( multiplicative ) , 0.0 , requiredPrecision , String . format ( "Did not sum to one: k=%d multiplicative partitions with alpha=%f" , npart , alpha ) ) ;
}
}
2011-12-29 13:41:59 +08:00
}
2012-04-10 02:28:25 +08:00
@Test
public void testLog10sumLog10 ( ) {
2012-10-16 01:24:32 +08:00
final double requiredPrecision = 1E-14 ;
2012-04-10 02:28:25 +08:00
final double log3 = 0.477121254719662 ;
2012-10-16 01:24:32 +08:00
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 , 0.0 , 0.0 } ) , log3 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 , 0.0 , 0.0 } , 0 ) , log3 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 , 0.0 , 0.0 } , 0 , 3 ) , log3 , requiredPrecision ) ;
2012-04-10 02:28:25 +08:00
final double log2 = 0.301029995663981 ;
2012-10-16 01:24:32 +08:00
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 , 0.0 , 0.0 } , 0 , 2 ) , log2 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 , 0.0 , 0.0 } , 0 , 1 ) , 0.0 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 } ) , 0.0 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 5.15 } ) , - 5.15 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 130.0 } ) , 130.0 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 0.145 } ) , - 0.145 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 , 0.0 } ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 1.0 , 0.0 } ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 , - 1.0 } ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , - 1.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 2.2 , - 3.5 } ) , Math . log10 ( Math . pow ( 10.0 , - 2.2 ) + Math . pow ( 10.0 , - 3.5 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 1.0 , - 7.1 } ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , - 7.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 5.0 , 6.2 } ) , Math . log10 ( Math . pow ( 10.0 , 5.0 ) + Math . pow ( 10.0 , 6.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 38.1 , 16.2 } ) , Math . log10 ( Math . pow ( 10.0 , 38.1 ) + Math . pow ( 10.0 , 16.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 38.1 , 6.2 } ) , Math . log10 ( Math . pow ( 10.0 , - 38.1 ) + Math . pow ( 10.0 , 6.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 19.1 , - 37.1 } ) , Math . log10 ( Math . pow ( 10.0 , - 19.1 ) + Math . pow ( 10.0 , - 37.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 29.1 , - 27.6 } ) , Math . log10 ( Math . pow ( 10.0 , - 29.1 ) + Math . pow ( 10.0 , - 27.6 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 0.12345 , - 0.23456 } ) , Math . log10 ( Math . pow ( 10.0 , - 0.12345 ) + Math . pow ( 10.0 , - 0.23456 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 15.7654 , - 17.0101 } ) , Math . log10 ( Math . pow ( 10.0 , - 15.7654 ) + Math . pow ( 10.0 , - 17.0101 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 , 0.0 , 0.0 } ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 1.0 , 0.0 , 0.0 } ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , 0.0 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 0.0 , - 1.0 , - 2.5 } ) , Math . log10 ( Math . pow ( 10.0 , 0.0 ) + Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , - 2.5 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 2.2 , - 3.5 , - 1.1 } ) , Math . log10 ( Math . pow ( 10.0 , - 2.2 ) + Math . pow ( 10.0 , - 3.5 ) + Math . pow ( 10.0 , - 1.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 1.0 , - 7.1 , 0.5 } ) , Math . log10 ( Math . pow ( 10.0 , - 1.0 ) + Math . pow ( 10.0 , - 7.1 ) + Math . pow ( 10.0 , 0.5 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 5.0 , 6.2 , 1.3 } ) , Math . log10 ( Math . pow ( 10.0 , 5.0 ) + Math . pow ( 10.0 , 6.2 ) + Math . pow ( 10.0 , 1.3 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { 38.1 , 16.2 , 18.1 } ) , Math . log10 ( Math . pow ( 10.0 , 38.1 ) + Math . pow ( 10.0 , 16.2 ) + Math . pow ( 10.0 , 18.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 38.1 , 6.2 , 26.6 } ) , Math . log10 ( Math . pow ( 10.0 , - 38.1 ) + Math . pow ( 10.0 , 6.2 ) + Math . pow ( 10.0 , 26.6 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 19.1 , - 37.1 , - 45.1 } ) , Math . log10 ( Math . pow ( 10.0 , - 19.1 ) + Math . pow ( 10.0 , - 37.1 ) + Math . pow ( 10.0 , - 45.1 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 29.1 , - 27.6 , - 26.2 } ) , Math . log10 ( Math . pow ( 10.0 , - 29.1 ) + Math . pow ( 10.0 , - 27.6 ) + Math . pow ( 10.0 , - 26.2 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 0.12345 , - 0.23456 , - 0.34567 } ) , Math . log10 ( Math . pow ( 10.0 , - 0.12345 ) + Math . pow ( 10.0 , - 0.23456 ) + Math . pow ( 10.0 , - 0.34567 ) ) , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( new double [ ] { - 15.7654 , - 17.0101 , - 17.9341 } ) , Math . log10 ( Math . pow ( 10.0 , - 15.7654 ) + Math . pow ( 10.0 , - 17.0101 ) + Math . pow ( 10.0 , - 17.9341 ) ) , requiredPrecision ) ;
2012-04-10 02:28:25 +08:00
Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils
Currently, the multi-allelic test is covering the following case:
Eval A T,C
Comp A C
reciprocate this so that the reverse can be covered.
Eval A C
Comp A T,C
And furthermore, modify ConcordanceMetrics to more properly handle the situation where multiple alternate alleles are available in the comp. It was possible for an eval C/C sample to match a comp T/T sample, so long as the C allele were also present in at least one other comp sample.
This comes from the fact that "truth" reference alleles can be paired with *any* allele also present in the truth VCF, while truth het/hom var sites are restricted to having to match only the alleles present in the genotype. The reason that truth ref alleles are special case is as follows, imagine:
Eval: A G,T 0/0 2/0 2/2 1/1
Comp: A C,T 0/0 1/0 0/0 0/0
Even though the alt allele of the comp is a C, the assessment of genotypes should be as follows:
Sample1: ref called ref
Sample2: alleles don't match (the alt allele of the comp was not assessed in eval)
Sample3: ref called hom-var
Sample4: alleles don't match (the alt allele of the eval was not assessed in comp)
Before this change, Sample2 was evaluated as "het called het" (as the T allele in eval happens to also be in the comp record, just not in the comp sample). Thus: apply current
logic to comp hom-refs, and the more restrictive logic ("you have to match an allele in the comp genotype") when the comp is not reference.
Also in this commit,major refactoring and testing for MathUtils. A large number of methods were not used at all in the codebase, these methods were removed:
- dotProduct(several types). logDotProduct is used extensively, but not the real-space version.
- vectorSum
- array shuffle, random subset
- countOccurances (general forms, the char form is used in the codebase)
- getNMaxElements
- array permutation
- sorted array permutation
- compare floats
- sum() (for integer arrays and lists).
Final keyword was extensively added to MathUtils.
The ratio() and percentage() methods were revised to error out with non-positive denominators, except in the case of 0/0 (which returns 0.0 (ratio), or 0.0% (percentage)). Random sampling code was updated to make use of the cleaner implementations of generating permutations in MathUtils (allowing the array permutation code to be retired).
The PaperGenotyper still made use of one of these array methods, since it was the only walker it was migrated into the genotyper itself.
In addition, more extensive tests were added for
- logBinomialCoefficient (Newton's identity should always hold)
- logFactorial
- log10sumlog10 and its approximation
All unit tests pass
2013-03-29 11:25:28 +08:00
// magnitude of the sum doesn't matter, so we can combinatorially test this via partitions of unity
double [ ] mult_partitionFactor = new double [ ] { 0.999 , 0.98 , 0.95 , 0.90 , 0.8 , 0.5 , 0.3 , 0.1 , 0.05 , 0.001 } ;
int [ ] n_partitions = new int [ ] { 2 , 4 , 8 , 16 , 32 , 64 , 128 , 256 , 512 , 1028 } ;
for ( double alpha : mult_partitionFactor ) {
double log_alpha = Math . log10 ( alpha ) ;
double log_oneMinusAlpha = Math . log10 ( 1 - alpha ) ;
for ( int npart : n_partitions ) {
double [ ] multiplicative = new double [ npart ] ;
double [ ] equal = new double [ npart ] ;
double remaining_log = 0.0 ; // realspace = 1
for ( int i = 0 ; i < npart - 1 ; i + + ) {
equal [ i ] = - Math . log10 ( npart ) ;
double piece = remaining_log + log_alpha ; // take a*remaining, leaving remaining-a*remaining = (1-a)*remaining
multiplicative [ i ] = piece ;
remaining_log = remaining_log + log_oneMinusAlpha ;
}
equal [ npart - 1 ] = - Math . log10 ( npart ) ;
multiplicative [ npart - 1 ] = remaining_log ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( equal ) , 0.0 , requiredPrecision ) ;
Assert . assertEquals ( MathUtils . log10sumLog10 ( multiplicative ) , 0.0 , requiredPrecision , String . format ( "Did not sum to one: nPartitions=%d, alpha=%f" , npart , alpha ) ) ;
}
}
2012-03-25 09:49:43 +08:00
}
@Test
public void testLogDotProduct ( ) {
2012-03-28 20:18:36 +08:00
Assert . assertEquals ( MathUtils . logDotProduct ( new double [ ] { - 5.0 , - 3.0 , 2.0 } , new double [ ] { 6.0 , 7.0 , 8.0 } ) , 10.0 , 1e-3 ) ;
Assert . assertEquals ( MathUtils . logDotProduct ( new double [ ] { - 5.0 } , new double [ ] { 6.0 } ) , 1.0 , 1e-3 ) ;
2012-03-25 09:49:43 +08:00
}
2013-05-30 04:17:43 +08:00
@Test
public void testNormalDistribution ( ) {
final double requiredPrecision = 1E-10 ;
final Normal n = new Normal ( 0.0 , 1.0 , null ) ;
for ( final double mu : new double [ ] { - 5.0 , - 3.2 , - 1.5 , 0.0 , 1.2 , 3.0 , 5.8977 } ) {
for ( final double sigma : new double [ ] { 1.2 , 3.0 , 5.8977 } ) {
for ( final double x : new double [ ] { - 5.0 , - 3.2 , - 1.5 , 0.0 , 1.2 , 3.0 , 5.8977 } ) {
n . setState ( mu , sigma ) ;
Assert . assertEquals ( n . pdf ( x ) , MathUtils . normalDistribution ( mu , sigma , x ) , requiredPrecision ) ;
Assert . assertEquals ( Math . log10 ( n . pdf ( x ) ) , MathUtils . normalDistributionLog10 ( mu , sigma , x ) , requiredPrecision ) ;
}
}
}
}
Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences
-- Assembly graph building now returns an object that describes whether the graph was successfully built and has variation, was succesfully built but didn't have variation, or truly failed in construction. Fixing an annoying bug where you'd prefectly assembly the sequence into the reference graph, but then return a null graph because of this, and you'd increase your kmer because it null was also used to indicate assembly failure
--
-- Output format looks like:
20 10026072 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026073 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026074 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,121
20 10026075 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,119
20 10026076 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026077 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,120
20 10026078 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,217
20 10026079 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,240
20 10026080 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:18:0,18,268
20 10026081 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,267
We use a symbolic allele to indicate that the site is hom-ref, and because we have an ALT allele we can provide AD and PL field values. Currently these are calculated as ref vs. any non-ref value (mismatch or insertion) but doesn't yet account properly for alignment uncertainty.
-- Can we enabled for single samples with --emitRefConfidence (-ERC).
-- This is accomplished by realigning the each read to its most likley haplotype, and then evaluting the resulting pileups over the active region interval. The realignment is done by the HaplotypeBAMWriter, which now has a generalized interface that lets us provide a ReadDestination object so we can capture the realigned reads
-- Provide access to the more raw LocusIteratorByState constructor so we can more easily make them programmatically without constructing lots of misc. GATK data structures. Moved the NO_DOWNSAMPLING constant from LIBSDownsamplingInfo to LocusIteratorByState so clients can use it without making LIBSDownsamplingInfo a public class.
-- Includes GVCF writer
-- Add 1 mb of WEx data to private/testdata
-- Integration tests for reference model output for WGS and WEx data
-- Emit GQ block information into VCF header for GVCF mode
-- OutputMode from StandardCallerArgumentCollection moved to UnifiedArgumentCollection as its no longer relevant for HC
-- Control max indel size for the reference confidence model from the command line. Increase default to 10
-- Don't use out_mode in HaplotypeCallerComplexAndSymbolicVariantsIntegrationTest
-- Unittests for ReferenceConfidenceModel
-- Unittests for new MathUtils functions
2013-06-20 03:07:35 +08:00
@DataProvider ( name = "ArrayMinData" )
public Object [ ] [ ] makeArrayMinData ( ) {
List < Object [ ] > tests = new ArrayList < Object [ ] > ( ) ;
// this functionality can be adapted to provide input data for whatever you might want in your data
tests . add ( new Object [ ] { Arrays . asList ( 10 ) , 10 } ) ;
tests . add ( new Object [ ] { Arrays . asList ( - 10 ) , - 10 } ) ;
for ( final List < Integer > values : Utils . makePermutations ( Arrays . asList ( 1 , 2 , 3 ) , 3 , false ) ) {
tests . add ( new Object [ ] { values , 1 } ) ;
}
for ( final List < Integer > values : Utils . makePermutations ( Arrays . asList ( 1 , 2 , - 3 ) , 3 , false ) ) {
tests . add ( new Object [ ] { values , - 3 } ) ;
}
return tests . toArray ( new Object [ ] [ ] { } ) ;
}
@Test ( dataProvider = "ArrayMinData" )
public void testArrayMinList ( final List < Integer > values , final int expected ) {
final int actual = MathUtils . arrayMin ( values ) ;
Assert . assertEquals ( actual , expected , "Failed with " + values ) ;
}
@Test ( dataProvider = "ArrayMinData" )
public void testArrayMinIntArray ( final List < Integer > values , final int expected ) {
final int [ ] asArray = ArrayUtils . toPrimitive ( values . toArray ( new Integer [ values . size ( ) ] ) ) ;
final int actual = MathUtils . arrayMin ( asArray ) ;
Assert . assertEquals ( actual , expected , "Failed with " + values ) ;
}
@Test ( dataProvider = "ArrayMinData" )
public void testArrayMinByteArray ( final List < Integer > values , final int expected ) {
final byte [ ] asArray = new byte [ values . size ( ) ] ;
for ( int i = 0 ; i < values . size ( ) ; i + + ) asArray [ i ] = ( byte ) ( values . get ( i ) & 0xFF ) ;
final byte actual = MathUtils . arrayMin ( asArray ) ;
Assert . assertEquals ( actual , ( byte ) ( expected & 0xFF ) , "Failed with " + values ) ;
}
@Test ( dataProvider = "ArrayMinData" )
public void testArrayMinDoubleArray ( final List < Integer > values , final int expected ) {
final double [ ] asArray = new double [ values . size ( ) ] ;
for ( int i = 0 ; i < values . size ( ) ; i + + ) asArray [ i ] = ( double ) ( values . get ( i ) ) ;
final double actual = MathUtils . arrayMin ( asArray ) ;
Assert . assertEquals ( actual , ( double ) expected , "Failed with " + values ) ;
}
@DataProvider ( name = "MedianData" )
public Object [ ] [ ] makeMedianData ( ) {
List < Object [ ] > tests = new ArrayList < Object [ ] > ( ) ;
// this functionality can be adapted to provide input data for whatever you might want in your data
tests . add ( new Object [ ] { Arrays . asList ( 10 ) , 10 } ) ;
tests . add ( new Object [ ] { Arrays . asList ( 1 , 10 ) , 10 } ) ;
for ( final List < Integer > values : Utils . makePermutations ( Arrays . asList ( 1 , 2 , - 3 ) , 3 , false ) ) {
tests . add ( new Object [ ] { values , 1 } ) ;
}
return tests . toArray ( new Object [ ] [ ] { } ) ;
}
@Test ( dataProvider = "MedianData" )
public void testMedian ( final List < Integer > values , final int expected ) {
final int actual = MathUtils . median ( values ) ;
Assert . assertEquals ( actual , expected , "Failed with " + values ) ;
}
2010-11-02 05:31:44 +08:00
}