Finished draft of code for new map-combine-reduce annotation framework

All VQSR annotations can be generated in allele-specific mode
Pull out allele-specific annotations in AS_Standard annotation group
This commit is contained in:
Laura Gauthier 2015-06-11 10:52:23 -04:00
parent ab247e7570
commit fcaf37279c
41 changed files with 3673 additions and 451 deletions

View File

@ -0,0 +1,99 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.vcf.VCFInfoHeaderLine;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AS_StandardAnnotation;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.StandardAnnotation;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
import org.broadinstitute.gatk.utils.sam.ReadUtils;
import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
import java.util.Arrays;
import java.util.List;
/**
* Allele-specific rank Sum Test of REF versus each ALT base quality scores
*
* <p>This variant-level annotation tests compares the base qualities of the data supporting the reference allele with those supporting each alternate allele. The ideal result is a value close to zero, which indicates there is little to no difference. A negative value indicates that the bases supporting the alternate allele have lower quality scores than those supporting the reference allele. Conversely, a positive value indicates that the bases supporting the alternate allele have higher quality scores than those supporting the reference allele. Finding a statistically significant difference either way suggests that the sequencing process may have been biased or affected by an artifact.</p>
*
* <h3>Statistical notes</h3>
* <p>The value output for this annotation is the u-based z-approximation from the Mann-Whitney-Wilcoxon Rank Sum Test for base qualities (bases supporting REF vs. bases supporting ALT). See the <a href="http://www.broadinstitute.org/gatk/guide/article?id=4732">method document on statistical tests</a> for a more detailed explanation of the ranksum test.</p>
*
* <h3>Caveat</h3>
* <p>Uninformative reads are not used in these calculations.</p>
*
*/
public class AS_BaseQualityRankSumTest extends AS_RankSumTest implements AS_StandardAnnotation {
@Override
public List<String> getKeyNames() {
return Arrays.asList(GATKVCFConstants.AS_BASE_QUAL_RANK_SUM_KEY);
}
@Override
public String getRawKeyName() { return GATKVCFConstants.AS_RAW_BASE_QUAL_RANK_SUM_KEY;}
/**
* Get the element for the given read at the given reference position
*
* @param read the read
* @param refLoc the reference position
* @return a Double representing the element to be used in the rank sum test, or null if it should not be used
*/
@Override
protected Double getElementForRead(final GATKSAMRecord read, final int refLoc) {
return (double) read.getBaseQualities()[ReadUtils.getReadCoordinateForReferenceCoordinateUpToEndOfRead(read, refLoc, ReadUtils.ClippingTail.RIGHT_TAIL)];
}
}

View File

@ -0,0 +1,134 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.VariantContext;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.*;
import org.broadinstitute.gatk.utils.QualityUtils;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* Allele specific strand bias estimated using Fisher's Exact Test
*
*/
public class AS_FisherStrand extends AS_StrandBiasTest implements AS_StandardAnnotation {
@Override
public List<String> getKeyNames() {
return Collections.singletonList(GATKVCFConstants.AS_FISHER_STRAND_KEY);
}
@Override
protected Map<String, Object> calculateAnnotationFromLikelihoodMap(final Map<String, PerReadAlleleLikelihoodMap> stratifiedPerReadAlleleLikelihoodMap,
final VariantContext vc) {
// either SNP with no alignment context, or indels: per-read likelihood map needed
final int[][] table = getContingencyTable(stratifiedPerReadAlleleLikelihoodMap, vc, MIN_COUNT);
//logger.info("VC " + vc);
//printTable(table, 0.0);
return pValueAnnotationForBestTable(table, null);
}
/**
* Create an annotation for the highest (i.e., least significant) p-value of table1 and table2
*
* @param table1 a contingency table, may be null
* @param table2 a contingency table, may be null
* @return annotation result for FS given tables
*/
private Map<String, Object> pValueAnnotationForBestTable(final int[][] table1, final int[][] table2) {
if ( table2 == null )
return table1 == null ? null : annotationForOneTable(StrandBiasTableUtils.FisherExactPValueForContingencyTable(table1));
else if (table1 == null)
return annotationForOneTable(StrandBiasTableUtils.FisherExactPValueForContingencyTable(table2));
else { // take the one with the best (i.e., least significant pvalue)
double pvalue1 = StrandBiasTableUtils.FisherExactPValueForContingencyTable(table1);
double pvalue2 = StrandBiasTableUtils.FisherExactPValueForContingencyTable(table2);
return annotationForOneTable(Math.max(pvalue1, pvalue2));
}
}
/**
* Returns an annotation result given a pValue
*
* @param pValue
* @return a hash map from FS -> phred-scaled pValue
*/
protected Map<String, Object> annotationForOneTable(final double pValue) {
final Object value = String.format("%.3f", QualityUtils.phredScaleErrorRate(Math.max(pValue, MIN_PVALUE))); // prevent INFINITYs
return Collections.singletonMap(getKeyNames().get(0), value);
}
@Override
protected Map<Allele,Double> calculateReducedData(AlleleSpecificAnnotationData<List<Integer>> combinedData) {
final Map<Allele,Double> annotationMap = new HashMap<>();
final Map<Allele,List<Integer>> perAlleleData = combinedData.getAttributeMap();
final List<Integer> refStrandCounts = perAlleleData.get(combinedData.getRefAllele());
for (final Allele a : perAlleleData.keySet()) {
if(a.equals(combinedData.getRefAllele(),true))
continue;
final List<Integer> altStrandCounts = combinedData.getAttribute(a);
final int[][] refAltTable = new int[][] {new int[]{refStrandCounts.get(0),refStrandCounts.get(1)},new int[]{altStrandCounts.get(0),altStrandCounts.get(1)}};
annotationMap.put(a,QualityUtils.phredScaleErrorRate(Math.max(StrandBiasTableUtils.FisherExactPValueForContingencyTable(refAltTable), MIN_PVALUE)));
}
return annotationMap;
}
}

View File

@ -0,0 +1,87 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.vcf.VCFInfoHeaderLine;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AS_StandardAnnotation;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.StandardAnnotation;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.pileup.PileupElement;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
/**
* Allele specific Rank Sum Test for mapping qualities of REF versus each ALT reads
*
* Currently this annotation duplicate the MappingQualityRankSumTest annotation
*
*/
public class AS_MappingQualityRankSumTest extends AS_RankSumTest implements AS_StandardAnnotation {
@Override
public List<String> getKeyNames() { return Arrays.asList(GATKVCFConstants.AS_MAP_QUAL_RANK_SUM_KEY); }
@Override
public String getRawKeyName() { return GATKVCFConstants.AS_RAW_MAP_QUAL_RANK_SUM_KEY;}
@Override
protected Double getElementForRead(final GATKSAMRecord read, final int refLoc) {
return (double)read.getMappingQuality();
}
}

View File

@ -0,0 +1,192 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.Genotype;
import htsjdk.variant.variantcontext.GenotypesContext;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.vcf.VCFHeaderLine;
import htsjdk.variant.vcf.VCFInfoHeaderLine;
import org.apache.log4j.Logger;
import org.broadinstitute.gatk.engine.GenomeAnalysisEngine;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
import java.util.*;
/**
* Allele-specific implementation of root-mean-squared annotations
*/
public abstract class AS_RMSAnnotation extends RMSAnnotation {
protected final static Logger logger = Logger.getLogger(AS_RMSAnnotation.class);
protected final String splitDelim = "\\|"; //String.split takes a regex, so we need to escape the pipe
protected final String printDelim = "|";
protected AnnotatorCompatible callingWalker;
@Override
public void initialize(final AnnotatorCompatible walker, final GenomeAnalysisEngine toolkit, final Set<VCFHeaderLine> headerLines) {
if (!AnnotationUtils.walkerSupportsAlleleSpecificAnnotations(walker))
logger.warn("Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output");
callingWalker = walker;
}
public List<VCFInfoHeaderLine> getDescriptions() {
if (AnnotationUtils.walkerRequiresRawData(callingWalker))
return Arrays.asList(GATKVCFHeaderLines.getInfoLine(getRawKeyName()));
else
return Arrays.asList(GATKVCFHeaderLines.getInfoLine(getKeyNames().get(0)));
}
//For the raw data here, we're only keeping track of the sum of the squares of our values
//When we go to reduce, we'll use the AD info to get the number of reads
public void calculateRawData(final VariantContext vc,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap,
final ReducibleAnnotationData myData) {
//must use perReadAlleleLikelihoodMap for allele-specific annotations
if (perReadAlleleLikelihoodMap != null) {
if ( perReadAlleleLikelihoodMap.size() == 0 )
return;
getRMSDataFromPRALM(perReadAlleleLikelihoodMap, myData);
}
else
return;
}
abstract void getRMSDataFromPRALM(final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap, final ReducibleAnnotationData<Number> myData);
@Override
public Map<String, Object> finalizeRawData(final VariantContext vc, final VariantContext originalVC) {
if (!vc.hasAttribute(getRawKeyName()))
return new HashMap<>();
final String rawMQdata = vc.getAttributeAsString(getRawKeyName(),null);
if (rawMQdata == null)
return new HashMap<>();
final Map<String,Object> annotations = new HashMap<>();
final ReducibleAnnotationData myData = new AlleleSpecificAnnotationData<Double>(originalVC.getAlleles(), rawMQdata);
parseRawDataString(myData);
final String annotationString = makeFinalizedAnnotationString(vc, myData.getAttributeMap());
annotations.put(getKeyNames().get(0), annotationString);
return annotations;
}
@Override
protected void parseRawDataString(final ReducibleAnnotationData<Number> myData) {
final String rawDataString = myData.getRawData();
//get per-allele data by splitting on allele delimiter
final String[] rawDataPerAllele = rawDataString.split(splitDelim);
for (int i=0; i<rawDataPerAllele.length; i++) {
final String alleleData = rawDataPerAllele[i];
myData.putAttribute(myData.getAlleles().get(i), Double.parseDouble(alleleData));
}
}
@Override
public Map<String, Object> combineRawData(final List<Allele> vcAlleles, final List<? extends ReducibleAnnotationData> annotationList) {
//VC already contains merged alleles from ReferenceConfidenceVariantContextMerger
ReducibleAnnotationData combinedData = new AlleleSpecificAnnotationData(vcAlleles, null);
for (final ReducibleAnnotationData currentValue : annotationList) {
parseRawDataString(currentValue);
combineAttributeMap(currentValue, combinedData);
}
final Map<String, Object> annotations = new HashMap<>();
String annotationString = makeRawAnnotationString(vcAlleles, combinedData.getAttributeMap());
annotations.put(getRawKeyName(), annotationString);
return annotations;
}
@Override
public void combineAttributeMap(final ReducibleAnnotationData<Number> toAdd, final ReducibleAnnotationData<Number> combined) {
//check that alleles match
for (final Allele currentAllele : combined.getAlleles()){
//combined is initialized with all alleles, but toAdd might have only a subset
if(toAdd.getAttribute(currentAllele) == null)
continue;
if (toAdd.getAttribute(currentAllele) != null && combined.getAttribute(currentAllele) != null) {
combined.putAttribute(currentAllele, (double) combined.getAttribute(currentAllele) + (double) toAdd.getAttribute(currentAllele));
}
else
combined.putAttribute(currentAllele, toAdd.getAttribute(currentAllele));
}
}
protected Map<Allele, Integer> getADcounts(final VariantContext vc) {
final GenotypesContext genotypes = vc.getGenotypes();
if ( genotypes == null || genotypes.size() == 0 ) {
logger.warn("VC does not have genotypes -- annotations were calculated in wrong order");
return null;
}
final Map<Allele, Integer> variantADs = new HashMap<>();
for(final Allele a : vc.getAlleles())
variantADs.put(a,0);
for (final Genotype gt : vc.getGenotypes()) {
if(!gt.hasAD()) {
continue;
}
final int[] ADs = gt.getAD();
for(int i = 1; i < vc.getNAlleles(); i++) {
variantADs.put(vc.getAlternateAllele(i-1), variantADs.get(vc.getAlternateAllele(i-1))+ADs[i]); //here -1 is to reconcile allele index with alt allele index
}
}
return variantADs;
}
}

View File

@ -0,0 +1,147 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.VariantContext;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.*;
import org.broadinstitute.gatk.utils.QualityUtils;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
import java.util.*;
/**
* Allele-specific Root Mean Square of the mapping quality of reads across all samples.
*
* <p>This annotation provides an estimation of the mapping quality of reads supporting each alternate allele in a variant call. Depending on the tool it is called from, it produces either raw data (sum of squared MQs) or the calculated root mean square.</p>
*
* The raw data is used to accurately calculate the root mean square when combining more than one sample.
*
* <h3>Statistical notes</h3>
* <p>The root mean square is equivalent to the mean of the mapping qualities plus the standard deviation of the mapping qualities.</p>
*
* <h3>Related annotations</h3>
* <ul>
* <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_MappingQualityRankSumTest.php">MappingQualityRankSumTest</a></b> compares the mapping quality of reads supporting the REF and ALT alleles.</li>
* </ul>
*
* <h3>Caveat</h3>
* <p>Uninformative reads are not used in these annotations.</p>
*
*/
public class AS_RMSMappingQuality extends AS_RMSAnnotation implements AS_StandardAnnotation, ActiveRegionBasedAnnotation {
protected final String printFormat = "%.2f";
public List<String> getKeyNames() { return Arrays.asList(GATKVCFConstants.AS_RMS_MAPPING_QUALITY_KEY); }
public String getRawKeyName() { return GATKVCFConstants.AS_RAW_RMS_MAPPING_QUALITY_KEY; }
public void getRMSDataFromPRALM(Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap, ReducibleAnnotationData<Number> myData) {
//over all the samples in the Map...
for ( final PerReadAlleleLikelihoodMap perReadLikelihoods : perReadAlleleLikelihoodMap.values() ) {
//for each read...
for ( final Map.Entry<GATKSAMRecord,Map<Allele,Double>> readLikelihoods : perReadLikelihoods.getLikelihoodReadMap().entrySet() ) {
final int mq = readLikelihoods.getKey().getMappingQuality();
if ( mq != QualityUtils.MAPPING_QUALITY_UNAVAILABLE ) {
if (!PerReadAlleleLikelihoodMap.getMostLikelyAllele(readLikelihoods.getValue()).isInformative())
continue;
final Allele bestAllele =PerReadAlleleLikelihoodMap.getMostLikelyAllele(readLikelihoods.getValue()).getMostLikelyAllele();
double currSquareSum = 0;
if (myData.hasAttribute(bestAllele))
currSquareSum += (double)myData.getAttribute(bestAllele);
myData.putAttribute(bestAllele, currSquareSum + mq * mq);
}
}
}
}
@Override
public String makeRawAnnotationString(final List<Allele> vcAlleles, final Map<Allele, Number> perAlleleValues) {
String annotationString = "";
for (final Allele current : vcAlleles) {
if (!annotationString.isEmpty())
annotationString += printDelim;
if(perAlleleValues.get(current) != null)
annotationString += String.format(printFormat,perAlleleValues.get(current));
else
annotationString += String.format(printFormat, 0.0);
}
return annotationString;
}
//this just overrides the RMSAnnotation function that's used for UG -- we don't do allele-specific annotations for UG
@Override
public String makeFinalizedAnnotationString(final VariantContext vc, final Map<Allele, Number> perAlleleData, final Map<String, AlignmentContext> stratifiedContexts, final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap) {
return makeFinalizedAnnotationString(vc, perAlleleData);
}
@Override
public String makeFinalizedAnnotationString(final VariantContext vc, final Map<Allele, Number> perAlleleValues) {
final Map<Allele, Integer> variantADs = getADcounts(vc);
String annotationString = "";
for (final Allele current : vc.getAlternateAlleles()) {
if (!annotationString.isEmpty())
annotationString += ",";
if (perAlleleValues.containsKey(current))
annotationString += String.format(printFormat, Math.sqrt((double) perAlleleValues.get(current) / variantADs.get(current)));
else {
logger.warn("ERROR: VC allele is not found in annotation alleles -- maybe there was trimming?");
}
}
return annotationString;
}
}

View File

@ -0,0 +1,329 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.vcf.VCFHeaderLine;
import htsjdk.variant.vcf.VCFInfoHeaderLine;
import org.apache.log4j.Logger;
import org.broadinstitute.gatk.engine.GenomeAnalysisEngine;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.ReducibleAnnotation;
import org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller;
import org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs;
import org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs;
import org.broadinstitute.gatk.utils.MannWhitneyU;
import org.broadinstitute.gatk.utils.collections.Pair;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
import org.broadinstitute.gatk.utils.exceptions.GATKException;
import org.broadinstitute.gatk.utils.genotyper.MostLikelyAllele;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
import java.util.*;
/**
* Allele-specific implementation of rank sum test annotations
*/
public abstract class AS_RankSumTest extends RankSumTest implements ReducibleAnnotation {
private final static Logger logger = Logger.getLogger(AS_RMSAnnotation.class);
protected final String splitDelim = "\\|"; //String.split takes a regex, so we need to escape the pipe
protected final String printDelim = "|";
protected final String reducedDelim = ",";
protected AnnotatorCompatible callingWalker;
@Override
public void initialize(final AnnotatorCompatible walker, final GenomeAnalysisEngine toolkit, final Set<VCFHeaderLine> headerLines) {
if (!AnnotationUtils.walkerSupportsAlleleSpecificAnnotations(walker))
logger.warn("Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output");
callingWalker = walker;
super.initialize(walker, toolkit, headerLines);
}
public List<VCFInfoHeaderLine> getDescriptions() {
if (AnnotationUtils.walkerRequiresRawData(callingWalker))
return Arrays.asList(GATKVCFHeaderLines.getInfoLine(getRawKeyName()));
else
return Arrays.asList(GATKVCFHeaderLines.getInfoLine(getKeyNames().get(0)));
}
public Map<String, Object> annotateRawData(final RefMetaDataTracker tracker,
final AnnotatorCompatible walker,
final ReferenceContext ref,
final Map<String, AlignmentContext> stratifiedContexts,
final VariantContext vc,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap ) {
if ( perReadAlleleLikelihoodMap == null)
return new HashMap<>();
final Map<String, Object> annotations = new HashMap<>();
final AlleleSpecificAnnotationData<CompressedDataList<Integer>> myData = initializeNewAnnotationData(vc.getAlleles());
calculateRawData(vc, perReadAlleleLikelihoodMap, myData);
final String annotationString = makeRawAnnotationString(vc.getAlleles(), myData.getAttributeMap());
annotations.put(getRawKeyName(), annotationString);
return annotations;
}
protected void parseRawDataString(final ReducibleAnnotationData<CompressedDataList<Integer>> myData) {
final String rawDataString = myData.getRawData();
String rawDataNoBrackets;
final Map<Allele, CompressedDataList<Integer>> perAlleleValues = new HashMap<>();
//Initialize maps
for (final Allele current : myData.getAlleles()) {
perAlleleValues.put(current, new CompressedDataList<Integer>());
}
//Map gives back list with []
if (rawDataString.charAt(0) == '[') {
rawDataNoBrackets = rawDataString.substring(1, rawDataString.length() - 1);
}
else {
rawDataNoBrackets = rawDataString;
}
//rawDataPerAllele is the list of values for each allele (each of variable length)
final String[] rawDataPerAllele = rawDataNoBrackets.split(splitDelim);
for (int i=0; i<rawDataPerAllele.length; i++) {
final String alleleData = rawDataPerAllele[i];
if (alleleData.isEmpty())
continue;
final CompressedDataList<Integer> alleleList = perAlleleValues.get(myData.getAlleles().get(i));
final String[] rawListEntriesAsStringVector = alleleData.split(",");
if (rawListEntriesAsStringVector.length %2 != 0)
throw new GATKException("ERROR: rank sum test raw annotation data must occur in <value,count> pairs");
for (int j=0; j<rawListEntriesAsStringVector.length; j+=2) {
int value, count;
if (!rawListEntriesAsStringVector[j].isEmpty()) {
value = Integer.parseInt(rawListEntriesAsStringVector[j].trim());
if (!rawListEntriesAsStringVector[j + 1].isEmpty()) {
count = Integer.parseInt(rawListEntriesAsStringVector[j + 1].trim());
alleleList.add(value,count);
}
}
}
}
myData.setAttributeMap(perAlleleValues);
//check the alleles list
boolean foundRef = false;
for (final Allele a : myData.getAlleles()) {
if (a.isReference()) {
if (foundRef)
throw new GATKException("ERROR: multiple reference alleles found in annotation data\n");
foundRef = true;
}
}
if (!foundRef)
throw new GATKException("ERROR: no reference alleles found in annotation data\n");
}
@Override
public Map<String, Object> combineRawData(final List<Allele> vcAlleles, final List<? extends ReducibleAnnotationData> annotationList) {
//VC already contains merged alleles from ReferenceConfidenceVariantContextMerger
final ReducibleAnnotationData combinedData = initializeNewAnnotationData(vcAlleles);
for (final ReducibleAnnotationData currentValue : annotationList) {
parseRawDataString(currentValue);
combineAttributeMap(currentValue, combinedData);
}
final Map<String, Object> annotations = new HashMap<>();
final String annotationString = makeRawAnnotationString(vcAlleles, combinedData.getAttributeMap());
annotations.put(getRawKeyName(), annotationString);
return annotations;
}
protected AlleleSpecificAnnotationData initializeNewAnnotationData(final List<Allele> vcAlleles) {
Map<Allele, CompressedDataList<Integer>> perAlleleValues = new HashMap<>();
for (Allele a : vcAlleles) {
perAlleleValues.put(a, new CompressedDataList<Integer>());
}
AlleleSpecificAnnotationData ret = new AlleleSpecificAnnotationData(vcAlleles, perAlleleValues.toString());
ret.setAttributeMap(perAlleleValues);
return ret;
}
protected void combineAttributeMap(final ReducibleAnnotationData<CompressedDataList<Integer>> toAdd, final ReducibleAnnotationData<CompressedDataList<Integer>> combined) {
for (final Allele a : combined.getAlleles()) {
if (toAdd.hasAttribute(a)) {
final CompressedDataList<Integer> alleleData = combined.getAttribute(a);
alleleData.add(toAdd.getAttribute(a));
combined.putAttribute(a, alleleData);
}
}
}
protected String makeRawAnnotationString(final List<Allele> vcAlleles, final Map<Allele, CompressedDataList<Integer>> perAlleleValues) {
String annotationString = "";
for (int i =0; i< vcAlleles.size(); i++) {
if (i!=0)
annotationString += printDelim;
CompressedDataList<Integer> alleleValues = perAlleleValues.get(vcAlleles.get(i));
annotationString += alleleValues.toString();
}
return annotationString;
}
protected String makeReducedAnnotationString(VariantContext vc, Map<Allele,Double> perAltRankSumResults) {
String annotationString = "";
for (final Allele a : vc.getAlternateAlleles()) {
if (!annotationString.isEmpty())
annotationString += reducedDelim;
if (!perAltRankSumResults.containsKey(a))
logger.warn("ERROR: VC allele not found in annotation alleles -- maybe there was trimming?");
else
annotationString += String.format("%.3f", perAltRankSumResults.get(a));
}
return annotationString;
}
/**
*
* @param vc -- contains the final set of alleles, possibly subset by GenotypeGVCFs
* @param originalVC -- used to get all the alleles for all gVCFs
* @return
*/
public Map<String, Object> finalizeRawData(final VariantContext vc, final VariantContext originalVC) {
if (!vc.hasAttribute(getRawKeyName()))
return new HashMap<>();
final String rawRankSumData = vc.getAttributeAsString(getRawKeyName(),null);
if (rawRankSumData == null)
return new HashMap<>();
final Map<String,Object> annotations = new HashMap<>();
final AlleleSpecificAnnotationData<CompressedDataList<Integer>> myData = new AlleleSpecificAnnotationData(originalVC.getAlleles(), rawRankSumData);
parseRawDataString(myData);
final Map<Allele, Double> perAltRankSumResults = calculateReducedData(myData.getAttributeMap(), myData.getRefAllele());
//shortcut for no ref values
if (perAltRankSumResults.isEmpty())
return annotations;
final String annotationString = makeReducedAnnotationString(vc, perAltRankSumResults);
annotations.put(getKeyNames().get(0), annotationString);
return annotations;
}
public void calculateRawData(VariantContext vc, Map<String, PerReadAlleleLikelihoodMap> pralm, ReducibleAnnotationData myData) {
if(pralm == null)
return;
final Map<Allele, CompressedDataList<Integer>> perAlleleValues = myData.getAttributeMap();
for ( final PerReadAlleleLikelihoodMap likelihoodMap : pralm.values() ) {
if ( likelihoodMap != null && !likelihoodMap.isEmpty() ) {
fillQualsFromLikelihoodMap(vc.getAlleles(), vc.getStart(), likelihoodMap, perAlleleValues);
}
}
}
private void fillQualsFromLikelihoodMap(final List<Allele> alleles,
final int refLoc,
final PerReadAlleleLikelihoodMap likelihoodMap,
final Map<Allele, CompressedDataList<Integer>> perAlleleValues) {
for ( final Map.Entry<GATKSAMRecord, Map<Allele,Double>> el : likelihoodMap.getLikelihoodReadMap().entrySet() ) {
final MostLikelyAllele a = PerReadAlleleLikelihoodMap.getMostLikelyAllele(el.getValue());
if ( ! a.isInformative() )
continue; // read is non-informative
final GATKSAMRecord read = el.getKey();
if ( isUsableRead(read, refLoc) ) {
final Double value = getElementForRead(read, refLoc, a);
if ( value == null )
continue;
if(perAlleleValues.containsKey(a.getMostLikelyAllele()))
perAlleleValues.get(a.getMostLikelyAllele()).add(value.intValue());
}
}
}
public Map<Allele, Double> calculateReducedData(final Map<Allele, CompressedDataList<Integer>> perAlleleValues, final Allele ref) {
final Map<Allele, Double> perAltRankSumResults = new HashMap<>();
//shortcut to not try to calculate rank sum if there are no reads that unambiguously support the ref
if (perAlleleValues.get(ref).isEmpty())
return perAltRankSumResults;
for (final Allele alt : perAlleleValues.keySet()) {
if (alt.equals(ref, false))
continue;
final MannWhitneyU mannWhitneyU = new MannWhitneyU(useDithering);
//load alts
for (final Number qual : perAlleleValues.get(alt)) {
mannWhitneyU.add(qual, MannWhitneyU.USet.SET1);
}
//load refs
for (final Number qual : perAlleleValues.get(ref)) {
mannWhitneyU.add(qual, MannWhitneyU.USet.SET2);
}
if (DEBUG) {
System.out.format("%s, REF QUALS:", this.getClass().getName());
for (final Number qual : perAlleleValues.get(ref))
System.out.format("%d ", qual);
System.out.println();
System.out.format("%s, ALT QUALS:", this.getClass().getName());
for (final Number qual : perAlleleValues.get(alt))
System.out.format("%d ", qual);
System.out.println();
}
// we are testing that set1 (the alt bases) have lower quality scores than set2 (the ref bases)
final Pair<Double, Double> testResults = mannWhitneyU.runOneSidedTest(MannWhitneyU.USet.SET1);
perAltRankSumResults.put(alt, testResults.first);
}
return perAltRankSumResults;
}
}

View File

@ -0,0 +1,107 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AS_StandardAnnotation;
import org.broadinstitute.gatk.utils.pileup.PileupElement;
import org.broadinstitute.gatk.utils.sam.AlignmentUtils;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
import org.broadinstitute.gatk.utils.sam.ReadUtils;
import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
import java.util.Arrays;
import java.util.List;
/**
* Allele-specific Rank Sum Test for relative positioning of REF versus each ALT allele within reads
*
* <p>This variant-level annotation tests whether there is evidence of bias in the position of alleles within the reads that support them, between the reference and each alternate allele. Seeing an allele only near the ends of reads is indicative of error, because that is where sequencers tend to make the most errors. However, some variants located near the edges of sequenced regions will necessarily be covered by the ends of reads, so we can't just set an absolute "minimum distance from end of read" threshold. That is why we use a rank sum test to evaluate whether there is a difference in how well the reference allele and the alternate allele are supported.</p>
*
* <p>The ideal result is a value close to zero, which indicates there is little to no difference in where the alleles are found relative to the ends of reads. A negative value indicates that the alternate allele is found at the ends of reads more often than the reference allele. Conversely, a positive value indicates that the reference allele is found at the ends of reads more often than the alternate allele. </p>
*
* <p>This annotation can be used to evaluate confidence in a variant call and is a recommended covariate for variant recalibration (VQSR). Finding a statistically significant difference in relative position either way suggests that the sequencing process may have been biased or affected by an artifact. In practice, we only filter out low negative values when evaluating variant quality because the idea is to filter out variants for which the quality of the data supporting the alternate allele is comparatively low. The reverse case, where it is the quality of data supporting the reference allele that is lower (resulting in positive ranksum scores), is not really informative for filtering variants.</p>
*
* <h3>Statistical notes</h3>
* <p>The value output for this annotation is the u-based z-approximation from the Mann-Whitney-Wilcoxon Rank Sum Test for site position within reads (position within reads supporting REF vs. position within reads supporting ALT). See the <a href="http://www.broadinstitute.org/gatk/guide/article?id=4732">method document on statistical tests</a> for a more detailed explanation of the ranksum test.</p>
*
* <h3>Caveat</h3>
* <p>Uninformative reads are not used in these annotations.</p>
*
*
*/
public class AS_ReadPosRankSumTest extends AS_RankSumTest implements AS_StandardAnnotation {
@Override
public List<String> getKeyNames() { return Arrays.asList(GATKVCFConstants.AS_READ_POS_RANK_SUM_KEY); }
@Override
public String getRawKeyName() { return GATKVCFConstants.AS_RAW_READ_POS_RANK_SUM_KEY;}
@Override
protected Double getElementForRead(final GATKSAMRecord read, final int refLoc) {
final int offset = ReadUtils.getReadCoordinateForReferenceCoordinate(read.getSoftStart(), read.getCigar(), refLoc, ReadUtils.ClippingTail.RIGHT_TAIL, true);
if ( offset == ReadUtils.CLIPPING_GOAL_NOT_REACHED )
return null;
int readPos = AlignmentUtils.calcAlignmentByteArrayOffset(read.getCigar(), offset, false, 0, 0);
final int numAlignedBases = AlignmentUtils.getNumAlignedBasesCountingSoftClips( read );
if (readPos > numAlignedBases / 2)
readPos = numAlignedBases - (readPos + 1);
return (double)readPos;
}
@Override
protected boolean isUsableRead(final GATKSAMRecord read, final int refLoc) {
return super.isUsableRead(read, refLoc) && read.getSoftStart() + read.getCigar().getReadLength() > refLoc;
}
}

View File

@ -0,0 +1,379 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.GenotypesContext;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.vcf.VCFHeaderLine;
import htsjdk.variant.vcf.VCFInfoHeaderLine;
import org.apache.log4j.Logger;
import org.broadinstitute.gatk.engine.GenomeAnalysisEngine;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.ReducibleAnnotation;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
import org.broadinstitute.gatk.utils.exceptions.GATKException;
import org.broadinstitute.gatk.utils.genotyper.MostLikelyAllele;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
import java.util.*;
/**
* Allele-specific implementation of strand bias annotations
*/
public abstract class AS_StrandBiasTest extends StrandBiasTest implements ReducibleAnnotation {
private final static Logger logger = Logger.getLogger(StrandBiasTest.class);
protected final String splitDelim = "\\|"; //String.split takes a regex, so we need to escape the pipe
protected final String printDelim = "|";
protected final String reducedDelim = ",";
protected AnnotatorCompatible callingWalker;
protected final int MIN_COUNT = 2;
protected static final double MIN_PVALUE = 1E-320;
protected final int FORWARD = 0;
protected final int REVERSE = 1;
protected final ArrayList<Integer> ZERO_LIST = new ArrayList<>();
@Override
public void initialize(final AnnotatorCompatible walker, final GenomeAnalysisEngine toolkit, final Set<VCFHeaderLine> headerLines) {
if (!AnnotationUtils.walkerSupportsAlleleSpecificAnnotations(walker))
logger.warn("Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output");
callingWalker = walker;
ZERO_LIST.add(0,0);
ZERO_LIST.add(1,0);
}
@Override
public List<VCFInfoHeaderLine> getDescriptions() {
if (AnnotationUtils.walkerRequiresRawData(callingWalker))
return Arrays.asList(GATKVCFHeaderLines.getInfoLine(getRawKeyName()));
else
return Arrays.asList(GATKVCFHeaderLines.getInfoLine(getKeyNames().get(0)));
}
@Override
public String getRawKeyName() { return GATKVCFConstants.AS_SB_TABLE_KEY; }
public Map<String, Object> annotateRawData(final RefMetaDataTracker tracker,
final AnnotatorCompatible walker,
final ReferenceContext ref,
final Map<String, AlignmentContext> stratifiedContexts,
final VariantContext vc,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap ) {
//for allele-specific annotations we only call from HC and we only use perReadAlleleLikelihoodMap
if ( perReadAlleleLikelihoodMap == null)
return new HashMap<>();
// calculate the annotation from the stratified per read likelihood map
// stratifiedPerReadAllelelikelihoodMap can come from HaplotypeCaller call to VariantAnnotatorEngine
else if (perReadAlleleLikelihoodMap != null) {
final HashMap<String, Object> annotations = new HashMap<>();
final ReducibleAnnotationData<List<Integer>> myData = new AlleleSpecificAnnotationData<>(vc.getAlleles(),null);
calculateRawData(vc, perReadAlleleLikelihoodMap, myData);
final Map<Allele, List<Integer>> perAlleleValues = myData.getAttributeMap();
final String annotationString = makeRawAnnotationString(vc.getAlleles(), perAlleleValues);
annotations.put(getRawKeyName(), annotationString);
return annotations;
}
else {
// for non-snp variants, we need per-read likelihoods.
// for snps, we can get same result from simple pileup
// for indels that do not have a computed strand bias (SB) or strand bias by sample (SBBS)
return null;
}
}
protected void parseRawDataString(ReducibleAnnotationData<List<Integer>> myData) {
final String rawDataString = myData.getRawData();
String[] rawDataPerAllele;
String[] rawListEntriesAsStringVector;
Map<Allele, List<Integer>> perAlleleValues = new HashMap<>();
//Initialize maps
for (Allele current : myData.getAlleles()) {
perAlleleValues.put(current, new LinkedList<Integer>());
}
//rawDataPerAllele is the list of values for each allele (each of variable length)
rawDataPerAllele = rawDataString.split(splitDelim);
for (int i=0; i<rawDataPerAllele.length; i++) {
String alleleData = rawDataPerAllele[i];
if (alleleData.isEmpty())
continue;
List<Integer> alleleList = perAlleleValues.get(myData.getAlleles().get(i));
rawListEntriesAsStringVector = alleleData.split(",");
//Read counts will only ever be integers
for (String s : rawListEntriesAsStringVector) {
if (!s.isEmpty())
alleleList.add(Integer.parseInt(s.trim()));
}
}
myData.setAttributeMap(perAlleleValues);
}
@Override
public Map<String, Object> combineRawData(final List<Allele> vcAlleles, final List<? extends ReducibleAnnotationData> annotationList) {
//VC already contains merged alleles from ReferenceConfidenceVariantContextMerger
ReducibleAnnotationData combinedData = new AlleleSpecificAnnotationData(vcAlleles, null);
for (final ReducibleAnnotationData currentValue : annotationList) {
parseRawDataString(currentValue);
combineAttributeMap(currentValue, combinedData);
}
final Map<String, Object> annotations = new HashMap<>();
final String annotationString = makeRawAnnotationString(vcAlleles, combinedData.getAttributeMap());
annotations.put(getRawKeyName(), annotationString);
return annotations;
}
protected void combineAttributeMap(final ReducibleAnnotationData<List<Integer>> toAdd, final ReducibleAnnotationData<List<Integer>> combined) {
for (final Allele a : combined.getAlleles()) {
if (toAdd.hasAttribute(a) && toAdd.getAttribute(a) != null) {
if (combined.getAttribute(a) != null) {
combined.getAttribute(a).set(0, (int) combined.getAttribute(a).get(0) + (int) toAdd.getAttribute(a).get(0));
combined.getAttribute(a).set(1, (int) combined.getAttribute(a).get(1) + (int) toAdd.getAttribute(a).get(1));
}
else {
List<Integer> alleleData = new ArrayList<>();
alleleData.add(0, toAdd.getAttribute(a).get(0));
alleleData.add(1, toAdd.getAttribute(a).get(1));
combined.putAttribute(a,alleleData);
}
}
}
}
protected String makeRawAnnotationString(final List<Allele> vcAlleles, final Map<Allele, List<Integer>> perAlleleValues) {
String annotationString = "";
for (final Allele a : vcAlleles) {
if (!annotationString.isEmpty())
annotationString += printDelim;
List<Integer> alleleValues = perAlleleValues.get(a);
if (alleleValues == null)
alleleValues = ZERO_LIST;
annotationString += encode(alleleValues);
}
return annotationString;
}
protected String encode(List<Integer> alleleValues) {
String annotationString = "";
for (int j =0; j < alleleValues.size(); j++) {
annotationString += alleleValues.get(j);
if (j < alleleValues.size()-1)
annotationString += ",";
}
return annotationString;
}
protected String makeReducedAnnotationString(VariantContext vc, Map<Allele,Double> perAltsStrandCounts) {
String annotationString = "";
for (Allele a : vc.getAlternateAlleles()) {
if (!annotationString.isEmpty())
annotationString += reducedDelim;
if (!perAltsStrandCounts.containsKey(a))
logger.warn("ERROR: VC allele not found in annotation alleles -- maybe there was trimming?");
else
annotationString += String.format("%.3f", perAltsStrandCounts.get(a));
}
return annotationString;
}
/**
*
* @param vc -- contains the final set of alleles, possibly subset by GenotypeGVCFs
* @param originalVC -- used to get all the alleles for all gVCFs
* @return
*/
@Override
public Map<String, Object> finalizeRawData(final VariantContext vc, final VariantContext originalVC) {
Map<String, Object> annotations = new HashMap<>();
if (!vc.hasAttribute(getRawKeyName()))
return new HashMap<>();
String rawRankSumData = vc.getAttributeAsString(getRawKeyName(),null);
if (rawRankSumData == null)
return new HashMap<>();
AlleleSpecificAnnotationData<List<Integer>> myData = new AlleleSpecificAnnotationData<>(originalVC.getAlleles(), rawRankSumData);
parseRawDataString(myData);
Map<Allele, Double> perAltRankSumResults = calculateReducedData(myData);
String annotationString = makeReducedAnnotationString(vc, perAltRankSumResults);
annotations.put(getKeyNames().get(0), annotationString);
return annotations;
}
@Override
public void calculateRawData(final VariantContext vc, Map<String, PerReadAlleleLikelihoodMap> pralm, final ReducibleAnnotationData rawAnnotations) {
if(pralm == null)
return;
getStrandCountsFromLikelihoodMap(vc, pralm, rawAnnotations, MIN_COUNT);
}
protected abstract Map<Allele,Double> calculateReducedData(final AlleleSpecificAnnotationData<List<Integer>> combinedData );
/**
Allocate and fill a 2x2 strand contingency table. In the end, it'll look something like this:
* fw rc
* allele1 # #
* allele2 # #
* @return a 2x2 contingency table
*/
public void getStrandCountsFromLikelihoodMap( final VariantContext vc,
final Map<String, PerReadAlleleLikelihoodMap> stratifiedPerReadAlleleLikelihoodMap,
final ReducibleAnnotationData perAlleleValues,
final int minCount) {
if( stratifiedPerReadAlleleLikelihoodMap == null )
return;
if( vc == null )
return;
final Allele ref = vc.getReference();
final List<Allele> allAlts = vc.getAlternateAlleles();
for (final PerReadAlleleLikelihoodMap maps : stratifiedPerReadAlleleLikelihoodMap.values() ) {
final ReducibleAnnotationData<List<Integer>> sampleTable = new AlleleSpecificAnnotationData<>(vc.getAlleles(),null);
for (final Map.Entry<GATKSAMRecord,Map<Allele,Double>> el : maps.getLikelihoodReadMap().entrySet()) {
final MostLikelyAllele mostLikelyAllele = PerReadAlleleLikelihoodMap.getMostLikelyAllele(el.getValue());
final GATKSAMRecord read = el.getKey();
updateTable(mostLikelyAllele.getAlleleIfInformative(), read, ref, allAlts, sampleTable);
}
//for each sample (value in stratified PRALM), only include it if there are >minCount informative reads
if ( passesMinimumThreshold(sampleTable, minCount) )
combineAttributeMap(sampleTable, perAlleleValues);
}
}
private void updateTable(final Allele bestAllele, final GATKSAMRecord read, final Allele ref, final List<Allele> allAlts, final ReducibleAnnotationData<List<Integer>> perAlleleValues) {
final boolean matchesRef = bestAllele.equals(ref, true);
final boolean matchesAnyAlt = allAlts.contains(bestAllele);
//for uninformative reads
if(bestAllele.isNoCall())
return;
//can happen if a read's most likely allele has been removed when --max_alternate_alleles is exceeded
if (!( matchesRef || matchesAnyAlt ))
return;
final List<Integer> alleleStrandCounts;
if (perAlleleValues.hasAttribute(bestAllele) && perAlleleValues.getAttribute(bestAllele) != null)
alleleStrandCounts = perAlleleValues.getAttribute(bestAllele);
else {
alleleStrandCounts = new ArrayList<>();
alleleStrandCounts.add(0,0);
alleleStrandCounts.add(1,0);
}
if (read.isStrandless()) {
// a strandless read counts as observations on both strand, at 50% weight, with a minimum of 1
// (the 1 is to ensure that a strandless read always counts as an observation on both strands, even
// if the read is only seen once, because it's a merged read or other)
alleleStrandCounts.set(FORWARD, alleleStrandCounts.get(FORWARD)+1);
alleleStrandCounts.set(REVERSE, alleleStrandCounts.get(REVERSE)+1);
} else {
// a normal read with an actual strand
final boolean isFW = !read.getReadNegativeStrandFlag();
if (isFW)
alleleStrandCounts.set(FORWARD, alleleStrandCounts.get(FORWARD)+1);
else
alleleStrandCounts.set(REVERSE, alleleStrandCounts.get(REVERSE)+1);
}
perAlleleValues.putAttribute(bestAllele, alleleStrandCounts);
}
/**
* Does this strand data array pass the minimum threshold for inclusion?
*
* @param sampleTable the per-allele fwd/rev read counts for a single sample
* @param minCount The minimum threshold of counts in the array
* @return true if it passes the minimum threshold, false otherwise
*/
protected boolean passesMinimumThreshold(final ReducibleAnnotationData<List<Integer>> sampleTable, final int minCount) {
// the read total must be greater than MIN_COUNT
int readTotal = 0;
for (final List<Integer> alleleValues : sampleTable.getAttributeMap().values()) {
if (alleleValues != null) {
readTotal += alleleValues.get(FORWARD);
readTotal += alleleValues.get(REVERSE);
}
}
return readTotal > minCount;
}
@Override
//Allele-specific annotations cannot be called from walkers other than HaplotypeCaller
protected Map<String, Object> calculateAnnotationFromGTfield(final GenotypesContext genotypes){
return new HashMap<>();
}
@Override
//Allele-specific annotations cannot be called from walkers other than HaplotypeCaller
protected Map<String, Object> calculateAnnotationFromStratifiedContexts(final Map<String, AlignmentContext> stratifiedContexts,
final VariantContext vc){
return new HashMap<>();
}
@Override
//This just calls the non-allele-specific code in StrandBiasTest.java
protected abstract Map<String, Object> calculateAnnotationFromLikelihoodMap(final Map<String, PerReadAlleleLikelihoodMap> stratifiedPerReadAlleleLikelihoodMap,
final VariantContext vc);
}

View File

@ -0,0 +1,123 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.VariantContext;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.*;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* Allele-specific strand bias estimated by the Symmetric Odds Ratio test
*
*/
public class AS_StrandOddsRatio extends AS_StrandBiasTest implements AS_StandardAnnotation {
@Override
public List<String> getKeyNames() {
return Collections.singletonList(GATKVCFConstants.AS_STRAND_ODDS_RATIO_KEY);
}
@Override
protected Map<String, Object> calculateAnnotationFromLikelihoodMap(Map<String, PerReadAlleleLikelihoodMap> stratifiedPerReadAlleleLikelihoodMap,
final VariantContext vc){
// either SNP with no alignment context, or indels: per-read likelihood map needed
final int[][] table = getContingencyTable(stratifiedPerReadAlleleLikelihoodMap, vc, MIN_COUNT);
final double ratio = calculateSOR(table);
return Collections.singletonMap(getKeyNames().get(0), (Object)String.format("%.3f",ratio));
}
@Override
protected Map<Allele,Double> calculateReducedData(AlleleSpecificAnnotationData<List<Integer>> combinedData) {
final Map<Allele,Double> annotationMap = new HashMap<>();
final Map<Allele, List<Integer>> perAlleleData = combinedData.getAttributeMap();
final List<Integer> refStrandCounts = perAlleleData.get(combinedData.getRefAllele());
for (final Allele a : perAlleleData.keySet()) {
List<Integer> altStrandCounts = perAlleleData.get(a);
int[][] refAltTable = new int[][] {new int[]{refStrandCounts.get(0),refStrandCounts.get(1)},new int[]{altStrandCounts.get(0),altStrandCounts.get(1)}};
annotationMap.put(a,calculateSOR(refAltTable));
}
return annotationMap;
}
/**
* Computes the SOR value of a table after augmentation (adding pseudocounts). Based on the symmetric odds ratio but modified to take on
* low values when the reference +/- read count ratio is skewed but the alt count ratio is not. Natural log is taken
* to keep values within roughly the same range as other annotations.
*
* Adding pseudocounts prevent divide-by-zero.
*
* @param originalTable The table before augmentation
* @return the SOR annotation value
*/
final protected double calculateSOR(final int[][] originalTable) {
final double[][] augmentedTable = StrandBiasTableUtils.augmentContingencyTable(originalTable);
double ratio = 0;
ratio += (augmentedTable[0][0] / augmentedTable[0][1]) * (augmentedTable[1][1] / augmentedTable[1][0]);
ratio += (augmentedTable[0][1] / augmentedTable[0][0]) * (augmentedTable[1][0] / augmentedTable[1][1]);
final double refRatio = (Math.min(augmentedTable[0][0], augmentedTable[0][1])/Math.max(augmentedTable[0][0], augmentedTable[0][1]));
final double altRatio = (Math.min(augmentedTable[1][0], augmentedTable[1][1])/Math.max(augmentedTable[1][0], augmentedTable[1][1]));
ratio = ratio*refRatio/altRatio;
return Math.log(ratio);
}
}

View File

@ -51,18 +51,40 @@
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.samtools.Cigar;
import htsjdk.samtools.CigarElement;
import htsjdk.samtools.CigarOperator;
import htsjdk.samtools.SAMRecord;
import htsjdk.variant.variantcontext.Genotype;
import org.apache.log4j.Logger;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.gatk.tools.walkers.indels.PairHMMIndelErrorModel;
import org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs;
import org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs;
import org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
public class AnnotationUtils {
public static final String ANNOTATION_HC_WARN_MSG = " annotation will not be calculated, must be called from HaplotyepCaller";
public static final String ANNOTATION_HC_WARN_MSG = " annotation will not be calculated, must be called from HaplotypeCaller";
public static final int WARNINGS_LOGGED_SIZE = 3;
/**
* Checks if the walker is compatible with allele-specific annotations
*/
public static boolean walkerSupportsAlleleSpecificAnnotations(final AnnotatorCompatible walker) {
return ((walker instanceof HaplotypeCaller) || (walker instanceof CombineGVCFs) || (walker instanceof GenotypeGVCFs));
}
/**
* Checks if the walker should get raw annotation data
*/
public static boolean walkerRequiresRawData(final AnnotatorCompatible walker) {
return ((walker instanceof HaplotypeCaller && ((HaplotypeCaller) walker).emitReferenceConfidence()) || walker instanceof CombineGVCFs);
}
/**
* Checks if the input data is appropriate
*
@ -129,4 +151,98 @@ public class AnnotationUtils {
return true;
}
/**
* Get the position of a variant within a read with respect to the closer end, accounting for hard clipped bases and low quality ends
* Used by ReadPosRankSum annotations
*
* @param read a read containing the variant
* @param initialReadPosition the position based on the modified, post-hard-clipped CIGAR
* @return read position
*/
public static int getFinalVariantReadPosition(final GATKSAMRecord read, final int initialReadPosition) {
final int numAlignedBases = getNumAlignedBases(read);
int readPos = initialReadPosition;
//TODO: this doesn't work for the middle-right position if we index from zero
if (initialReadPosition > numAlignedBases / 2) {
readPos = numAlignedBases - (initialReadPosition + 1);
}
return readPos;
}
/**
*
* @param read a read containing the variant
* @return the number of hard clipped and low qual bases at the read start (where start is the leftmost end w.r.t. the reference)
*/
public static int getNumClippedBasesAtStart(final SAMRecord read) {
// check for hard clips (never consider these bases):
final Cigar c = read.getCigar();
final CigarElement first = c.getCigarElement(0);
int numStartClippedBases = 0;
if (first.getOperator() == CigarOperator.H) {
numStartClippedBases = first.getLength();
}
final byte[] unclippedReadBases = read.getReadBases();
final byte[] unclippedReadQuals = read.getBaseQualities();
// Do a stricter base clipping than provided by CIGAR string, since this one may be too conservative,
// and may leave a string of Q2 bases still hanging off the reads.
//TODO: this code may not even get used because HaplotypeCaller already hard clips low quality tails
for (int i = numStartClippedBases; i < unclippedReadBases.length; i++) {
if (unclippedReadQuals[i] < PairHMMIndelErrorModel.BASE_QUAL_THRESHOLD)
numStartClippedBases++;
else
break;
}
return numStartClippedBases;
}
/**
*
* @param read a read containing the variant
* @return number of non-hard clipped, aligned bases (excluding low quality bases at either end)
*/
//TODO: this is bizarre -- this code counts hard clips, but then subtracts them from the read length, which already doesn't count hard clips
public static int getNumAlignedBases(final GATKSAMRecord read) {
return read.getReadLength() - getNumClippedBasesAtStart(read) - getNumClippedBasesAtEnd(read);
}
/**
*
* @param read a read containing the variant
* @return number of hard clipped and low qual bases at the read end (where end is right end w.r.t. the reference)
*/
public static int getNumClippedBasesAtEnd(final GATKSAMRecord read) {
// check for hard clips (never consider these bases):
final Cigar c = read.getCigar();
CigarElement last = c.getCigarElement(c.numCigarElements() - 1);
int numEndClippedBases = 0;
if (last.getOperator() == CigarOperator.H) {
numEndClippedBases = last.getLength();
}
final byte[] unclippedReadBases = read.getReadBases();
final byte[] unclippedReadQuals = read.getBaseQualities();
// Do a stricter base clipping than provided by CIGAR string, since this one may be too conservative,
// and may leave a string of Q2 bases still hanging off the reads.
//TODO: this code may not even get used because HaplotypeCaller already hard clips low quality tails
for (int i = unclippedReadBases.length - numEndClippedBases - 1; i >= 0; i--) {
if (unclippedReadQuals[i] < PairHMMIndelErrorModel.BASE_QUAL_THRESHOLD)
numEndClippedBases++;
else
break;
}
return numEndClippedBases;
}
}

View File

@ -90,6 +90,7 @@ import java.util.*;
public class ChromosomeCounts extends InfoFieldAnnotation implements StandardAnnotation, ActiveRegionBasedAnnotation {
private Set<String> founderIds = new HashSet<String>();
private boolean didUniquifiedSampleNameCheck = false;
public Map<String, Object> annotate(final RefMetaDataTracker tracker,
final AnnotatorCompatible walker,
@ -99,6 +100,11 @@ public class ChromosomeCounts extends InfoFieldAnnotation implements StandardAnn
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap ) {
if ( ! vc.hasGenotypes() )
return null;
//if none of the "founders" are in the vc samples, assume we uniquified the samples upstream and they are all founders
if (!didUniquifiedSampleNameCheck) {
checkSampleNames(vc);
didUniquifiedSampleNameCheck = true;
}
return VariantContextUtils.calculateChromosomeCounts(vc, new HashMap<String, Object>(), true,founderIds);
}
@ -113,4 +119,21 @@ public class ChromosomeCounts extends InfoFieldAnnotation implements StandardAnn
}
public List<VCFInfoHeaderLine> getDescriptions() { return Arrays.asList(ChromosomeCountConstants.descriptions); }
//this method is intended to reconcile uniquified sample names
// it comes into play when calling this annotation from GenotypeGVCFs with --uniquifySamples because founderIds
// is derived from the sampleDB, which comes from the input sample names, but vc will have uniquified (i.e. different)
// sample names. Without this check, the founderIds won't be found in the vc and the annotation won't be calculated.
protected void checkSampleNames(final VariantContext vc) {
Set<String> vcSamples = new HashSet<>();
vcSamples.addAll(vc.getSampleNames());
if (!vcSamples.isEmpty()) {
if (founderIds!=null) {
vcSamples.retainAll(founderIds);
if (vcSamples.isEmpty())
founderIds = vc.getSampleNames();
}
}
}
}

View File

@ -55,6 +55,7 @@ import org.apache.commons.lang.StringUtils;
import org.apache.log4j.Logger;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.GenotypeAnnotation;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.StandardAnnotation;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
import org.broadinstitute.gatk.utils.genotyper.MostLikelyAllele;

View File

@ -51,9 +51,7 @@
package org.broadinstitute.gatk.tools.walkers.annotator;
import cern.jet.math.Arithmetic;
import htsjdk.variant.variantcontext.GenotypesContext;
import org.apache.log4j.Logger;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.ActiveRegionBasedAnnotation;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.StandardAnnotation;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
@ -90,16 +88,25 @@ import java.util.*;
*/
public class FisherStrand extends StrandBiasTest implements StandardAnnotation, ActiveRegionBasedAnnotation {
private final static boolean ENABLE_DEBUGGING = false;
private final static Logger logger = Logger.getLogger(FisherStrand.class);
private static final double MIN_PVALUE = 1E-320;
private static final int MIN_QUAL_FOR_FILTERED_TEST = 17;
private static final int MIN_COUNT = ARRAY_DIM;
@Override
public List<String> getKeyNames() {
return Collections.singletonList(GATKVCFConstants.FISHER_STRAND_KEY);
}
@Override
public List<VCFInfoHeaderLine> getDescriptions() {
return Collections.singletonList(GATKVCFHeaderLines.getInfoLine(getKeyNames().get(0)));
}
@Override
protected Map<String, Object> calculateAnnotationFromGTfield(final GenotypesContext genotypes){
final int[][] tableFromPerSampleAnnotations = getTableFromSamples( genotypes, MIN_COUNT );
return ( tableFromPerSampleAnnotations != null )? pValueForBestTable(tableFromPerSampleAnnotations, null) : null;
return ( tableFromPerSampleAnnotations != null )? pValueAnnotationForBestTable(tableFromPerSampleAnnotations, null) : null;
}
@Override
@ -107,9 +114,11 @@ public class FisherStrand extends StrandBiasTest implements StandardAnnotation,
final VariantContext vc){
final int[][] tableNoFiltering = getSNPContingencyTable(stratifiedContexts, vc.getReference(), vc.getAlternateAlleles(), -1, MIN_COUNT);
final int[][] tableFiltering = getSNPContingencyTable(stratifiedContexts, vc.getReference(), vc.getAlternateAlleles(), MIN_QUAL_FOR_FILTERED_TEST, MIN_COUNT);
printTable("unfiltered", tableNoFiltering);
printTable("filtered", tableFiltering);
return pValueForBestTable(tableFiltering, tableNoFiltering);
if (ENABLE_DEBUGGING) {
StrandBiasTableUtils.printTable("unfiltered", tableNoFiltering);
StrandBiasTableUtils.printTable("filtered", tableFiltering);
}
return pValueAnnotationForBestTable(tableFiltering, tableNoFiltering);
}
@Override
@ -119,11 +128,9 @@ public class FisherStrand extends StrandBiasTest implements StandardAnnotation,
final int[][] table = getContingencyTable(stratifiedPerReadAlleleLikelihoodMap, vc, MIN_COUNT);
//logger.info("VC " + vc);
//printTable(table, 0.0);
return pValueForBestTable(table, null);
return pValueAnnotationForBestTable(table, null);
}
/**
* Create an annotation for the highest (i.e., least significant) p-value of table1 and table2
*
@ -131,14 +138,14 @@ public class FisherStrand extends StrandBiasTest implements StandardAnnotation,
* @param table2 a contingency table, may be null
* @return annotation result for FS given tables
*/
private Map<String, Object> pValueForBestTable(final int[][] table1, final int[][] table2) {
private Map<String, Object> pValueAnnotationForBestTable(final int[][] table1, final int[][] table2) {
if ( table2 == null )
return table1 == null ? null : annotationForOneTable(pValueForContingencyTable(table1));
return table1 == null ? null : annotationForOneTable(StrandBiasTableUtils.FisherExactPValueForContingencyTable(table1));
else if (table1 == null)
return annotationForOneTable(pValueForContingencyTable(table2));
return annotationForOneTable(StrandBiasTableUtils.FisherExactPValueForContingencyTable(table2));
else { // take the one with the best (i.e., least significant pvalue)
double pvalue1 = pValueForContingencyTable(table1);
double pvalue2 = pValueForContingencyTable(table2);
double pvalue1 = StrandBiasTableUtils.FisherExactPValueForContingencyTable(table1);
double pvalue2 = StrandBiasTableUtils.FisherExactPValueForContingencyTable(table2);
return annotationForOneTable(Math.max(pvalue1, pvalue2));
}
}
@ -153,185 +160,4 @@ public class FisherStrand extends StrandBiasTest implements StandardAnnotation,
final Object value = String.format("%.3f", QualityUtils.phredScaleErrorRate(Math.max(pValue, MIN_PVALUE))); // prevent INFINITYs
return Collections.singletonMap(getKeyNames().get(0), value);
}
@Override
public List<String> getKeyNames() {
return Collections.singletonList(GATKVCFConstants.FISHER_STRAND_KEY);
}
@Override
public List<VCFInfoHeaderLine> getDescriptions() {
return Collections.singletonList(GATKVCFHeaderLines.getInfoLine(getKeyNames().get(0)));
}
/**
* Helper function to turn the FisherStrand table into the SB annotation array
* @param table the table used by the FisherStrand annotation
* @return the array used by the per-sample Strand Bias annotation
*/
public static List<Integer> getContingencyArray( final int[][] table ) {
if(table.length != ARRAY_DIM || table[0].length != ARRAY_DIM) {
logger.warn("Expecting a " + ARRAY_DIM + "x" + ARRAY_DIM + " strand bias table.");
return null;
}
final List<Integer> list = new ArrayList<>(ARRAY_SIZE); // TODO - if we ever want to do something clever with multi-allelic sites this will need to change
list.add(table[0][0]);
list.add(table[0][1]);
list.add(table[1][0]);
list.add(table[1][1]);
return list;
}
public static Double pValueForContingencyTable(int[][] originalTable) {
final int[][] normalizedTable = normalizeContingencyTable(originalTable);
int[][] table = copyContingencyTable(normalizedTable);
double pCutoff = computePValue(table);
//printTable(table, pCutoff);
double pValue = pCutoff;
while (rotateTable(table)) {
double pValuePiece = computePValue(table);
//printTable(table, pValuePiece);
if (pValuePiece <= pCutoff) {
pValue += pValuePiece;
}
}
table = copyContingencyTable(normalizedTable);
while (unrotateTable(table)) {
double pValuePiece = computePValue(table);
//printTable(table, pValuePiece);
if (pValuePiece <= pCutoff) {
pValue += pValuePiece;
}
}
//System.out.printf("P-cutoff: %f\n", pCutoff);
//System.out.printf("P-value: %f\n\n", pValue);
// min is necessary as numerical precision can result in pValue being slightly greater than 1.0
return Math.min(pValue, 1.0);
}
// how large do we want the normalized table to be?
private static final double TARGET_TABLE_SIZE = 200.0;
/**
* Normalize the table so that the entries are not too large.
* Note that this method does NOT necessarily make a copy of the table being passed in!
*
* @param table the original table
* @return a normalized version of the table or the original table if it is already normalized
*/
private static int[][] normalizeContingencyTable(final int[][] table) {
final int sum = table[0][0] + table[0][1] + table[1][0] + table[1][1];
if ( sum <= TARGET_TABLE_SIZE * 2 )
return table;
final double normalizationFactor = (double)sum / TARGET_TABLE_SIZE;
final int[][] normalized = new int[ARRAY_DIM][ARRAY_DIM];
for ( int i = 0; i < ARRAY_DIM; i++ ) {
for ( int j = 0; j < ARRAY_DIM; j++ )
normalized[i][j] = (int)(table[i][j] / normalizationFactor);
}
return normalized;
}
private static int [][] copyContingencyTable(int [][] t) {
int[][] c = new int[ARRAY_DIM][ARRAY_DIM];
for ( int i = 0; i < ARRAY_DIM; i++ )
for ( int j = 0; j < ARRAY_DIM; j++ )
c[i][j] = t[i][j];
return c;
}
private static void printTable(int[][] table, double pValue) {
logger.info(String.format("%d %d; %d %d : %f", table[0][0], table[0][1], table[1][0], table[1][1], pValue));
}
/**
* Printing information to logger.info for debugging purposes
*
* @param name the name of the table
* @param table the table itself
*/
private void printTable(final String name, final int[][] table) {
if ( ENABLE_DEBUGGING ) {
final String pValue = (String)annotationForOneTable(pValueForContingencyTable(table)).get(getKeyNames().get(0));
logger.info(String.format("FS %s (REF+, REF-, ALT+, ALT-) = (%d, %d, %d, %d) = %s",
name, table[0][0], table[0][1], table[1][0], table[1][1], pValue));
}
}
private static boolean rotateTable(int[][] table) {
table[0][0]--;
table[1][0]++;
table[0][1]++;
table[1][1]--;
return (table[0][0] >= 0 && table[1][1] >= 0);
}
private static boolean unrotateTable(int[][] table) {
table[0][0]++;
table[1][0]--;
table[0][1]--;
table[1][1]++;
return (table[0][1] >= 0 && table[1][0] >= 0);
}
private static double computePValue(int[][] table) {
int[] rowSums = { sumRow(table, 0), sumRow(table, 1) };
int[] colSums = { sumColumn(table, 0), sumColumn(table, 1) };
int N = rowSums[0] + rowSums[1];
// calculate in log space so we don't die with high numbers
double pCutoff = Arithmetic.logFactorial(rowSums[0])
+ Arithmetic.logFactorial(rowSums[1])
+ Arithmetic.logFactorial(colSums[0])
+ Arithmetic.logFactorial(colSums[1])
- Arithmetic.logFactorial(table[0][0])
- Arithmetic.logFactorial(table[0][1])
- Arithmetic.logFactorial(table[1][0])
- Arithmetic.logFactorial(table[1][1])
- Arithmetic.logFactorial(N);
return Math.exp(pCutoff);
}
private static int sumRow(int[][] table, int column) {
int sum = 0;
for (int r = 0; r < table.length; r++) {
sum += table[r][column];
}
return sum;
}
private static int sumColumn(int[][] table, int row) {
int sum = 0;
for (int c = 0; c < table[row].length; c++) {
sum += table[row][c];
}
return sum;
}
}

View File

@ -0,0 +1,248 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.Genotype;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.vcf.VCFConstants;
import htsjdk.variant.vcf.VCFHeaderLine;
import htsjdk.variant.vcf.VCFInfoHeaderLine;
import org.broadinstitute.gatk.engine.GenomeAnalysisEngine;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.InfoFieldAnnotation;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.ReducibleAnnotation;
import org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller;
import org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs;
import org.broadinstitute.gatk.utils.QualityUtils;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.pileup.PileupElement;
import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
import java.util.*;
/**
* Abstract root for all RankSum-based annotations
*/
public abstract class RMSAnnotation extends InfoFieldAnnotation implements ReducibleAnnotation {
protected AnnotatorCompatible callingWalker;
@Override
public void initialize(final AnnotatorCompatible walker, final GenomeAnalysisEngine toolkit, final Set<VCFHeaderLine> headerLines) {
callingWalker = walker;
}
@Override
public List<VCFInfoHeaderLine> getDescriptions() {
final List<VCFInfoHeaderLine> headerLines = new ArrayList<>();
//ideally only HC in GVCF mode would get the raw header line, but that's a little more complicated
if (callingWalker instanceof HaplotypeCaller || callingWalker instanceof CombineGVCFs)
headerLines.add(GATKVCFHeaderLines.getInfoLine(getRawKeyName()));
headerLines.add(GATKVCFHeaderLines.getInfoLine(getKeyNames().get(0)));
return headerLines;
}
@Override
public Map<String, Object> annotate(final RefMetaDataTracker tracker,
final AnnotatorCompatible walker,
final ReferenceContext ref,
final Map<String, AlignmentContext> stratifiedContexts,
final VariantContext vc,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap ) {
if ( (stratifiedContexts == null || stratifiedContexts.isEmpty()) && perReadAlleleLikelihoodMap == null)
return null;
final Map<String, Object> annotations = new HashMap<>();
final ReducibleAnnotationData<Number> myData = new ReducibleAnnotationData<>(null);
calculateRawData(stratifiedContexts, perReadAlleleLikelihoodMap, myData);
final String annotationString = makeFinalizedAnnotationString(vc, myData.getAttributeMap(), stratifiedContexts, perReadAlleleLikelihoodMap);
annotations.put(getKeyNames().get(0), annotationString);
return annotations;
}
public Map<String, Object> annotateRawData(final RefMetaDataTracker tracker,
final AnnotatorCompatible walker,
final ReferenceContext ref,
final Map<String, AlignmentContext> stratifiedContexts,
final VariantContext vc,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap ) {
if ( perReadAlleleLikelihoodMap == null)
return new HashMap<>();
final Map<String, Object> annotations = new HashMap<>();
ReducibleAnnotationData<Number> myData = new ReducibleAnnotationData<>(null);
calculateRawData(vc, perReadAlleleLikelihoodMap, myData);
String annotationString = makeRawAnnotationString(vc.getAlleles(), myData.getAttributeMap());
annotations.put(getRawKeyName(), annotationString);
return annotations;
}
@Override
public Map<String, Object> combineRawData(final List<Allele> vcAlleles, final List<? extends ReducibleAnnotationData> annotationList) {
//VC already contains merged alleles from ReferenceConfidenceVariantContextMerger
ReducibleAnnotationData combinedData = new ReducibleAnnotationData(null);
for (final ReducibleAnnotationData currentValue : annotationList) {
parseRawDataString(currentValue);
combineAttributeMap(currentValue, combinedData);
}
final Map<String, Object> annotations = new HashMap<>();
String annotationString = makeRawAnnotationString(vcAlleles, combinedData.getAttributeMap());
annotations.put(getRawKeyName(), annotationString);
return annotations;
}
@Override
public Map<String, Object> finalizeRawData(final VariantContext vc, final VariantContext originalVC) {
if (!vc.hasAttribute(getRawKeyName()))
return new HashMap<>();
String rawMQdata = vc.getAttributeAsString(getRawKeyName(),null);
if (rawMQdata == null)
return new HashMap<>();
ReducibleAnnotationData myData = new ReducibleAnnotationData(rawMQdata);
parseRawDataString(myData);
String annotationString = makeFinalizedAnnotationString(vc, myData.getAttributeMap());
return Collections.singletonMap(getKeyNames().get(0), (Object)annotationString);
}
protected void parseRawDataString(ReducibleAnnotationData<Number> myData) {
final String rawDataString = myData.getRawData();
String[] rawMQdataAsStringVector;
rawMQdataAsStringVector = rawDataString.split(",");
double squareSum = Double.parseDouble(rawMQdataAsStringVector[0]);
myData.putAttribute(Allele.NO_CALL, squareSum);
}
public void combineAttributeMap(ReducibleAnnotationData<Number> toAdd, ReducibleAnnotationData<Number> combined) {
if (combined.getAttribute(Allele.NO_CALL) != null)
combined.putAttribute(Allele.NO_CALL, (Double) combined.getAttribute(Allele.NO_CALL) + (Double) toAdd.getAttribute(Allele.NO_CALL));
else
combined.putAttribute(Allele.NO_CALL, toAdd.getAttribute(Allele.NO_CALL));
}
//Implementations of this method should return a string consisting of the sum of the squared values for the attribute being annotated (or a delimited list of those if allele-specific)
abstract protected String makeRawAnnotationString(List<Allele> vcAlleles, Map<Allele,Number> sumOfSquares);
//Implementations of this method should return a string with the finalized annotation value as will appear in the INFO field
abstract protected String makeFinalizedAnnotationString(VariantContext vc, Map<Allele, Number> sumOfSquares);
//Implementations of this method should return a string with the finalized annotation value as will appear in the INFO field
abstract protected String makeFinalizedAnnotationString(VariantContext vc, Map<Allele, Number> sumOfSquares, Map<String, AlignmentContext> stratifiedContexts, final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap);
protected void calculateRawData(final Map<String, AlignmentContext> stratifiedContexts,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap,
final ReducibleAnnotationData myData) {
if (perReadAlleleLikelihoodMap != null) {
calculateRawData((VariantContext) null, perReadAlleleLikelihoodMap, myData);
}
}
/**
*
* @param vc
* @param perReadAlleleLikelihoodMap
* @param stratifiedContexts
* @return the number of reads at the vc position (-1 if all read data is null)
*/
public int getNumOfReads(final VariantContext vc,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap,
final Map<String, AlignmentContext> stratifiedContexts) {
//don't use the full depth because we don't calculate MQ for reference blocks
int numOfReads = 0;
if(vc.hasAttribute(VCFConstants.DEPTH_KEY)) {
numOfReads += Integer.parseInt(vc.getAttributeAsString(VCFConstants.DEPTH_KEY, "-1"));
if(vc.hasGenotypes()) {
for(Genotype gt : vc.getGenotypes()) {
if(gt.isHomRef() && gt.hasExtendedAttribute("MIN_DP")) //site-level DP contribution will come from MIN_DP for gVCF-called reference variants
numOfReads -= Integer.parseInt(gt.getExtendedAttribute("MIN_DP").toString());
}
}
return numOfReads;
}
else if (stratifiedContexts != null && !stratifiedContexts.isEmpty()) {
for ( final Map.Entry<String, AlignmentContext> sample : stratifiedContexts.entrySet() ) {
final AlignmentContext context = sample.getValue();
for ( final PileupElement p : context.getBasePileup() ) {
int mq = p.getRead().getMappingQuality();
if ( mq != QualityUtils.MAPPING_QUALITY_UNAVAILABLE ) {
numOfReads++;
}
}
}
return numOfReads;
}
else if (perReadAlleleLikelihoodMap != null && !perReadAlleleLikelihoodMap.isEmpty())
{
for ( final PerReadAlleleLikelihoodMap perReadLikelihoods : perReadAlleleLikelihoodMap.values() ) {
for ( final GATKSAMRecord read : perReadLikelihoods.getStoredElements() ) {
int mq = read.getMappingQuality();
if ( mq != QualityUtils.MAPPING_QUALITY_UNAVAILABLE ) {
numOfReads++;
}
}
}
return numOfReads;
}
return -1;
}
}

View File

@ -51,22 +51,21 @@
package org.broadinstitute.gatk.tools.walkers.annotator;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.ActiveRegionBasedAnnotation;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.InfoFieldAnnotation;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.StandardAnnotation;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.MathUtils;
import org.broadinstitute.gatk.utils.QualityUtils;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.vcf.VCFConstants;
import htsjdk.variant.vcf.VCFInfoHeaderLine;
import htsjdk.variant.vcf.VCFStandardHeaderLines;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.*;
import org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller;
import org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs;
import org.broadinstitute.gatk.utils.QualityUtils;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.pileup.PileupElement;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
import htsjdk.variant.variantcontext.VariantContext;
import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
import java.util.*;
@ -74,7 +73,9 @@ import java.util.*;
/**
* Root Mean Square of the mapping quality of reads across all samples.
*
* <p>This annotation provides an estimation of the overall mapping quality of reads supporting a variant call, averaged over all samples in a cohort.</p>
* <p>This annotation provides an estimation of the overall mapping quality of reads supporting a variant call. It produce both raw data (sum of square and num of total reads) and the calculated root mean square.</p>
*
* The raw data is used to accurately calculate the root mean square when combining more than one sample.
*
* <h3>Statistical notes</h3>
* <p>The root mean square is equivalent to the mean of the mapping qualities plus the standard deviation of the mapping qualities.</p>
@ -85,51 +86,89 @@ import java.util.*;
* </ul>
*
*/
public class RMSMappingQuality extends InfoFieldAnnotation implements StandardAnnotation, ActiveRegionBasedAnnotation {
public class RMSMappingQuality extends RMSAnnotation implements StandardAnnotation, ActiveRegionBasedAnnotation, ReducibleAnnotation {
public Map<String, Object> annotate(final RefMetaDataTracker tracker,
final AnnotatorCompatible walker,
final ReferenceContext ref,
final Map<String, AlignmentContext> stratifiedContexts,
final VariantContext vc,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap ) {
@Override //this needs an override because MQ is a VCF standard so it's headerline is in a different place
public List<VCFInfoHeaderLine> getDescriptions() {
final List<VCFInfoHeaderLine> headerLines = new ArrayList<>();
//only HC in GVCF mode should get the raw header line
if ((callingWalker instanceof HaplotypeCaller && ((HaplotypeCaller) callingWalker).emitReferenceConfidence()) || callingWalker instanceof CombineGVCFs)
headerLines.add(GATKVCFHeaderLines.getInfoLine(getRawKeyName()));
headerLines.add(VCFStandardHeaderLines.getInfoLine(getKeyNames().get(0)));
return headerLines;
}
final List<Integer> qualities = new ArrayList<>();
public List<String> getKeyNames() { return Arrays.asList(
VCFConstants.RMS_MAPPING_QUALITY_KEY);
}
public String getRawKeyName() { return GATKVCFConstants.RAW_RMS_MAPPING_QUALITY_KEY;}
@Override
public void calculateRawData(final VariantContext vc, final Map<String, PerReadAlleleLikelihoodMap> pralm, final ReducibleAnnotationData rawAnnotations) {
Double squareSum = 0.0;
if ( pralm.size() == 0 )
return;
for ( final PerReadAlleleLikelihoodMap perReadLikelihoods : pralm.values() ) {
for ( final GATKSAMRecord read : perReadLikelihoods.getStoredElements() ) {
int mq = read.getMappingQuality();
if ( mq != QualityUtils.MAPPING_QUALITY_UNAVAILABLE ) {
squareSum += mq * mq;
}
}
}
rawAnnotations.putAttribute(Allele.NO_CALL,squareSum);
}
//this version applies to non-HaplotypeCaller annotators
@Override
protected void calculateRawData(final Map<String, AlignmentContext> stratifiedContexts,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap,
final ReducibleAnnotationData myData) {
Double squareSum = 0.0;
if ( stratifiedContexts != null ) {
if ( stratifiedContexts.size() == 0 )
return null;
return;
for ( final Map.Entry<String, AlignmentContext> sample : stratifiedContexts.entrySet() ) {
final AlignmentContext context = sample.getValue();
for ( final PileupElement p : context.getBasePileup() )
fillMappingQualitiesFromPileup(p.getRead().getMappingQuality(), qualities);
for ( final PileupElement p : context.getBasePileup() ) {
int mq = p.getRead().getMappingQuality();
if ( mq != QualityUtils.MAPPING_QUALITY_UNAVAILABLE ) {
squareSum += mq * mq;
}
}
}
myData.putAttribute(Allele.NO_CALL,squareSum);
}
else if (perReadAlleleLikelihoodMap != null) {
if ( perReadAlleleLikelihoodMap.size() == 0 )
return null;
for ( final PerReadAlleleLikelihoodMap perReadLikelihoods : perReadAlleleLikelihoodMap.values() ) {
for ( final GATKSAMRecord read : perReadLikelihoods.getStoredElements() )
fillMappingQualitiesFromPileup(read.getMappingQuality(), qualities);
}
}
else
return null;
final double rms = MathUtils.rms(qualities);
return Collections.singletonMap(getKeyNames().get(0), (Object)String.format("%.2f", rms));
}
private static void fillMappingQualitiesFromPileup(final int mq, final List<Integer> qualities) {
if ( mq != QualityUtils.MAPPING_QUALITY_UNAVAILABLE ) {
qualities.add(mq);
calculateRawData((VariantContext) null, perReadAlleleLikelihoodMap, myData);
}
}
public List<String> getKeyNames() { return Arrays.asList(VCFConstants.RMS_MAPPING_QUALITY_KEY); }
public List<VCFInfoHeaderLine> getDescriptions() {
return Arrays.asList(VCFStandardHeaderLines.getInfoLine(getKeyNames().get(0)));
@Override
public String makeRawAnnotationString(final List<Allele> vcAlleles, final Map<Allele, Number> perAlleleData) {
return String.format("%.2f", perAlleleData.get(Allele.NO_CALL));
}
@Override
public String makeFinalizedAnnotationString(final VariantContext vc, final Map<Allele, Number> perAlleleData, final Map<String, AlignmentContext> stratifiedContexts, final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap) {
if ((stratifiedContexts != null && !stratifiedContexts.isEmpty()) || perReadAlleleLikelihoodMap != null) {
int numOfReads = getNumOfReads(vc, perReadAlleleLikelihoodMap, stratifiedContexts);
return String.format("%.2f", Math.sqrt((double) perAlleleData.get(Allele.NO_CALL) / numOfReads));
}
else {
return makeFinalizedAnnotationString(vc, perAlleleData);
}
}
@Override
public String makeFinalizedAnnotationString(final VariantContext vc, final Map<Allele, Number> perAlleleData) {
int numOfReads = getNumOfReads(vc, null, null);
return String.format("%.2f", Math.sqrt((double)perAlleleData.get(Allele.NO_CALL)/numOfReads));
}
}

View File

@ -52,12 +52,10 @@
package org.broadinstitute.gatk.tools.walkers.annotator;
import org.broadinstitute.gatk.engine.GenomeAnalysisEngine;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.*;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.ActiveRegionBasedAnnotation;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.InfoFieldAnnotation;
import org.broadinstitute.gatk.utils.genotyper.MostLikelyAllele;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.MannWhitneyU;
@ -76,11 +74,12 @@ import java.util.*;
/**
* Abstract root for all RankSum based annotations
* Abstract root for all RankSum-based annotations
*/
//TODO: will eventually implement ReducibleAnnotation in order to preserve accuracy for CombineGVCFs and GenotypeGVCFs -- see RMSAnnotation.java for an example of an abstract ReducibleAnnotation
public abstract class RankSumTest extends InfoFieldAnnotation implements ActiveRegionBasedAnnotation {
static final boolean DEBUG = false;
private boolean useDithering = true;
protected boolean useDithering = true;
public Map<String, Object> annotate(final RefMetaDataTracker tracker,
final AnnotatorCompatible walker,

View File

@ -51,12 +51,7 @@
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.samtools.Cigar;
import htsjdk.samtools.CigarElement;
import htsjdk.samtools.CigarOperator;
import htsjdk.samtools.SAMRecord;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.StandardAnnotation;
import org.broadinstitute.gatk.tools.walkers.indels.PairHMMIndelErrorModel;
import htsjdk.variant.vcf.VCFInfoHeaderLine;
import org.broadinstitute.gatk.utils.pileup.PileupElement;
import org.broadinstitute.gatk.utils.sam.AlignmentUtils;
@ -109,7 +104,7 @@ public class ReadPosRankSumTest extends RankSumTest implements StandardAnnotatio
@Override
protected Double getElementForPileupElement(final PileupElement p) {
final int offset = AlignmentUtils.calcAlignmentByteArrayOffset(p.getRead().getCigar(), p, 0, 0);
return (double)getFinalReadPosition(p.getRead(), offset);
return (double)AnnotationUtils.getFinalVariantReadPosition(p.getRead(), offset);
}
@Override
@ -122,69 +117,5 @@ public class ReadPosRankSumTest extends RankSumTest implements StandardAnnotatio
return super.isUsableRead(read, refLoc) && read.getSoftStart() + read.getCigar().getReadLength() > refLoc;
}
private int getFinalReadPosition(final GATKSAMRecord read, final int initialReadPosition) {
final int numAlignedBases = getNumAlignedBases(read);
int readPos = initialReadPosition;
if (initialReadPosition > numAlignedBases / 2) {
readPos = numAlignedBases - (initialReadPosition + 1);
}
return readPos;
}
private int getNumClippedBasesAtStart(final SAMRecord read) {
// compute total number of clipped bases (soft or hard clipped)
// check for hard clips (never consider these bases):
final Cigar c = read.getCigar();
final CigarElement first = c.getCigarElement(0);
int numStartClippedBases = 0;
if (first.getOperator() == CigarOperator.H) {
numStartClippedBases = first.getLength();
}
final byte[] unclippedReadBases = read.getReadBases();
final byte[] unclippedReadQuals = read.getBaseQualities();
// Do a stricter base clipping than provided by CIGAR string, since this one may be too conservative,
// and may leave a string of Q2 bases still hanging off the reads.
for (int i = numStartClippedBases; i < unclippedReadBases.length; i++) {
if (unclippedReadQuals[i] < PairHMMIndelErrorModel.BASE_QUAL_THRESHOLD)
numStartClippedBases++;
else
break;
}
return numStartClippedBases;
}
private int getNumAlignedBases(final GATKSAMRecord read) {
return read.getReadLength() - getNumClippedBasesAtStart(read) - getNumClippedBasesAtEnd(read);
}
private int getNumClippedBasesAtEnd(final GATKSAMRecord read) {
// compute total number of clipped bases (soft or hard clipped)
// check for hard clips (never consider these bases):
final Cigar c = read.getCigar();
CigarElement last = c.getCigarElement(c.numCigarElements() - 1);
int numEndClippedBases = 0;
if (last.getOperator() == CigarOperator.H) {
numEndClippedBases = last.getLength();
}
final byte[] unclippedReadBases = read.getReadBases();
final byte[] unclippedReadQuals = read.getBaseQualities();
// Do a stricter base clipping than provided by CIGAR string, since this one may be too conservative,
// and may leave a string of Q2 bases still hanging off the reads.
for (int i = unclippedReadBases.length - numEndClippedBases - 1; i >= 0; i--) {
if (unclippedReadQuals[i] < PairHMMIndelErrorModel.BASE_QUAL_THRESHOLD)
numEndClippedBases++;
else
break;
}
return numEndClippedBases;
}
}

View File

@ -52,7 +52,6 @@
package org.broadinstitute.gatk.tools.walkers.annotator;
import org.apache.log4j.Logger;
import org.apache.commons.lang.StringUtils;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
@ -117,7 +116,7 @@ public class StrandBiasBySample extends GenotypeAnnotation {
final int[][] table = FisherStrand.getContingencyTable(Collections.singletonMap(g.getSampleName(), alleleLikelihoodMap), vc, 0);
gb.attribute(GATKVCFConstants.STRAND_BIAS_BY_SAMPLE_KEY, FisherStrand.getContingencyArray(table));
gb.attribute(GATKVCFConstants.STRAND_BIAS_BY_SAMPLE_KEY, StrandBiasTableUtils.getContingencyArray(table));
}
@Override

View File

@ -0,0 +1,250 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import cern.jet.math.Arithmetic;
import org.apache.log4j.Logger;
import org.broadinstitute.gatk.utils.QualityUtils;
import java.util.ArrayList;
import java.util.List;
/**
* A class containing many convenience methods used in the strand bias annotation calculations
*/
public class StrandBiasTableUtils {
private final static Logger logger = Logger.getLogger(StrandBiasTableUtils.class);
//For now this is only for 2x2 contingency tables
protected static final int ARRAY_DIM = 2;
protected static final int ARRAY_SIZE = ARRAY_DIM * ARRAY_DIM;
private static double MIN_PVALUE = 1E-320;
// how large do we want the normalized table to be?
private static final double TARGET_TABLE_SIZE = 200.0;
private final static double AUGMENTATION_CONSTANT = 1.0;
/**
* Computes a two-sided p-Value for a Fisher's exact test on the contingency table, after normalizing counts so that the sum does not exceed {@value org.broadinstitute.gatk.tools.walkers.annotator.StrandBiasTableUtils#TARGET_TABLE_SIZE}
* @param originalTable
* @return
*/
public static Double FisherExactPValueForContingencyTable(int[][] originalTable) {
final int[][] normalizedTable = normalizeContingencyTable(originalTable);
int[][] table = copyContingencyTable(normalizedTable);
double pCutoff = computePValue(table);
double pValue = pCutoff;
while (rotateTable(table)) {
double pValuePiece = computePValue(table);
if (pValuePiece <= pCutoff) {
pValue += pValuePiece;
}
}
table = copyContingencyTable(normalizedTable);
while (unrotateTable(table)) {
double pValuePiece = computePValue(table);
if (pValuePiece <= pCutoff) {
pValue += pValuePiece;
}
}
// min is necessary as numerical precision can result in pValue being slightly greater than 1.0
return Math.min(pValue, 1.0);
}
/**
* Helper function to turn the FisherStrand table into the SB annotation array
* @param table the table used by the FisherStrand annotation
* @return the array used by the per-sample Strand Bias annotation
*/
public static List<Integer> getContingencyArray( final int[][] table ) {
if(table.length != ARRAY_DIM || table[0].length != ARRAY_DIM) {
logger.warn("Expecting a " + ARRAY_DIM + "x" + ARRAY_DIM + " strand bias table.");
return null;
}
final List<Integer> list = new ArrayList<>(ARRAY_SIZE);
list.add(table[0][0]);
list.add(table[0][1]);
list.add(table[1][0]);
list.add(table[1][1]);
return list;
}
/**
* Printing information to logger.info for debugging purposes
*
* @param name the name of the table
* @param table the table itself
*/
public static void printTable(final String name, final int[][] table) {
final String pValue = String.format("%.3f", QualityUtils.phredScaleErrorRate(Math.max(FisherExactPValueForContingencyTable(table), MIN_PVALUE)));
logger.info(String.format("FS %s (REF+, REF-, ALT+, ALT-) = (%d, %d, %d, %d) = %s",
name, table[0][0], table[0][1], table[1][0], table[1][1], pValue));
}
/**
* Adds the small value AUGMENTATION_CONSTANT to all the entries of the table.
*
* @param table the table to augment
* @return the augmented table
*/
protected static double[][] augmentContingencyTable(final int[][] table) {
double[][] augmentedTable = new double[ARRAY_DIM][ARRAY_DIM];
for ( int i = 0; i < ARRAY_DIM; i++ ) {
for ( int j = 0; j < ARRAY_DIM; j++ )
augmentedTable[i][j] = table[i][j] + AUGMENTATION_CONSTANT;
}
return augmentedTable;
}
/**
* Normalize the table so that the entries are not too large.
* Note that this method does NOT necessarily make a copy of the table being passed in!
*
* @param table the original table
* @return a normalized version of the table or the original table if it is already normalized
*/
protected static int[][] normalizeContingencyTable(final int[][] table) {
final int sum = table[0][0] + table[0][1] + table[1][0] + table[1][1];
if ( sum <= TARGET_TABLE_SIZE * 2 )
return table;
final double normalizationFactor = (double)sum / TARGET_TABLE_SIZE;
final int[][] normalized = new int[ARRAY_DIM][ARRAY_DIM];
for ( int i = 0; i < ARRAY_DIM; i++ ) {
for ( int j = 0; j < ARRAY_DIM; j++ )
normalized[i][j] = (int)(table[i][j] / normalizationFactor);
}
return normalized;
}
public static int [][] copyContingencyTable(int [][] t) {
int[][] c = new int[ARRAY_DIM][ARRAY_DIM];
for ( int i = 0; i < ARRAY_DIM; i++ ) {
//System.arraycopy(t,0,c,0,ARRAY_DIM);
for (int j = 0; j < ARRAY_DIM; j++) {
c[i][j] = t[i][j];
}
}
return c;
}
protected static boolean rotateTable(int[][] table) {
table[0][0]--;
table[1][0]++;
table[0][1]++;
table[1][1]--;
return (table[0][0] >= 0 && table[1][1] >= 0);
}
protected static boolean unrotateTable(int[][] table) {
table[0][0]++;
table[1][0]--;
table[0][1]--;
table[1][1]++;
return (table[0][1] >= 0 && table[1][0] >= 0);
}
protected static double computePValue(int[][] table) {
int[] rowSums = { sumRow(table, 0), sumRow(table, 1) };
int[] colSums = { sumColumn(table, 0), sumColumn(table, 1) };
int N = rowSums[0] + rowSums[1];
// calculate in log space for better precision
double pCutoff = Arithmetic.logFactorial(rowSums[0])
+ Arithmetic.logFactorial(rowSums[1])
+ Arithmetic.logFactorial(colSums[0])
+ Arithmetic.logFactorial(colSums[1])
- Arithmetic.logFactorial(table[0][0])
- Arithmetic.logFactorial(table[0][1])
- Arithmetic.logFactorial(table[1][0])
- Arithmetic.logFactorial(table[1][1])
- Arithmetic.logFactorial(N);
return Math.exp(pCutoff);
}
private static int sumRow(int[][] table, int column) {
int sum = 0;
for (int r = 0; r < table.length; r++) {
sum += table[r][column];
}
return sum;
}
private static int sumColumn(int[][] table, int row) {
int sum = 0;
for (int c = 0; c < table[row].length; c++) {
sum += table[row][c];
}
return sum;
}
}

View File

@ -58,6 +58,7 @@ import htsjdk.variant.vcf.VCFFormatHeaderLine;
import htsjdk.variant.vcf.VCFHeaderLine;
import org.apache.log4j.Logger;
import org.broadinstitute.gatk.engine.GenomeAnalysisEngine;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.ActiveRegionBasedAnnotation;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
@ -77,7 +78,8 @@ import java.util.*;
/**
* Class of tests to detect strand bias.
*/
public abstract class StrandBiasTest extends InfoFieldAnnotation {
//TODO: will eventually implement ReducibleAnnotation -- see RMSAnnotation.java for an example of an abstract ReducibleAnnotation
public abstract class StrandBiasTest extends InfoFieldAnnotation implements ActiveRegionBasedAnnotation {
private final static Logger logger = Logger.getLogger(StrandBiasTest.class);
private static boolean stratifiedPerReadAlleleLikelihoodMapWarningLogged = false;
private static boolean inputVariantContextWarningLogged = false;
@ -312,7 +314,6 @@ public abstract class StrandBiasTest extends InfoFieldAnnotation {
private static void updateTable(final int[] table, final Allele allele, final GATKSAMRecord read, final Allele ref, final List<Allele> allAlts) {
final boolean matchesRef = allele.equals(ref, true);
final boolean matchesAlt = allele.equals(allAlts.get(0), true);
final boolean matchesAnyAlt = allAlts.contains(allele);
if ( matchesRef || matchesAnyAlt ) {

View File

@ -101,7 +101,6 @@ import java.util.*;
*
*/
public class StrandOddsRatio extends StrandBiasTest implements StandardAnnotation, ActiveRegionBasedAnnotation {
private final static double AUGMENTATION_CONSTANT = 1.0;
private static final int MIN_COUNT = 0;
@Override
@ -132,17 +131,17 @@ public class StrandOddsRatio extends StrandBiasTest implements StandardAnnotatio
}
/**
* Computes the SOR value of a table after augmentation. Based on the symmetric odds ratio but modified to take on
* Computes the SOR value of a table after augmentation (adding pseudocounts). Based on the symmetric odds ratio but modified to take on
* low values when the reference +/- read count ratio is skewed but the alt count ratio is not. Natural log is taken
* to keep values within roughly the same range as other annotations.
*
* Augmentation avoids quotient by zero.
* Adding pseudocounts prevent divide-by-zero.
*
* @param originalTable The table before augmentation
* @return the SOR annotation value
*/
final protected double calculateSOR(final int[][] originalTable) {
final double[][] augmentedTable = augmentContingencyTable(originalTable);
final double[][] augmentedTable = StrandBiasTableUtils.augmentContingencyTable(originalTable);
double ratio = 0;
@ -158,22 +157,6 @@ public class StrandOddsRatio extends StrandBiasTest implements StandardAnnotatio
}
/**
* Adds the small value AUGMENTATION_CONSTANT to all the entries of the table.
*
* @param table the table to augment
* @return the augmented table
*/
private static double[][] augmentContingencyTable(final int[][] table) {
double[][] augmentedTable = new double[ARRAY_DIM][ARRAY_DIM];
for ( int i = 0; i < ARRAY_DIM; i++ ) {
for ( int j = 0; j < ARRAY_DIM; j++ )
augmentedTable[i][j] = table[i][j] + AUGMENTATION_CONSTANT;
}
return augmentedTable;
}
/**
* Returns an annotation result given a ratio
*

View File

@ -311,7 +311,7 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
* to provide a pedigree file for a pedigree-based annotation) may cause the run to fail.
*/
@Argument(fullName="group", shortName="G", doc="One or more classes/groups of annotations to apply to variant calls", required=false)
protected String[] annotationClassesToUse = { "Standard" };
protected String[] annotationGroupsToUse = { "Standard" };
@ArgumentCollection
private HaplotypeCallerArgumentCollection HCAC = new HaplotypeCallerArgumentCollection();
@ -623,10 +623,14 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
genotypingEngine = new HaplotypeCallerGenotypingEngine(HCAC, samplesList, genomeLocParser, FixedAFCalculatorProvider.createThreadSafeProvider(getToolkit(), HCAC,logger), !doNotRunPhysicalPhasing);
// initialize the output VCF header
final VariantAnnotatorEngine annotationEngine = new VariantAnnotatorEngine(Arrays.asList(annotationClassesToUse), annotationsToUse, annotationsToExclude, this, getToolkit());
final VariantAnnotatorEngine annotationEngine = new VariantAnnotatorEngine(Arrays.asList(annotationGroupsToUse), annotationsToUse, annotationsToExclude, this, getToolkit());
final Set<VCFHeaderLine> headerInfo = new HashSet<>();
//initialize the annotations (this is particularly important to turn off RankSumTest dithering in integration tests)
//do this before we write the header because SnpEff adds to header lines
annotationEngine.invokeAnnotationInitializationMethods(headerInfo);
headerInfo.addAll(genotypingEngine.getAppropriateVCFInfoHeaders());
// all annotation fields from VariantAnnotatorEngine
headerInfo.addAll(annotationEngine.getVCFAnnotationDescriptions());
@ -654,9 +658,6 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
vcfWriter.writeHeader(new VCFHeader(headerInfo, sampleSet));
//now that we have all the VCF headers, initialize the annotations (this is particularly important to turn off RankSumTest dithering in integration tests)
annotationEngine.invokeAnnotationInitializationMethods(headerInfo);
try {
// fasta reference reader to supplement the edges of the reference sequence
referenceReader = new CachingIndexedFastaSequenceFile(getToolkit().getArguments().referenceFile);
@ -1211,7 +1212,7 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
*
* @return true if HC must emit reference confidence.
*/
private boolean emitReferenceConfidence() {
public boolean emitReferenceConfidence() {
return HCAC.emitReferenceConfidence != ReferenceConfidenceMode.NONE;
}

View File

@ -274,7 +274,7 @@ public class HaplotypeCallerGenotypingEngine extends GenotypingEngine<AssemblyBa
genomeLocParser, emitReferenceConfidence, alleleMapper, readAlleleLikelihoods, call);
ReferenceContext referenceContext = new ReferenceContext(genomeLocParser, genomeLocParser.createGenomeLoc(mergedVC.getChr(), mergedVC.getStart(), mergedVC.getEnd()), refLoc, ref);
VariantContext annotatedCall = annotationEngine.annotateContextForActiveRegion(referenceContext, tracker,readAlleleLikelihoods, call);
VariantContext annotatedCall = annotationEngine.annotateContextForActiveRegion(referenceContext, tracker,readAlleleLikelihoods, call, emitReferenceConfidence);
if( call.getAlleles().size() != mergedVC.getAlleles().size() )
annotatedCall = GATKVariantContextUtils.reverseTrimAlleles(annotatedCall);

View File

@ -51,6 +51,9 @@
package org.broadinstitute.gatk.tools.walkers.variantutils;
import org.broadinstitute.gatk.engine.arguments.DbsnpArgumentCollection;
import org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine;
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.gatk.utils.commandline.*;
import org.broadinstitute.gatk.engine.CommandLineGATK;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
@ -110,7 +113,38 @@ import java.util.*;
*/
@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_VARMANIP, extraDocs = {CommandLineGATK.class} )
@Reference(window=@Window(start=0,stop=1))
public class CombineGVCFs extends RodWalker<CombineGVCFs.PositionalState, CombineGVCFs.OverallState> {
public class CombineGVCFs extends RodWalker<CombineGVCFs.PositionalState, CombineGVCFs.OverallState> implements AnnotatorCompatible {
/**
* Which annotations to recompute for the combined output VCF file.
*/
@Advanced
@Argument(fullName="annotation", shortName="A", doc="One or more specific annotations to recompute. The single value 'none' removes the default annotations", required=false)
protected List<String> annotationsToUse = new ArrayList<>(Arrays.asList(new String[]{"AS_RMSMappingQuality"}));
/**
* Which groups of annotations to add to the output VCF file. The single value 'none' removes the default group. See
* the VariantAnnotator -list argument to view available groups. Note that this usage is not recommended because
* it obscures the specific requirements of individual annotations. Any requirements that are not met (e.g. failing
* to provide a pedigree file for a pedigree-based annotation) may cause the run to fail.
*/
@Argument(fullName="group", shortName="G", doc="One or more classes/groups of annotations to apply to variant calls", required=false)
protected String[] annotationGroupsToUse = { "Standard" };
/**
* The rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. Note that dbSNP is not used in any way for the calculations themselves.
*/
@ArgumentCollection
protected DbsnpArgumentCollection dbsnp = new DbsnpArgumentCollection();
public RodBinding<VariantContext> getDbsnpRodBinding() { return dbsnp.dbsnp; }
public List<RodBinding<VariantContext>> getCompRodBindings() { return Collections.emptyList(); }
public RodBinding<VariantContext> getSnpEffRodBinding() { return null; }
public List<RodBinding<VariantContext>> getResourceRodBindings() { return Collections.emptyList(); }
public boolean alwaysAppendDbsnpId() { return false; }
// the annotation engine
private VariantAnnotatorEngine annotationEngine;
protected final class PositionalState {
final List<VariantContext> VCs;
@ -177,6 +211,12 @@ public class CombineGVCFs extends RodWalker<CombineGVCFs.PositionalState, Combin
genomeLocParser = getToolkit().getGenomeLocParser();
// create the annotation engine
annotationEngine = new VariantAnnotatorEngine(Arrays.asList(annotationGroupsToUse), annotationsToUse, Collections.<String>emptyList(), this, getToolkit());
//now that we have all the VCF headers, initialize the annotations (this is particularly important to turn off RankSumTest dithering in integration tests)
annotationEngine.invokeAnnotationInitializationMethods(headerLines);
// optimization to prevent mods when we always just want to break bands
if ( multipleAtWhichToBreakBands == 1 )
USE_BP_RESOLUTION = true;
@ -321,7 +361,7 @@ public class CombineGVCFs extends RodWalker<CombineGVCFs.PositionalState, Combin
// we need the specialized merge if the site contains anything other than ref blocks
final VariantContext mergedVC;
if ( containsTrueAltAllele(stoppedVCs) )
mergedVC = ReferenceConfidenceVariantContextMerger.merge(stoppedVCs, gLoc, refBase, false, false);
mergedVC = ReferenceConfidenceVariantContextMerger.merge(stoppedVCs, gLoc, refBase, false, false, annotationEngine);
else
mergedVC = referenceBlockMerge(stoppedVCs, state, pos.getStart());

View File

@ -161,6 +161,16 @@ public class GenotypeGVCFs extends RodWalker<VariantContext, VariantContextWrite
@Argument(fullName="annotation", shortName="A", doc="One or more specific annotations to recompute. The single value 'none' removes the default annotations", required=false)
protected List<String> annotationsToUse = new ArrayList<>(Arrays.asList(new String[]{"InbreedingCoeff", "FisherStrand", "QualByDepth", "ChromosomeCounts", "StrandOddsRatio"}));
/**
* Which groups of annotations to add to the output VCF file. The single value 'none' removes the default group. See
* the VariantAnnotator -list argument to view available groups. Note that this usage is not recommended because
* it obscures the specific requirements of individual annotations. Any requirements that are not met (e.g. failing
* to provide a pedigree file for a pedigree-based annotation) may cause the run to fail.
*/
@Argument(fullName="group", shortName="G", doc="One or more classes/groups of annotations to apply to variant calls", required=false)
protected String[] annotationGroupsToUse = {};
/**
* The rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. Note that dbSNP is not used in any way for the calculations themselves.
*/
@ -211,7 +221,7 @@ public class GenotypeGVCFs extends RodWalker<VariantContext, VariantContextWrite
genotypingEngine = new UnifiedGenotypingEngine(createUAC(), samples, toolkit.getGenomeLocParser(), GeneralPloidyFailOverAFCalculatorProvider.createThreadSafeProvider(toolkit, genotypeArgs, logger),
toolkit.getArguments().BAQMode);
// create the annotation engine
annotationEngine = new VariantAnnotatorEngine(Arrays.asList("none"), annotationsToUse, Collections.<String>emptyList(), this, toolkit);
annotationEngine = new VariantAnnotatorEngine(Arrays.asList(annotationGroupsToUse), annotationsToUse, Collections.<String>emptyList(), this, toolkit);
// take care of the VCF headers
final Set<VCFHeaderLine> headerLines = VCFUtils.smartMergeHeaders(vcfRods.values(), true);
@ -230,6 +240,9 @@ public class GenotypeGVCFs extends RodWalker<VariantContext, VariantContextWrite
final VCFHeader vcfHeader = new VCFHeader(headerLines, sampleNameSet);
vcfWriter.writeHeader(vcfHeader);
//now that we have all the VCF headers, initialize the annotations (this is particularly important to turn off RankSumTest dithering in integration tests)
annotationEngine.invokeAnnotationInitializationMethods(headerLines);
logger.info("Notice that the -ploidy parameter is ignored in " + getClass().getSimpleName() + " tool as this is automatically determined by the input variant files");
}
@ -238,7 +251,7 @@ public class GenotypeGVCFs extends RodWalker<VariantContext, VariantContextWrite
return null;
final GenomeLoc loc = ref.getLocus();
final VariantContext combinedVC = ReferenceConfidenceVariantContextMerger.merge(tracker.getPrioritizedValue(variants, loc), loc, INCLUDE_NON_VARIANTS ? ref.getBase() : null, true, uniquifySamples);
final VariantContext combinedVC = ReferenceConfidenceVariantContextMerger.merge(tracker.getPrioritizedValue(variants, loc), loc, INCLUDE_NON_VARIANTS ? ref.getBase() : null, true, uniquifySamples, annotationEngine);
if ( combinedVC == null )
return null;
return regenotypeVC(tracker, ref, combinedVC);
@ -255,21 +268,25 @@ public class GenotypeGVCFs extends RodWalker<VariantContext, VariantContextWrite
protected VariantContext regenotypeVC(final RefMetaDataTracker tracker, final ReferenceContext ref, final VariantContext originalVC) {
if ( originalVC == null ) throw new IllegalArgumentException("originalVC cannot be null");
VariantContext result = originalVC;
VariantContext rawResult = originalVC;
// only re-genotype polymorphic sites
if ( result.isVariant() ) {
VariantContext regenotypedVC = genotypingEngine.calculateGenotypes(result);
if ( rawResult.isVariant() ) {
VariantContext regenotypedVC = genotypingEngine.calculateGenotypes(rawResult);
if ( ! isProperlyPolymorphic(regenotypedVC) ) {
if (!INCLUDE_NON_VARIANTS)
return null;
}
else {
regenotypedVC = GATKVariantContextUtils.reverseTrimAlleles(regenotypedVC);
result = addGenotypingAnnotations(originalVC.getAttributes(), regenotypedVC);
rawResult = addGenotypingAnnotations(rawResult.getAttributes(), regenotypedVC);
}
}
//At this point we should already have DP and AD annotated
VariantContext result = annotationEngine.finalizeAnnotations(rawResult, originalVC);
//do trimming after allele-specific annotation reduction or the mapping is difficult
result = GATKVariantContextUtils.reverseTrimAlleles(result);
// if it turned monomorphic then we either need to ignore or fix such sites
boolean createRefGTs = false;
if ( result.isMonomorphicInSamples() ) {

View File

@ -53,13 +53,17 @@ package org.broadinstitute.gatk.tools.walkers.variantutils;
import htsjdk.variant.variantcontext.*;
import htsjdk.variant.vcf.VCFConstants;
import org.apache.log4j.Logger;
import org.broadinstitute.gatk.tools.walkers.annotator.AlleleSpecificAnnotationData;
import org.broadinstitute.gatk.tools.walkers.annotator.ReducibleAnnotationData;
import org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine;
import org.broadinstitute.gatk.tools.walkers.genotyper.GenotypeLikelihoodCalculator;
import org.broadinstitute.gatk.tools.walkers.genotyper.GenotypeLikelihoodCalculators;
import org.broadinstitute.gatk.utils.GenomeLoc;
import org.broadinstitute.gatk.utils.MathUtils;
import org.broadinstitute.gatk.utils.Utils;
import org.broadinstitute.gatk.utils.collections.Pair;
import org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException;
import org.broadinstitute.gatk.utils.exceptions.GATKException;
import org.broadinstitute.gatk.utils.exceptions.UserException;
import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
import org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils;
@ -73,6 +77,8 @@ import java.util.*;
*/
public class ReferenceConfidenceVariantContextMerger {
private final static Logger logger = Logger.getLogger(ReferenceConfidenceVariantContextMerger.class);
private static Comparable combineAnnotationValues( final List<Comparable> array ) {
return MathUtils.median(array); // right now we take the median but other options could be explored
}
@ -89,10 +95,10 @@ public class ReferenceConfidenceVariantContextMerger {
* @return new VariantContext representing the merge of all VCs or null if it not relevant
*/
public static VariantContext merge(final List<VariantContext> VCs, final GenomeLoc loc, final Byte refBase, final boolean removeNonRefSymbolicAllele,
final boolean samplesAreUniquified) {
final boolean samplesAreUniquified, final VariantAnnotatorEngine annotatorEngine) {
// this can happen if e.g. you are using a dbSNP file that spans a region with no gVCFs
if ( VCs == null || VCs.isEmpty() )
return null;
if ( VCs == null || VCs.isEmpty() ) {
return null; }
// establish the baseline info (sometimes from the first VC)
final VariantContext first = VCs.get(0);
@ -100,8 +106,8 @@ public class ReferenceConfidenceVariantContextMerger {
// ref allele
final Allele refAllele = determineReferenceAlleleGivenReferenceBase(VCs, loc, refBase);
if ( refAllele == null )
return null;
if ( refAllele == null ) {
return null; }
// FinalAlleleSet contains the alleles of the new resulting VC
// Using linked set in order to guarantee a stable order
@ -112,7 +118,7 @@ public class ReferenceConfidenceVariantContextMerger {
final Map<String, Object> attributes = new LinkedHashMap<>();
final Set<String> rsIDs = new LinkedHashSet<>(1); // most of the time there's one id
int depth = 0;
final Map<String, List<Comparable>> annotationMap = new LinkedHashMap<>();
final Map<String, List<ReducibleAnnotationData>> annotationMap = new LinkedHashMap<>();
final GenotypesContext genotypes = GenotypesContext.create();
// In this list we hold the mapping of each variant context alleles.
@ -156,28 +162,42 @@ public class ReferenceConfidenceVariantContextMerger {
}
}
if ( loc.getStart() != vc.getStart() )
if ( loc.getStart() != vc.getStart() ) {
continue;
}
// special case ID (just preserve it)
if ( vc.hasID() ) rsIDs.add(vc.getID());
// add attributes
addReferenceConfidenceAttributes(vc.getAttributes(), annotationMap);
// add attributes to annotationMap, store all info field annotations as AlleleSpecificAnnotationData in case they can be parsed that way
addReferenceConfidenceAttributes(pair, annotationMap);
}
// when combining annotations use the median value from all input VCs which had annotations provided
for ( final Map.Entry<String, List<Comparable>> p : annotationMap.entrySet() ) {
//combine the annotations that are reducible and remove them from annotationMap
Map<String, Object> combinedAnnotations = new HashMap<>();
if (annotatorEngine != null) {
combinedAnnotations = annotatorEngine.combineAnnotations(allelesList, annotationMap);
}
attributes.putAll(combinedAnnotations);
// remove stale AC and AF based attributes (including MLEAC and MLEAF lists)
//these will be recalculated after genotyping
removeStaleAttributesAfterMerge(annotationMap);
//annotatorEngine.combineAnnotations removed the successfully combined annotations, so now parse those that are left
//here we're assuming that things that are left are scalars per sample
Map<String, List<Comparable>> parsedAnnotationMap = parseRemainingAnnotations(annotationMap);
// when combining remaining annotations use the median value from all input VCs which had annotations provided
for ( final Map.Entry<String, List<Comparable>> p : parsedAnnotationMap.entrySet() ) {
if ( ! p.getValue().isEmpty() ) {
attributes.put(p.getKey(), combineAnnotationValues(p.getValue()));
}
}
if ( depth > 0 )
if ( depth > 0 ) {
attributes.put(VCFConstants.DEPTH_KEY, String.valueOf(depth));
// remove stale AC and AF based attributes
removeStaleAttributesAfterMerge(attributes);
}
final String ID = rsIDs.isEmpty() ? VCFConstants.EMPTY_ID_FIELD : Utils.join(",", rsIDs);
@ -189,6 +209,39 @@ public class ReferenceConfidenceVariantContextMerger {
return builder.make();
}
/**
* parse the annotations that were not identified as reducible annotations and combined by the annotation engine
* @param annotationMap the map of info field annotation names and the list of their data from the merged VCs
* @return info field data parsed as ints or doubles
*/
private static Map<String, List<Comparable>> parseRemainingAnnotations(final Map<String, List<ReducibleAnnotationData>> annotationMap) {
final Map<String, List<Comparable>> parsedAnnotations = new HashMap<>();
for (Map.Entry<String, List<ReducibleAnnotationData>> currentData : annotationMap.entrySet()) {
List<Comparable> annotationValues = new ArrayList<>();
for (ReducibleAnnotationData value : currentData.getValue()) {
try {
final String stringValue = value.getRawData();
if (stringValue.contains(".")) {
annotationValues.add(Double.parseDouble(stringValue));
} else if (Character.isDigit(stringValue.charAt(0))){
annotationValues.add(Integer.parseInt(stringValue));
//TODO: uncomment this to parse dbSNP membership annotation once allele-specific merging for that attribute is added
/*} else if (Character.isLetter(stringValue.charAt(0))) {
if (stringValue.equalsIgnoreCase("true"))
annotationValues.add(true);
else if (stringValue.equalsIgnoreCase("false"))
annotationValues.add(false);*/
}
} catch (final NumberFormatException e) {
logger.warn("WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but " + value.getRawData() + " doesn't parse and will not be annotated in the final VC.");
}
}
parsedAnnotations.put(currentData.getKey(),annotationValues);
}
return parsedAnnotations;
}
/**
* @param list the original alleles list
* @return a non-null list of non-symbolic alleles
@ -196,8 +249,9 @@ public class ReferenceConfidenceVariantContextMerger {
private static List<Allele> nonSymbolicAlleles(final List<Allele> list) {
final List<Allele> result = new ArrayList<>(list.size());
for ( final Allele allele : list ) {
if ( !allele.isSymbolic() )
if ( !allele.isSymbolic() ) {
result.add(allele);
}
}
return result;
}
@ -212,8 +266,9 @@ public class ReferenceConfidenceVariantContextMerger {
*/
private static Allele determineReferenceAlleleGivenReferenceBase(final List<VariantContext> VCs, final GenomeLoc loc, final Byte refBase) {
final Allele refAllele = GATKVariantContextUtils.determineReferenceAllele(VCs, loc);
if ( refAllele == null )
return ( refBase == null ? null : Allele.create(refBase, true) );
if ( refAllele == null ) {
return (refBase == null ? null : Allele.create(refBase, true));
}
return refAllele;
}
@ -222,7 +277,7 @@ public class ReferenceConfidenceVariantContextMerger {
*
* @param attributes the attribute map
*/
private static void removeStaleAttributesAfterMerge(final Map<String, Object> attributes) {
private static void removeStaleAttributesAfterMerge(final Map<String, List<ReducibleAnnotationData>> attributes) {
attributes.remove(VCFConstants.ALLELE_COUNT_KEY);
attributes.remove(VCFConstants.ALLELE_FREQUENCY_KEY);
attributes.remove(VCFConstants.ALLELE_NUMBER_KEY);
@ -234,31 +289,35 @@ public class ReferenceConfidenceVariantContextMerger {
/**
* Adds attributes to the global map from the new context in a sophisticated manner
*
* @param myAttributes attributes to add from
* @param pair VariantContext/Allele list pair from which to get attributes
* @param annotationMap map of annotations for combining later
*/
private static void addReferenceConfidenceAttributes(final Map<String, Object> myAttributes,
final Map<String, List<Comparable>> annotationMap) {
private static void addReferenceConfidenceAttributes(Pair<VariantContext,List<Allele>> pair,
final Map<String, List<ReducibleAnnotationData>> annotationMap) {
final Map<String, Object> myAttributes = pair.getFirst().getAttributes(); //these are the info field attributes of the VC in pair
final List<Allele> sampleAlleles = pair.getSecond();
for ( final Map.Entry<String, Object> p : myAttributes.entrySet() ) {
final String key = p.getKey();
final Object value = p.getValue();
//allele-specific attributes will always be in list form because they've already been parsed per-allele
//non-allele-specific attributes (DP, etc.) will be a list of length 1
final List<Object> valueList = pair.getFirst().getAttributeAsList(key);
// add the annotation values to a list for combining later
List<Comparable> values = annotationMap.get(key);
if( values == null ) {
values = new ArrayList<>();
annotationMap.put(key, values);
// add the existing annotation values to a list for combining later
List<ReducibleAnnotationData> rawValuesList = annotationMap.get(key);
if( rawValuesList == null ) {
rawValuesList = new ArrayList<>();
annotationMap.put(key, rawValuesList);
}
try {
final String stringValue = value.toString();
// Branch to avoid unintentional, implicit type conversions that occur with the ? operator.
if (stringValue.contains("."))
values.add(Double.parseDouble(stringValue));
else
values.add(Integer.parseInt(stringValue));
} catch (final NumberFormatException e) {
// nothing to do
String combinedString = "";
for(int i=0; i < valueList.size(); i++) {
if (i > 0)
combinedString += ",";
combinedString += valueList.get(i);
}
ReducibleAnnotationData pairData = new AlleleSpecificAnnotationData(sampleAlleles, combinedString);
rawValuesList.add(pairData);
annotationMap.put(key, rawValuesList);
}
}
@ -364,30 +423,34 @@ public class ReferenceConfidenceVariantContextMerger {
// the map is different depending on the ploidy, so in order to keep this method flexible (mixed ploidies)
// we need to get a map done (lazily inside the loop) for each ploidy, up to the maximum possible.
final int[][] genotypeIndexMapsByPloidy = new int[maximumPloidy + 1][];
final int maximumAlleleCount = Math.max(remappedAlleles.size(), targetAlleles.size());
final int maximumAlleleCount = Math.max(remappedAlleles.size(),targetAlleles.size());
int[] perSampleIndexesOfRelevantAlleles;
for ( final Genotype g : vc.getGenotypes() ) {
for (final Genotype g : vc.getGenotypes()) {
final String name;
if (samplesAreUniquified)
name = g.getSampleName() + "." + vc.getSource();
name = g.getSampleName() + "." + vc.getSource();
else
name = g.getSampleName();
name = g.getSampleName();
final int ploidy = g.getPloidy();
final GenotypeBuilder genotypeBuilder = new GenotypeBuilder(g).alleles(GATKVariantContextUtils.noCallAlleles(g.getPloidy()));
genotypeBuilder.name(name);
final boolean hasPL = g.hasPL();
final boolean hasSAC = g.hasExtendedAttribute(GATKVCFConstants.STRAND_COUNT_BY_SAMPLE_KEY);
if ( hasPL || hasSAC ) {
final int[] perSampleIndexesOfRelevantAlleles = getIndexesOfRelevantAlleles(remappedAlleles, targetAlleles, vc.getStart(), g);
if (hasPL || hasSAC) {
perSampleIndexesOfRelevantAlleles = getIndexesOfRelevantAlleles(remappedAlleles, targetAlleles, vc.getStart(), g);
if (g.hasPL()) {
// genotype index map by ploidy.
final int[] genotypeIndexMapByPloidy = GenotypeLikelihoodCalculators.getInstance(ploidy, maximumAlleleCount).genotypeIndexMap(perSampleIndexesOfRelevantAlleles);
// lazy initialization of the genotype index map by ploidy.
final int[] genotypeIndexMapByPloidy = genotypeIndexMapsByPloidy[ploidy] == null
? GenotypeLikelihoodCalculators.getInstance(ploidy, maximumAlleleCount).genotypeIndexMap(perSampleIndexesOfRelevantAlleles)
: genotypeIndexMapsByPloidy[ploidy];
final int[] PLs = generatePL(g, genotypeIndexMapByPloidy);
final int[] AD = g.hasAD() ? generateAD(g.getAD(), perSampleIndexesOfRelevantAlleles) : null;
genotypeBuilder.PL(PLs).AD(AD);
}
if (g.hasExtendedAttribute(GATKVCFConstants.STRAND_COUNT_BY_SAMPLE_KEY)) {
final List<Integer> sacIndexesToUse = adaptToSACIndexes(perSampleIndexesOfRelevantAlleles);
final List<Integer> sacIndexesToUse = adaptToSACIndexes(perSampleIndexesOfRelevantAlleles);
final int[] SACs = GATKVariantContextUtils.makeNewSACs(g, sacIndexesToUse);
genotypeBuilder.attribute(GATKVCFConstants.STRAND_COUNT_BY_SAMPLE_KEY, SACs);
}
@ -397,25 +460,26 @@ public class ReferenceConfidenceVariantContextMerger {
}
/**
* Adapt the relevant alleles to the SAC indexes
*
* @param perSampleIndexesOfRelevantAlleles
* @return SAC indexes
* Adapt the relevant alleles to the SAC indexes
*
* @param perSampleIndexesOfRelevantAlleles
* @return SAC indexes
*/
private static List<Integer> adaptToSACIndexes(final int[] perSampleIndexesOfRelevantAlleles){
if ( perSampleIndexesOfRelevantAlleles == null )
private static List<Integer> adaptToSACIndexes(final int[] perSampleIndexesOfRelevantAlleles) {
if (perSampleIndexesOfRelevantAlleles == null)
throw new IllegalArgumentException("The per sample index of relevant alleles must not be null");
final List<Integer> sacIndexesToUse = new ArrayList(2*perSampleIndexesOfRelevantAlleles.length);
final List<Integer> sacIndexesToUse = new ArrayList(2 * perSampleIndexesOfRelevantAlleles.length);
for ( int item : perSampleIndexesOfRelevantAlleles ) {
sacIndexesToUse.add(new Integer(2*item));
sacIndexesToUse.add(new Integer(2*item+1));
for (int item : perSampleIndexesOfRelevantAlleles) {
sacIndexesToUse.add(new Integer(2 * item));
sacIndexesToUse.add(new Integer(2 * item + 1));
}
return sacIndexesToUse;
return sacIndexesToUse;
}
/**
* Composes a new likelihood array given the original genotype and the genotype index map.
*
@ -439,8 +503,8 @@ public class ReferenceConfidenceVariantContextMerger {
}
/**
* Determines the allele mapping from remappedAlleles to the targetAlleles, substituting the generic "<ALT>" as appropriate.
* If the myAlleles set does not contain "<ALT>" as an allele, it throws an exception.
* Determines the allele mapping from myAlleles to the targetAlleles, substituting the generic "<ALT>" as appropriate.
* If the remappedAlleles set does not contain "<ALT>" as an allele, it throws an exception.
*
* @param remappedAlleles the list of alleles to evaluate
* @param targetAlleles the target list of alleles
@ -450,7 +514,7 @@ public class ReferenceConfidenceVariantContextMerger {
*/
protected static int[] getIndexesOfRelevantAlleles(final List<Allele> remappedAlleles, final List<Allele> targetAlleles, final int position, final Genotype g) {
if ( remappedAlleles == null || remappedAlleles.isEmpty() ) throw new IllegalArgumentException("The list of input alleles must not be null or empty");
if ( remappedAlleles == null || remappedAlleles.isEmpty()) throw new IllegalArgumentException("The list of input alleles must not be null or empty");
if ( targetAlleles == null || targetAlleles.isEmpty() ) throw new IllegalArgumentException("The list of target alleles must not be null or empty");
if ( !remappedAlleles.contains(GATKVCFConstants.NON_REF_SYMBOLIC_ALLELE) )

View File

@ -0,0 +1,220 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.samtools.Cigar;
import htsjdk.samtools.CigarElement;
import htsjdk.samtools.CigarOperator;
import org.broadinstitute.gatk.utils.sam.ArtificialSAMUtils;
import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
import org.testng.annotations.BeforeClass;
import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
public class AnnotationUtilsTest {
//Basic aligned read
private GATKSAMRecord allMatch;
//Read with insertion and deletion
private GATKSAMRecord twoIndels;
//Read with soft clips at start
private GATKSAMRecord softClipStart;
//Read with hard clips at start
private GATKSAMRecord hardClipStart;
//Read with low quality tail
private GATKSAMRecord lowQualTail;
//Read with low quality tail, partially soft clipped
private GATKSAMRecord lowQualClippedTail;
//Read with low quality bases at start
private GATKSAMRecord lowQualStart;
//Read with low quality bases, partially soft clipped at both ends
private GATKSAMRecord lowQualBothEnds;
@BeforeClass
public void init() {
List<CigarElement> cigarElements_allMatch = new LinkedList<>();
cigarElements_allMatch.add(new CigarElement(151, CigarOperator.M));
allMatch = ArtificialSAMUtils.createArtificialRead(new Cigar(cigarElements_allMatch));
List<CigarElement> cigarElements_2indels = new LinkedList<>();
cigarElements_2indels.add(new CigarElement(66, CigarOperator.M));
cigarElements_2indels.add(new CigarElement(10, CigarOperator.I));
cigarElements_2indels.add(new CigarElement(7, CigarOperator.M));
cigarElements_2indels.add(new CigarElement(10, CigarOperator.D));
cigarElements_2indels.add(new CigarElement(68, CigarOperator.M));
twoIndels = ArtificialSAMUtils.createArtificialRead(new Cigar(cigarElements_2indels));
List<CigarElement> cigarElements_softClipStart = new LinkedList<>();
cigarElements_softClipStart.add(new CigarElement(17, CigarOperator.S));
cigarElements_softClipStart.add(new CigarElement(134, CigarOperator.M));
softClipStart = ArtificialSAMUtils.createArtificialRead(new Cigar(cigarElements_softClipStart));
List<CigarElement> cigarElements_hardClipStart = new LinkedList<>();
cigarElements_hardClipStart.add(new CigarElement(17, CigarOperator.H));
cigarElements_hardClipStart.add(new CigarElement(134, CigarOperator.M));
hardClipStart = ArtificialSAMUtils.createArtificialRead(new Cigar(cigarElements_hardClipStart));
final byte [] bases_lowQualTail = {'A', 'C', 'T', 'G', 'A', 'A', 'A', 'A', 'A', 'A'};
final byte [] quals_lowQualTail = {30, 15, 25, 30, 2, 2, 2, 2, 2, 2};
lowQualTail = ArtificialSAMUtils.createArtificialRead(bases_lowQualTail, quals_lowQualTail, "10M");
final byte [] bases_lowQualClippedTail = {'A', 'C', 'T', 'G', 'A', 'A', 'A', 'A', 'A', 'A'};
final byte [] quals_lowQualClippedTail = {30, 15, 25, 30, 2, 2, 2, 2, 2, 2};
lowQualClippedTail = ArtificialSAMUtils.createArtificialRead(bases_lowQualClippedTail, quals_lowQualClippedTail, "8M2S");
final byte [] bases_lowQualStart = {'A', 'A', 'A', 'A', 'A', 'A', 'A', 'C', 'T', 'G'};
final byte [] quals_lowQualStart = {2, 2, 2, 2, 2, 2, 30, 15, 25, 30};
lowQualStart = ArtificialSAMUtils.createArtificialRead(bases_lowQualStart, quals_lowQualStart, "10M");
final byte [] bases_lowQualBothEnds = {'A', 'A', 'A', 'A', 'A', 'A', 'G', 'C', 'T', 'G', 'A', 'A', 'A', 'A', 'A', 'A'};
final byte [] quals_lowQualBothEnds = { 2, 2, 2, 2, 2, 2, 30, 15, 25, 30, 2, 2, 2, 2, 2, 2};
lowQualBothEnds = ArtificialSAMUtils.createArtificialRead(bases_lowQualBothEnds, quals_lowQualBothEnds, "2S12M2S");
}
@DataProvider(name = "makeGetFinalVariantReadPositionTestReads")
public Object[][] makeFinalPosTestReads() {
final List<Object[]> tests = new ArrayList<>();
tests.add(new Object[] {allMatch, 10, 10});
tests.add(new Object[] {allMatch, 140, 10});
tests.add(new Object[] {twoIndels, 10, 10});
tests.add(new Object[] {twoIndels, 140, 10});
tests.add(new Object[] {hardClipStart, 20, 20});
tests.add(new Object[] {hardClipStart, 110, 6}); //this is what the code produces as-is
tests.add(new Object[] {softClipStart, 10, 10});
tests.add(new Object[] {softClipStart, 140, 10});
tests.add(new Object[] {lowQualTail, 3, 0});
tests.add(new Object[] {lowQualTail, 2, 2});
tests.add(new Object[] {lowQualClippedTail, 3, 0});
tests.add(new Object[] {lowQualClippedTail, 2, 2});
tests.add(new Object[] {lowQualStart, 7, -4}); //this is what the code produces as-is, but should be 1
tests.add(new Object[] {lowQualStart, 8, -5}); //this is what the code produces as-is, but should be 1
tests.add(new Object[] {lowQualBothEnds, 7, -4}); //this is what the code produces as-is, but should be 1
tests.add(new Object[] {lowQualBothEnds, 8, -5}); //this is what the code produces as-is, but should be 1
return tests.toArray(new Object[][]{});
}
@Test(dataProvider = "makeGetFinalVariantReadPositionTestReads")
public void testGetFinalVariantReadPosition(GATKSAMRecord read, int variantPosition, int expected) throws Exception {
Assert.assertEquals(AnnotationUtils.getFinalVariantReadPosition(read, variantPosition), expected);
}
@DataProvider(name = "getNumClippedBasesAtStartTestReads")
public Object[][] numClipStart() {
final List<Object[]> tests = new ArrayList<>();
tests.add(new Object[] {allMatch, 0});
tests.add(new Object[] {twoIndels, 0});
tests.add(new Object[] {softClipStart, 0});
tests.add(new Object[] {hardClipStart, 17});
tests.add(new Object[] {lowQualTail, 0});
tests.add(new Object[] {lowQualClippedTail, 0});
tests.add(new Object[] {lowQualStart, 6});
tests.add(new Object[] {lowQualBothEnds, 6});
return tests.toArray(new Object[][]{});
}
@Test(dataProvider = "getNumClippedBasesAtStartTestReads")
public void testGetNumClippedBasesAtStart(GATKSAMRecord read, int expected) throws Exception {
Assert.assertEquals(AnnotationUtils.getNumClippedBasesAtStart(read),expected);
}
@DataProvider(name = "getNumAlignedBasesTestReads")
public Object[][] numAligned() {
final List<Object[]> tests = new ArrayList<>();
tests.add(new Object[] {allMatch, 151});
tests.add(new Object[] {twoIndels, 151});
tests.add(new Object[] {softClipStart, 151});
tests.add(new Object[] {hardClipStart, 117}); //This is what the code produces, but it's wrong
tests.add(new Object[] {lowQualTail, 4});
tests.add(new Object[] {lowQualClippedTail, 4});
tests.add(new Object[] {lowQualStart, 4});
tests.add(new Object[] {lowQualBothEnds, 4});
return tests.toArray(new Object[][]{});
}
@Test(dataProvider = "getNumAlignedBasesTestReads")
public void testGetNumAlignedBases(GATKSAMRecord read, int expected) throws Exception {
Assert.assertEquals(AnnotationUtils.getNumAlignedBases(read),expected);
}
@DataProvider(name = "getNumClippedBasesAtEndTestReads")
public Object[][] numClipEnd() {
final List<Object[]> tests = new ArrayList<>();
tests.add(new Object[] {allMatch, 0});
tests.add(new Object[] {twoIndels, 0});
tests.add(new Object[] {softClipStart, 0});
tests.add(new Object[] {hardClipStart, 0});
tests.add(new Object[] {lowQualTail, 6});
tests.add(new Object[] {lowQualClippedTail, 6});
tests.add(new Object[] {lowQualStart, 0});
tests.add(new Object[] {lowQualBothEnds, 6});
return tests.toArray(new Object[][]{});
}
@Test(dataProvider = "getNumClippedBasesAtEndTestReads")
public void testGetNumClippedBasesAtEnd(GATKSAMRecord read, int expected) throws Exception {
Assert.assertEquals(AnnotationUtils.getNumClippedBasesAtEnd(read), expected);
}
}

View File

@ -0,0 +1,127 @@
/*
* By downloading the PROGRAM you agree to the following terms of use:
*
* BROAD INSTITUTE
* SOFTWARE LICENSE AGREEMENT
* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
*
* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
*
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
*
* 1. DEFINITIONS
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
*
* 2. LICENSE
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
*
* 3. PHONE-HOME FEATURE
* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (PHONE-HOME) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEES user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
*
* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
* Copyright 2012-2015 Broad Institute, Inc.
* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
*
* 5. INDEMNIFICATION
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
*
* 6. NO REPRESENTATIONS OR WARRANTIES
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
*
* 7. ASSIGNMENT
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
*
* 8. MISCELLANEOUS
* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
import java.util.ArrayList;
import java.util.List;
public class StrandBiasTableUtilsTest {
private static final double DELTA_PRECISION = 10e-7;
@DataProvider(name = "UsingTable")
public Object[][] makeUsingTable() {
/* NOTE: the expected P values were computed in R as follows
fisher = function(v) {
return(fisher.test(matrix(v, nrow=2, ncol=2))$p.value)
}
*/
//> fisher(c(2068, 6796, 1133, 0))
final List<Object[]> tests = new ArrayList<>();
tests.add(new Object[]{0, 0, 0, 0, 1.0});
tests.add(new Object[]{100000, 100000, 100000, 100000, 1.0});
tests.add(new Object[]{1, 2, 3, 4, 1.0});
tests.add(new Object[]{0, 0, 100000, 100000, 1.0});
tests.add(new Object[]{100000, 100000, 100000, 0, 0.0}); //below R's or Java's precision
tests.add(new Object[]{200000, 100000, 1, 2, 1.0}); //differs from GATK4 implementation
tests.add(new Object[]{100, 100, 100, 0, 3.730187e-23});
tests.add(new Object[]{13736, 9047, 41, 1433, 1.232E-4}); //differs from GATK4 implementation
tests.add(new Object[]{66, 14, 64, 4, 0.0688244});
tests.add(new Object[]{351169, 306836, 153739, 2379, 0.0}); //below R's or Java's precision
tests.add(new Object[]{116449, 131216, 289, 16957, 0.0026801}); //differs from GATK4 implementation
tests.add(new Object[]{137, 159, 9, 23, 0.10752410}); //differs from GATK4 implementation
tests.add(new Object[]{129, 90, 21, 20, 0.6450772}); //differs from GATK4 implementation
tests.add(new Object[]{14054, 9160, 16, 7827, 0.0}); //below R's or Java's precision
tests.add(new Object[]{32803, 9184, 32117, 3283, 0.0289540}); //differs from GATK4 implementation
tests.add(new Object[]{2068, 6796, 1133, 0, 0.0}); //below R's or Java's precision
return tests.toArray(new Object[][]{});
}
@Test(dataProvider = "UsingTable")
public void testPValueForContingencyTable(final int refpos, final int refneg, final int altpos, final int altneg, double expectedPvalue) throws Exception {
final int[][] contingencyTable = new int[2][2];
contingencyTable[0][0] = refpos;
contingencyTable[0][1] = refneg;
contingencyTable[1][0] = altpos;
contingencyTable[1][1] = altneg;
final double pvalue = StrandBiasTableUtils.FisherExactPValueForContingencyTable(contingencyTable);
Assert.assertEquals(pvalue, expectedPvalue, DELTA_PRECISION, "Pvalues");
}
@Test
public void testGetContingencyArray() throws Exception {
final int[][] t = new int[2][2];
t[0][0] = 1; t[0][1] = 2; t[1][0] = 3; t[1][1] = 4;
final List<Integer> tList = StrandBiasTableUtils.getContingencyArray(t);
final List<Integer> truthList = new ArrayList<Integer>();
truthList.add(1); truthList.add(2); truthList.add(3); truthList.add(4);
Assert.assertEquals(tList, truthList);
}
@Test
public void testCopyContingencyTable() throws Exception {
final int[][] t = new int[2][2];
t[0][0] = 1; t[0][1] = 2; t[1][0] = 3; t[1][1] = 4;
final int[][] t2 = StrandBiasTableUtils.copyContingencyTable(t);
Assert.assertTrue(t[0][0] == t2[0][0]);
Assert.assertTrue(t[1][0] == t2[1][0]);
Assert.assertTrue(t[0][1] == t2[0][1]);
Assert.assertTrue(t[1][1] == t2[1][1]);
}
}

View File

@ -76,11 +76,11 @@ public class HaplotypeCallerGVCFIntegrationTest extends WalkerTest {
// this functionality can be adapted to provide input data for whatever you might want in your data
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.NONE, PCRFreeIntervals, "7f09c261950bf86e435edfa69ed2ec71"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.BP_RESOLUTION, PCRFreeIntervals, "1ecdfe6eeaa1bd8fe1ba690945af9fbf"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.GVCF, PCRFreeIntervals, "274b1e1016ca62f5ff81e398d17b63ab"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.BP_RESOLUTION, PCRFreeIntervals, "70605a2070d6d614d70b8547eedad2cd"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.GVCF, PCRFreeIntervals, "c439e51453a031ca60fe96b75423e589"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.NONE, WExIntervals, "03e5815cc351f1ec2feed89d2aed8268"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.BP_RESOLUTION, WExIntervals, "5854a0067b888947fb79ceed8a303fa7"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.GVCF, WExIntervals, "46e7eb6a031cdcbe512aedac1e35f81e"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.BP_RESOLUTION, WExIntervals, "5d5cca382bdf6987b2aef87918ed374c"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.GVCF, WExIntervals, "532888aff96c356397a21ef790636818"});
return tests.toArray(new Object[][]{});
}
@ -95,12 +95,12 @@ public class HaplotypeCallerGVCFIntegrationTest extends WalkerTest {
// this functionality can be adapted to provide input data for whatever you might want in your data
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.NONE, PCRFreeIntervals, "cfe629c5a3be3b6524258ad1f9145488"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.BP_RESOLUTION, PCRFreeIntervals, "b7952299b34db309026c8cbaaec2aa1e"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.GVCF, PCRFreeIntervals, "fa7363b2adbd459bc7acb75b3beeb13c"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.BP_RESOLUTION, PCRFreeIntervals, "12fbfffa4bb2b8d520f8021a40b37d19"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.GVCF, PCRFreeIntervals, "a5ea6d4052bbf9e8bba9011bc6f0d203"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.NONE, WExIntervals, "31da2254620f4a9c34ccf7c311cc133f"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.BP_RESOLUTION, WExIntervals, "78b1d207f37ce17183461dd9b6814ec8"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.BP_RESOLUTION, WExIntervals, "c95f52fe395392331dc3102902d54408"});
final String NA12878bandedResolutionMD5 = "2d54e7ed33001848fda3df66ee6e1999";
final String NA12878bandedResolutionMD5 = "fbe6099d138a069a65e4713bcae1e873";
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.GVCF, WExIntervals, NA12878bandedResolutionMD5});
tests.add(new Object[]{NA12878_WEx + " -I " + privateTestDir + "NA20313.highCoverageRegion.bam -sn NA12878",
ReferenceConfidenceMode.GVCF, WExIntervals, NA12878bandedResolutionMD5});
@ -118,11 +118,11 @@ public class HaplotypeCallerGVCFIntegrationTest extends WalkerTest {
// this functionality can be adapted to provide input data for whatever you might want in your data
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.NONE, PCRFreeIntervals, "4c33ec3f6b9dce74e545405588c76ed1"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.BP_RESOLUTION, PCRFreeIntervals, "c65d97f6eac66769fe1dbcf7b671459d"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.GVCF, PCRFreeIntervals, "7bcd06d7ada7495b641df6545b76bf24"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.BP_RESOLUTION, PCRFreeIntervals, "d38becbac8a2e0c4b77e9cc11c1ba891"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.GVCF, PCRFreeIntervals, "57c0f8baaa34f4e0300eaec7b938970b"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.NONE, WExIntervals, "4f8e3e249509a24da21d5dd8e3594f92"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.BP_RESOLUTION, WExIntervals, "61b225aaf53c85e755ff3790f1a86818"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.GVCF, WExIntervals, "d150defe805f74b65011c7ec02a3e952"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.BP_RESOLUTION, WExIntervals, "aeb64e3cb8d141aa3f8ac7506db15e19"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.GVCF, WExIntervals, "03af03feb96e402048801ecfa75344ad"});
return tests.toArray(new Object[][]{});
}
@ -136,11 +136,11 @@ public class HaplotypeCallerGVCFIntegrationTest extends WalkerTest {
// this functionality can be adapted to provide input data for whatever you might want in your data
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.NONE, PCRFreeIntervals, "0bec872d0dc72cc68538cfbd236e9933"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.BP_RESOLUTION, PCRFreeIntervals, "10cb558e49de284f1e559c7ed114f9ab"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.GVCF, PCRFreeIntervals, "3512803a103c7c2da86a18609bcd7061"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.BP_RESOLUTION, PCRFreeIntervals, "694405c8f6497b5ce99145538c325cba"});
tests.add(new Object[]{NA12878_PCRFREE, ReferenceConfidenceMode.GVCF, PCRFreeIntervals, "47ddf81ed143451aeb27969d397ce4ac"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.NONE, WExIntervals, "c724ecdaea7eab5a6239ff4daaa6e034"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.BP_RESOLUTION, WExIntervals, "170bdab72eef971598a0f2289a7ee4e3"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.GVCF, WExIntervals, "825da547dc8100a5a47cd41dd31d2aef"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.BP_RESOLUTION, WExIntervals, "f80cc551fe7763bacc46ea6d56582bba"});
tests.add(new Object[]{NA12878_WEx, ReferenceConfidenceMode.GVCF, WExIntervals, "43ef1e04d015c2bd9d513a67a6fc7a02"});
return tests.toArray(new Object[][]{});
}
@ -266,7 +266,7 @@ public class HaplotypeCallerGVCFIntegrationTest extends WalkerTest {
public void testWrongGVCFNonVariantRecordOrderBugFix() {
final String commandLine = String.format("-T HaplotypeCaller --pcr_indel_model NONE -pairHMMSub %s %s -R %s -I %s -L %s -ERC GVCF --no_cmdline_in_header -variant_index_type %s -variant_index_parameter %d",
HMM_SUB_IMPLEMENTATION, ALWAYS_LOAD_VECTOR_HMM, b37KGReference, WRONG_GVCF_RECORD_ORDER_BUGFIX_BAM, WRONG_GVCF_RECORD_ORDER_BUGFIX_INTERVALS, GATKVCFUtils.DEFAULT_GVCF_INDEX_TYPE, GATKVCFUtils.DEFAULT_GVCF_INDEX_PARAMETER);
final WalkerTestSpec spec = new WalkerTestSpec(commandLine + " -o %s", Arrays.asList("5dbb42e0140fc21315d58495030f56a2"));
final WalkerTestSpec spec = new WalkerTestSpec(commandLine + " -o %s", Arrays.asList("9b3b232ceeb109f2624826ea20825a82"));
spec.disableShadowBCF();
executeTest("testMissingGVCFIndexingStrategyException", spec);
}
@ -283,7 +283,7 @@ public class HaplotypeCallerGVCFIntegrationTest extends WalkerTest {
public void testNoCallGVCFMissingPLsBugFix() {
final String commandLine = String.format("-T HaplotypeCaller --pcr_indel_model NONE -pairHMMSub %s %s -R %s -I %s -L %s -ERC GVCF --no_cmdline_in_header -variant_index_type %s -variant_index_parameter %d",
HMM_SUB_IMPLEMENTATION, ALWAYS_LOAD_VECTOR_HMM, b37KGReference, NOCALL_GVCF_BUGFIX_BAM, NOCALL_GVCF_BUGFIX_INTERVALS, GATKVCFUtils.DEFAULT_GVCF_INDEX_TYPE, GATKVCFUtils.DEFAULT_GVCF_INDEX_PARAMETER);
final WalkerTestSpec spec = new WalkerTestSpec(commandLine + " -o %s", Arrays.asList("0eba41fa151a9079852fe9c77becde50"));
final WalkerTestSpec spec = new WalkerTestSpec(commandLine + " -o %s", Arrays.asList("c5a017f1cbd60219506be76f30fc4468"));
spec.disableShadowBCF();
executeTest("testNoCallGVCFMissingPLsBugFix", spec);
}
@ -312,4 +312,13 @@ public class HaplotypeCallerGVCFIntegrationTest extends WalkerTest {
executeTest(" testGeneralPloidyArrayIndexBug2Fix", spec);
}
@Test
public void testAlleleSpecificAnnotations() {
final String commandLine = String.format("-T HaplotypeCaller --pcr_indel_model NONE -pairHMMSub %s %s -R %s -I %s -L %s -ERC GVCF --no_cmdline_in_header -variant_index_type %s -variant_index_parameter %d -G Standard -G AS_Standard --disableDithering",
HMM_SUB_IMPLEMENTATION, ALWAYS_LOAD_VECTOR_HMM, b37KGReference, privateTestDir + "NA12878.HiSeq.b37.chr20.10_11mb.bam", "20:10433000-10437000", GATKVCFUtils.DEFAULT_GVCF_INDEX_TYPE, GATKVCFUtils.DEFAULT_GVCF_INDEX_PARAMETER);
final WalkerTestSpec spec = new WalkerTestSpec(commandLine + " -o %s", Arrays.asList("9a2887ed7afa829a75ec7d6e11dfc45f"));
spec.disableShadowBCF();
executeTest(" testAlleleSpecificAnnotations", spec);
}
}

View File

@ -272,4 +272,13 @@ public class CombineGVCFsIntegrationTest extends WalkerTest {
spec.disableShadowBCF();
executeTest("testBasepairResolutionInput", spec);
}
@Test
public void testAlleleSpecificAnnotations() throws Exception {
final String cmd = "-T CombineGVCFs -R " + b37KGReference + " -o %s --no_cmdline_in_header -G Standard -G AS_Standard -V "
+ privateTestDir + "NA12878.AS.chr20snippet.g.vcf -V " + privateTestDir + "NA12891.AS.chr20snippet.g.vcf";
final WalkerTestSpec spec = new WalkerTestSpec(cmd, 1, Arrays.asList("f06d60332cd46036813f652de6d807c9"));
spec.disableShadowBCF();
executeTest("testAlleleSpecificAnnotations", spec);
}
}

View File

@ -259,7 +259,7 @@ public class GenotypeGVCFsIntegrationTest extends WalkerTest {
final WalkerTestSpec spec = new WalkerTestSpec(
baseTestString(" -V " + gVCF.getAbsolutePath(), b37KGReference),
1,
Arrays.asList("b81a94763ea9714ed6289acd3d39c4cf"));
Arrays.asList("81ae1bc09f9cd34b0b37ae3cb4b09bc8"));
spec.disableShadowBCF(); //TODO: Remove when BaseTest.assertAttributesEquals() works with SAC
executeTest("testStrandAlleleCountsBySample", spec);
}
@ -560,4 +560,23 @@ public class GenotypeGVCFsIntegrationTest extends WalkerTest {
Arrays.asList("d617884b08ee85816f1ba1acf11f1738"));
executeTest("testSetZeroRGQsToNoCall", spec);
}
@Test
public void testAlleleSpecificAnnotations() {
final String cmd = "-T GenotypeGVCFs -R " + b37KGReference + " -o %s --no_cmdline_in_header -G Standard -G AS_Standard --disableDithering -V "
+ privateTestDir + "NA12878.AS.chr20snippet.g.vcf -V " + privateTestDir + "NA12891.AS.chr20snippet.g.vcf";
final WalkerTestSpec spec = new WalkerTestSpec(cmd, 1, Arrays.asList("69b5d245a7411ab45e6bd211cc09a53d"));
spec.disableShadowBCF();
executeTest("testAlleleSpecificAnnotations", spec);
}
@Test
//make sure none of the assumptions about things being merged as lists break the single-sample case
public void testAlleleSpecificAnnotations_oneSample() {
final String cmd = "-T GenotypeGVCFs -R " + b37KGReference + " -o %s --no_cmdline_in_header -G Standard -G AS_Standard --disableDithering -V "
+ privateTestDir + "NA12878.AS.chr20snippet.g.vcf";
final WalkerTestSpec spec = new WalkerTestSpec(cmd, 1, Arrays.asList("14ecc30b631afce3e3029484c9ee7b92"));
spec.disableShadowBCF();
executeTest("testAlleleSpecificAnnotations", spec);
}
}

View File

@ -52,6 +52,8 @@
package org.broadinstitute.gatk.tools.walkers.variantutils;
import htsjdk.variant.variantcontext.*;
import org.broadinstitute.gatk.engine.GenomeAnalysisEngine;
import org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine;
import org.broadinstitute.gatk.utils.*;
import org.broadinstitute.gatk.utils.exceptions.UserException;
import org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile;
@ -65,6 +67,7 @@ import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
/**
@ -103,7 +106,7 @@ public class VariantContextMergerUnitTest extends BaseTest {
@Test(dataProvider = "referenceConfidenceMergeData")
public void testReferenceConfidenceMerge(final String testID, final List<VariantContext> toMerge, final GenomeLoc loc,
final boolean returnSiteEvenIfMonomorphic, final boolean uniquifySamples, final VariantContext expectedResult) {
final VariantContext result = ReferenceConfidenceVariantContextMerger.merge(toMerge, loc, returnSiteEvenIfMonomorphic ? (byte) 'A' : null, true, uniquifySamples);
final VariantContext result = ReferenceConfidenceVariantContextMerger.merge(toMerge, loc, returnSiteEvenIfMonomorphic ? (byte) 'A' : null, true, uniquifySamples, null);
if ( result == null ) {
Assert.assertTrue(expectedResult == null);
return;

View File

@ -0,0 +1,96 @@
/*
* Copyright 2012-2015 Broad Institute, Inc.
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import org.broadinstitute.gatk.utils.exceptions.GATKException;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* A class to encapsulate the raw data for allele-specific classes compatible with the ReducibleAnnotation interface
* @param <T> the type of raw data to be stored for later annotation calculation
*/
public class AlleleSpecificAnnotationData<T> extends ReducibleAnnotationData<T>{
final private List<Allele> alleleList;
private Allele refAllele;
public AlleleSpecificAnnotationData(final List<Allele> inputAlleles, final String inputData) {
super(inputData);
attributeMap = new HashMap<>();
for(final Allele a : inputAlleles) {
attributeMap.put(a, null);
}
alleleList = inputAlleles;
for(Allele a : alleleList) {
if(a.isReference()) {
refAllele = a;
}
}
}
@Override
public List<Allele> getAlleles() {return Collections.unmodifiableList(alleleList);}
/**
* Get the reference allele for this allele-specific data.
* (Used in cases where annotations compare some attribute of the alt alleles to that of the reference.)
* @return the reference allele for this data
*/
public Allele getRefAllele() {return refAllele;}
public void setAttributeMap(Map<Allele, T> inputMap) {
super.setAttributeMap(inputMap);
checkRefAlleles();
}
private void checkRefAlleles() {
boolean foundRef = false;
for (Allele a : alleleList) {
if (a.isReference()) {
if (foundRef)
throw new GATKException("ERROR: multiple reference alleles found in annotation data\n");
foundRef = true;
}
}
if (!foundRef)
throw new GATKException("ERROR: no reference alleles found in annotation data\n");
}
public String makeRawAnnotationString(String printDelim) {
String annotationString = "";
for (final Allele current : alleleList) {
if (!annotationString.isEmpty())
annotationString += printDelim;
if(attributeMap.get(current) != null)
annotationString += attributeMap.get(current).toString();
}
return annotationString.replaceAll("[\\[\\]\\s]", "");
}
}

View File

@ -0,0 +1,117 @@
/*
* Copyright 2012-2015 Broad Institute, Inc.
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
/**
* A class to represent data as a list of <value,count> pairs. For example, the list 2,2,2,2,2,2,3,4,4,4,5,5
* would be compressed as 2,6,3,1,4,3,5,2. The compressed list should be sorted in ascending order by value.
*
* Created by gauthier on 9/25/15.
*/
public class CompressedDataList<T> implements Iterable<T> {
protected Map<T,Integer> valueCounts = new HashMap<>();
public Map<T,Integer> getValueCounts(){
return valueCounts;
}
public boolean isEmpty(){
return valueCounts.isEmpty();
}
@Override
public Iterator<T> iterator(){
Iterator<T> it = new Iterator<T>() {
private Iterator<T> keySetIterator = valueCounts.keySet().iterator();
private T currentKey = valueCounts.isEmpty() ? null : keySetIterator.next();
private int currentValueIndex = 0;
private int currentValueSize = valueCounts.isEmpty() ? 0 : valueCounts.get(currentKey);
@Override
public boolean hasNext() {
return !valueCounts.isEmpty() && (keySetIterator.hasNext() || currentValueIndex < currentValueSize);
}
@Override
public T next() {
T retKey = currentKey;
currentValueIndex++;
if(currentValueIndex==currentValueSize){
if(keySetIterator.hasNext()) {
currentKey = keySetIterator.next();
currentValueIndex = 0;
currentValueSize = valueCounts.get(currentKey);
}
}
return retKey;
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
};
return it;
}
@Override
public String toString(){
String str = "";
Object[] keys = valueCounts.keySet().toArray();
Arrays.sort(keys);
for (Object i: keys){
if(!str.isEmpty())
str+=",";
str+=(i+","+valueCounts.get(i));
}
return str;
}
public void add(final T val){
add(val, 1);
}
public void add(final T val, final int count){
if(valueCounts.containsKey(val)){
valueCounts.put(val, valueCounts.get(val)+count);
}
else
valueCounts.put(val, count);
}
public void add(final CompressedDataList<T> obj){
for(Map.Entry<T, Integer> pair : obj.getValueCounts().entrySet()){
this.add(pair.getKey(),pair.getValue());
}
}
}

View File

@ -0,0 +1,105 @@
/*
* Copyright 2012-2015 Broad Institute, Inc.
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import htsjdk.variant.variantcontext.Allele;
import java.util.*;
/**
* A class to encapsulate the raw data for classes compatible with the ReducibleAnnotation interface
*/
public class ReducibleAnnotationData<T> {
protected String rawData;
protected Map<Allele, T> attributeMap;
/**
* Create a new ReducibleAnnotationData using the raw data string from a VCF
* @param inputData the raw data as read in from a VCF
*/
public ReducibleAnnotationData(final String inputData) {
rawData = inputData; attributeMap = new HashMap<>();
attributeMap.put(Allele.NO_CALL, null);
}
/**
*
* @return the string of raw data as represented in the VCF
*/
public String getRawData() {return rawData;}
/**
* Note: parent class ReducibleAnnotationData is non-allele specific and stores all values with the no-call allele
* @return the list of alleles for which we have raw annotation data
*/
public List<Allele> getAlleles() {
List ret = new ArrayList<Allele>();
ret.addAll(attributeMap.keySet());
return ret;
}
/**
*
* @param key the allele of interest
* @return do we have data for the allele of interest?
*/
public boolean hasAttribute(Allele key) {
return attributeMap.containsKey(key);
}
/**
*
* @param key the allele of interest
* @return data for the allele of interest
*/
public T getAttribute(Allele key) {
return attributeMap.get(key);
}
/**
*
* @param key the allele of interest
* @param value raw data corresponding to the allele of interest
*/
public void putAttribute(Allele key, T value) {
attributeMap.put(key, value);
}
/**
* Assign all of the per-allele raw data at once
* @param inputMap the pre-calculated per-allele data
*/
public void setAttributeMap(Map<Allele, T> inputMap) {
attributeMap = inputMap;
}
/**
* Get the stored raw per-allele data
* @return
*/
public Map<Allele, T> getAttributeMap() {return Collections.unmodifiableMap(attributeMap);}
}

View File

@ -48,6 +48,8 @@ import java.util.*;
public class VariantAnnotatorEngine {
private final static Logger logger = Logger.getLogger(VariantAnnotatorEngine.class);
private List<InfoFieldAnnotation> requestedInfoAnnotations = Collections.emptyList();
private List<InfoFieldAnnotation> requestedReducibleInfoAnnotations = new ArrayList<>();
private List<InfoFieldAnnotation> requestedNonReducibleInfoAnnotations = new ArrayList<>();
private List<GenotypeAnnotation> requestedGenotypeAnnotations = Collections.emptyList();
private List<VAExpression> requestedExpressions = new ArrayList<>();
private boolean expressionAlleleConcordance = false;
@ -90,6 +92,7 @@ public class VariantAnnotatorEngine {
requestedInfoAnnotations = AnnotationInterfaceManager.createAllInfoFieldAnnotations();
requestedGenotypeAnnotations = AnnotationInterfaceManager.createAllGenotypeAnnotations();
excludeAnnotations(annotationsToExclude);
setReducibleAnnotations();
initializeDBs(toolkit);
}
@ -98,6 +101,7 @@ public class VariantAnnotatorEngine {
this.walker = walker;
this.toolkit = toolkit;
initializeAnnotations(annotationGroupsToUse, annotationsToUse, annotationsToExclude);
setReducibleAnnotations();
initializeDBs(toolkit);
}
@ -224,36 +228,83 @@ public class VariantAnnotatorEngine {
return annotateDBs(tracker, annotated);
}
/**
*
* @param referenceContext
* @param tracker
* @param readLikelihoods
* @param vc
* @param useRaw output annotation data as raw data? (Yes in the case of gVCF mode for HaplotypeCaller)
* @return
*/
public VariantContext annotateContextForActiveRegion(final ReferenceContext referenceContext,
final RefMetaDataTracker tracker,
final ReadLikelihoods<Allele> readLikelihoods,
final VariantContext vc) {
final VariantContext vc,
final boolean useRaw) {
//TODO we transform the read-likelihood into the Map^2 previous version for the sake of not changing of not changing annotation interface.
//TODO should we change those interfaces?
final Map<String, PerReadAlleleLikelihoodMap> annotationLikelihoods = readLikelihoods.toPerReadAlleleLikelihoodMap();
return annotateContextForActiveRegion(referenceContext, tracker, annotationLikelihoods, vc);
return annotateContextForActiveRegion(referenceContext, tracker, annotationLikelihoods, vc, useRaw);
}
/**
*
* @param referenceContext
* @param tracker
* @param perReadAlleleLikelihoodMap
* @param vc
* @param useRaw output annotation data as raw data? (Yes in the case of gVCF mode for HaplotypeCaller)
* @return
*/
public VariantContext annotateContextForActiveRegion(final ReferenceContext referenceContext,
final RefMetaDataTracker tracker,
final Map<String, PerReadAlleleLikelihoodMap> perReadAlleleLikelihoodMap,
final VariantContext vc) {
final VariantContext vc,
final boolean useRaw) {
// annotate genotypes
final VariantContextBuilder builder = new VariantContextBuilder(vc).genotypes(annotateGenotypes(null, null, null, vc, perReadAlleleLikelihoodMap));
VariantContext newGenotypeAnnotatedVC = builder.make();
final Map<String, Object> infoAnnotations = new LinkedHashMap<>(newGenotypeAnnotatedVC.getAttributes());
// go through all the requested info annotationTypes
for ( final InfoFieldAnnotation annotationType : requestedInfoAnnotations ) {
// go through all the requested info annotationTypes that are reducible
if (useRaw) {
for (final InfoFieldAnnotation annotationType : requestedReducibleInfoAnnotations) {
if (!(annotationType instanceof ActiveRegionBasedAnnotation))
continue;
ReducibleAnnotation currentASannotation = (ReducibleAnnotation) annotationType;
final Map<String, Object> annotationsFromCurrentType = currentASannotation.annotateRawData(null, null, referenceContext, null, newGenotypeAnnotatedVC, perReadAlleleLikelihoodMap);
if (annotationsFromCurrentType != null) {
infoAnnotations.putAll(annotationsFromCurrentType);
}
}
}
//if not in reference-confidence mode, do annotate with reducible annotations, but skip the raw data and go straight to the finalized values
else {
for (final InfoFieldAnnotation annotationType : requestedReducibleInfoAnnotations) {
if (!(annotationType instanceof ActiveRegionBasedAnnotation))
continue;
final Map<String, Object> annotationsFromCurrentType = annotationType.annotate(null, null, referenceContext, null, newGenotypeAnnotatedVC, perReadAlleleLikelihoodMap);
if (annotationsFromCurrentType != null) {
infoAnnotations.putAll(annotationsFromCurrentType);
}
}
}
//leave this in or else the median will overwrite until we do truly allele-specific
//// for now do both allele-specific and not
for ( final InfoFieldAnnotation annotationType : requestedNonReducibleInfoAnnotations ) {
if ( !(annotationType instanceof ActiveRegionBasedAnnotation) )
continue;
final Map<String, Object> annotationsFromCurrentType = annotationType.annotate(referenceContext, perReadAlleleLikelihoodMap, newGenotypeAnnotatedVC);
if ( annotationsFromCurrentType != null ) {
infoAnnotations.putAll(annotationsFromCurrentType);
}
final Map<String, Object> annotationsFromCurrentType = annotationType.annotate(referenceContext, perReadAlleleLikelihoodMap, newGenotypeAnnotatedVC);
if (annotationsFromCurrentType != null) {
infoAnnotations.putAll(annotationsFromCurrentType);
}
}
// create a new VC with info and genotype annotations
@ -263,6 +314,61 @@ public class VariantAnnotatorEngine {
return annotateDBs(tracker, annotated);
}
/**
* Combine (raw) data for reducible annotations (those that use raw data in gVCFs)
* Mutates annotationMap by removing the annotations that were combined
* @param allelesList the list of merged alleles across all variants being combined
* @param annotationMap attributes of merged variant contexts -- is modifying by removing successfully combined annotations
* @return a map containing the keys and raw values for the combined annotations
*/
public Map<String, Object> combineAnnotations(final List<Allele> allelesList, Map<String, List<ReducibleAnnotationData>> annotationMap) {
Map<String, Object> combinedAnnotations = new HashMap<>();
// go through all the requested reducible info annotationTypes
for (final InfoFieldAnnotation annotationType : requestedReducibleInfoAnnotations) {
ReducibleAnnotation currentASannotation = (ReducibleAnnotation) annotationType;
if (annotationMap.containsKey(currentASannotation.getRawKeyName())) {
final List<ReducibleAnnotationData> annotationValue = annotationMap.get(currentASannotation.getRawKeyName());
final Map<String, Object> annotationsFromCurrentType = currentASannotation.combineRawData(allelesList, annotationValue);
combinedAnnotations.putAll(annotationsFromCurrentType);
//remove the combined annotations so that the next method only processes the non-reducible ones
annotationMap.remove(currentASannotation.getRawKeyName());
}
}
return combinedAnnotations;
}
/**
* Finalize reducible annotations (those that use raw data in gVCFs)
* @param vc the merged VC with the final set of alleles, possibly subset to the number of maxAltAlleles for genotyping
* @param originalVC the merged but non-subset VC that contains the full list of merged alleles
* @return a VariantContext with the final annotation values for reducible annotations
*/
public VariantContext finalizeAnnotations(VariantContext vc, VariantContext originalVC) {
final Map<String, Object> infoAnnotations = new LinkedHashMap<>(vc.getAttributes());
// go through all the requested info annotationTypes
for ( final InfoFieldAnnotation annotationType : requestedReducibleInfoAnnotations ) {
if ( !(annotationType instanceof ActiveRegionBasedAnnotation) )
continue;
ReducibleAnnotation currentASannotation = (ReducibleAnnotation)annotationType;
final Map<String, Object> annotationsFromCurrentType = currentASannotation.finalizeRawData(vc, originalVC);
if ( annotationsFromCurrentType != null ) {
infoAnnotations.putAll(annotationsFromCurrentType);
//clean up raw annotation data after annotations are finalized
infoAnnotations.remove(currentASannotation.getRawKeyName());
}
}
// generate a new annotated VC
final VariantContextBuilder builder = new VariantContextBuilder(vc).attributes(infoAnnotations);
// annotate genotypes, creating another new VC in the process
final VariantContext annotated = builder.make();
return annotated;
}
/**
* Annotate the ID field and other DBs for the given Variant Context
*
@ -412,13 +518,13 @@ public class VariantAnnotatorEngine {
* @param vc variant context to annotate
* @return list of biallelics trimmed to a minimum representation
*/
private List<VariantContext> getMinRepresentationBiallelics(final VariantContext vc){
private List<VariantContext> getMinRepresentationBiallelics(final VariantContext vc) {
final List<VariantContext> minRepresentationBiallelicVCs = new ArrayList<VariantContext>();
final boolean isMultiAllelic = vc.getNAlleles() > 2;
if ( isMultiAllelic ){
if (isMultiAllelic) {
final List<VariantContext> vcList = GATKVariantContextUtils.splitVariantContextToBiallelics(vc);
for (final VariantContext biallelicVC: vcList) {
if ( !biallelicVC.isSNP() )
for (final VariantContext biallelicVC : vcList) {
if (!biallelicVC.isSNP())
minRepresentationBiallelicVCs.add(GATKVariantContextUtils.trimAlleles(biallelicVC, true, true));
else
minRepresentationBiallelicVCs.add(biallelicVC);
@ -429,4 +535,13 @@ public class VariantAnnotatorEngine {
return minRepresentationBiallelicVCs;
}
private void setReducibleAnnotations() {
for(final InfoFieldAnnotation annotationType : requestedInfoAnnotations) {
if (annotationType instanceof ReducibleAnnotation)
requestedReducibleInfoAnnotations.add(annotationType);
else
requestedNonReducibleInfoAnnotations.add(annotationType);
}
}
}

View File

@ -0,0 +1,31 @@
/*
* Copyright 2012-2015 Broad Institute, Inc.
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
package org.broadinstitute.gatk.tools.walkers.annotator.interfaces;
/**
* Created by gauthier on 9/28/15.
*/
public interface AS_StandardAnnotation extends AnnotationType {}

View File

@ -0,0 +1,88 @@
/*
* Copyright 2012-2015 Broad Institute, Inc.
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
package org.broadinstitute.gatk.tools.walkers.annotator.interfaces;
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.VariantContext;
import org.broadinstitute.gatk.tools.walkers.annotator.ReducibleAnnotationData;
import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
import java.util.List;
import java.util.Map;
/**
* An interface for annotations that are calculated using raw data across samples, rather than the median (or median of median) of samples values
*/
public interface ReducibleAnnotation extends AnnotationType {
public abstract String getRawKeyName();
/**
* Generate the raw data necessary to calculate the annotation. Raw data is the final endpoint for gVCFs.
*
* @param tracker
* @param walker
* @param ref
* @param stratifiedContexts
* @param vc
* @param stratifiedPerReadAlleleLikelihoodMap
* @return
*/
public abstract Map<String, Object> annotateRawData(final RefMetaDataTracker tracker,
final AnnotatorCompatible walker,
final ReferenceContext ref,
final Map<String, AlignmentContext> stratifiedContexts,
final VariantContext vc,
final Map<String, PerReadAlleleLikelihoodMap> stratifiedPerReadAlleleLikelihoodMap);
/**
* Combine raw data, typically during the merging of raw data contained in multiple gVCFs as in CombineGVCFs and the
* preliminary merge for GenotypeGVCFs
* @param allelesList The merged allele list across all variants being combined/merged
* @param listOfRawData The raw data for all the variants being combined/merged
* @return
*/
public abstract Map<String, Object> combineRawData(final List<Allele> allelesList, final List <? extends ReducibleAnnotationData> listOfRawData);
/**
* Calculate the final annotation value from the raw data
* @param vc -- contains the final set of alleles, possibly subset by GenotypeGVCFs
* @param originalVC -- used to get all the alleles for all gVCFs
* @return
*/
public abstract Map<String, Object> finalizeRawData(final VariantContext vc, final VariantContext originalVC);
/**
*
* @param vc
* @param pralm
* @param rawAnnotations
*/
public abstract void calculateRawData(VariantContext vc, Map<String, PerReadAlleleLikelihoodMap> pralm, ReducibleAnnotationData rawAnnotations);
}

View File

@ -0,0 +1,118 @@
/*
* Copyright 2012-2015 Broad Institute, Inc.
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
package org.broadinstitute.gatk.tools.walkers.annotator;
import org.testng.Assert;
import org.testng.annotations.Test;
public class CompressedDataListUnitTest {
@Test
public void testAddSingly(){
CompressedDataList<Integer> intList = new CompressedDataList<>();
intList.add(2);
intList.add(5);
intList.add(5);
intList.add(2);
intList.add(2);
intList.add(3);
intList.add(2);
intList.add(2);
intList.add(4);
intList.add(4);
intList.add(4);
intList.add(2);
Assert.assertEquals(intList.isEmpty(), false);
Assert.assertEquals(intList.toString(), "2,6,3,1,4,3,5,2");
}
@Test
public void testAddValueCounts(){
CompressedDataList<Integer> intList = new CompressedDataList<>();
intList.add(5,2);
intList.add(2,6);
intList.add(3,1);
intList.add(4,3);
Assert.assertEquals(intList.isEmpty(), false);
Assert.assertEquals(intList.toString(), "2,6,3,1,4,3,5,2");
}
@Test
public void testAddBothWays(){
CompressedDataList<Integer> intList = new CompressedDataList<>();
intList.add(2);
intList.add(5,2);
intList.add(2);
intList.add(2);
intList.add(3);
intList.add(2);
intList.add(2);
intList.add(4,2);
intList.add(2);
intList.add(4,1);
Assert.assertEquals(intList.toString(), "2,6,3,1,4,3,5,2");
}
@Test
public void testCombineLists(){
CompressedDataList<Integer> intList1 = new CompressedDataList<>();
intList1.add(5,2);
intList1.add(2,6);
intList1.add(3,1);
intList1.add(4,3);
CompressedDataList<Integer> intList2 = new CompressedDataList<>();
intList2.add(2,5);
intList2.add(6,2);
intList2.add(1,3);
intList2.add(3,4);
intList1.add(intList2);
Assert.assertEquals(intList1.toString(), "1,3,2,11,3,5,4,3,5,2,6,2");
}
@Test
public void testIterator(){
CompressedDataList<Integer> intList1 = new CompressedDataList<>();
intList1.add(5,2);
intList1.add(2,6);
intList1.add(3,1);
intList1.add(4,3);
CompressedDataList<Integer> intList2 = new CompressedDataList<>();
for(Integer i : intList1) {
intList2.add(i);
}
Assert.assertEquals(intList1.toString(),intList2.toString());
}
}

View File

@ -35,6 +35,9 @@ import htsjdk.variant.variantcontext.Allele;
public final class GATKVCFConstants {
//INFO keys
public static final String RAW_RMS_MAPPING_QUALITY_KEY = "RAW_MQ";
public static final String AS_RMS_MAPPING_QUALITY_KEY = "AS_MQ";
public static final String AS_RAW_RMS_MAPPING_QUALITY_KEY = "AS_RAW_MQ";
public static final String ALLELE_BALANCE_HET_KEY = "ABHet";
public static final String ALLELE_BALANCE_HOM_KEY = "ABHom";
public static final String ORIGINAL_AC_KEY = "AC_Orig"; //SelectVariants
@ -45,6 +48,8 @@ public final class GATKVCFConstants {
public static final String BEAGLE_AN_COMP_KEY = "ANH"; //BeagleOutputToVCF
public static final String BASE_COUNTS_KEY = "BaseCounts";
public static final String BASE_QUAL_RANK_SUM_KEY = "BaseQRankSum";
public static final String AS_BASE_QUAL_RANK_SUM_KEY = "AS_BaseQRankSum";
public static final String AS_RAW_BASE_QUAL_RANK_SUM_KEY = "AS_RAW_BaseQRankSum";
public static final String GENOTYPE_AND_VALIDATE_STATUS_KEY = "callStatus";
public static final String CLIPPING_RANK_SUM_KEY = "ClippingRankSum";
public static final String CULPRIT_KEY = "culprit";
@ -55,6 +60,8 @@ public final class GATKVCFConstants {
public static final String EVENT_DISTANCE_MAX_KEY = "MAX_ED"; //M2
public static final String EVENT_DISTANCE_MIN_KEY = "MIN_ED"; //M2
public static final String FISHER_STRAND_KEY = "FS";
public static final String AS_FISHER_STRAND_KEY = "AS_FS";
public static final String AS_SB_TABLE_KEY = "AS_SB_TABLE";
public static final String GC_CONTENT_KEY = "GC";
public static final String GQ_MEAN_KEY = "GQ_MEAN";
public static final String GQ_STDEV_KEY = "GQ_STDDEV";
@ -67,6 +74,7 @@ public final class GATKVCFConstants {
public static final String INTERVAL_GC_CONTENT_KEY = "IGC";
public static final String INBREEDING_COEFFICIENT_KEY = "InbreedingCoeff";
public static final String EXCESS_HET_KEY = "ExcessHet";
public static final String AS_HETEROZYGOSITY_KEY = "AS_InbreedingCoeff";
public static final String LIKELIHOOD_RANK_SUM_KEY = "LikelihoodRankSum";
public static final String LO_CONF_DENOVO_KEY = "loConfDeNovo";
public static final String LOW_MQ_KEY = "LowMQ";
@ -75,6 +83,9 @@ public final class GATKVCFConstants {
public static final String MLE_PER_SAMPLE_ALLELE_COUNT_KEY = "MLPSAC";
public static final String MLE_PER_SAMPLE_ALLELE_FRACTION_KEY = "MLPSAF";
public static final String MAP_QUAL_RANK_SUM_KEY = "MQRankSum";
public static final String RAW_MAP_QUAL_RANK_SUM_KEY = "RAW_MQRankSum";
public static final String AS_MAP_QUAL_RANK_SUM_KEY = "AS_MQRankSum";
public static final String AS_RAW_MAP_QUAL_RANK_SUM_KEY = "AS_RAW_MQRankSum";
public static final String MENDEL_VIOLATION_LR_KEY = "MVLR";
public static final String NOCALL_CHROM_KEY = "NCC";
public static final String NUMBER_OF_DISCOVERED_ALLELES_KEY = "NDA";
@ -92,12 +103,15 @@ public final class GATKVCFConstants {
public static final String POSITIVE_LABEL_KEY = "POSITIVE_TRAIN_SITE";
public static final String QUAL_BY_DEPTH_KEY = "QD";
public static final String BEAGLE_R2_KEY = "R2"; //BeagleOutputToVCF
public static final String AS_READ_POS_RANK_SUM_KEY = "AS_ReadPosRankSum";
public static final String AS_RAW_READ_POS_RANK_SUM_KEY = "AS_RAW_ReadPosRankSum";
public static final String READ_POS_RANK_SUM_KEY = "ReadPosRankSum";
public static final String REFSAMPLE_DEPTH_KEY = "REFDEPTH";
public static final String REPEATS_PER_ALLELE_KEY = "RPA";
public static final String REPEAT_UNIT_KEY = "RU";
public static final String SAMPLE_LIST_KEY = "Samples";
public static final String STRAND_ODDS_RATIO_KEY = "SOR";
public static final String AS_STRAND_ODDS_RATIO_KEY = "AS_SOR";
public static final String STR_PRESENT_KEY = "STR";
public static final String TRANSMISSION_DISEQUILIBRIUM_KEY = "TDT";
public static final String TUMOR_LOD_KEY = "TLOD"; //M2

View File

@ -115,8 +115,12 @@ public class GATKVCFHeaderLines {
addInfoLine(new VCFInfoHeaderLine(LOW_MQ_KEY, 3, VCFHeaderLineType.Float, "3-tuple: <fraction of reads with MQ=0>,<fraction of reads with MQ<=10>,<total number of reads>"));
addInfoLine(new VCFInfoHeaderLine(N_BASE_COUNT_KEY, 1, VCFHeaderLineType.Float, "Percentage of N bases in the pileup"));
addInfoLine(new VCFInfoHeaderLine(BASE_QUAL_RANK_SUM_KEY, 1, VCFHeaderLineType.Float, "Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities"));
addInfoLine(new VCFInfoHeaderLine(AS_BASE_QUAL_RANK_SUM_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "allele specific Z-score from Wilcoxon rank sum test of each Alt Vs. Ref base qualities"));
addInfoLine(new VCFInfoHeaderLine(AS_RAW_BASE_QUAL_RANK_SUM_KEY, 1, VCFHeaderLineType.String, "raw data for allele specific rank sum test of base qualities"));
addInfoLine(new VCFInfoHeaderLine(CLIPPING_RANK_SUM_KEY, 1, VCFHeaderLineType.Float, "Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases"));
addInfoLine(new VCFInfoHeaderLine(FISHER_STRAND_KEY, 1, VCFHeaderLineType.Float, "Phred-scaled p-value using Fisher's exact test to detect strand bias"));
addInfoLine(new VCFInfoHeaderLine(AS_FISHER_STRAND_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "allele specific phred-scaled p-value using Fisher's exact test to detect strand bias of each alt allele"));
addInfoLine(new VCFInfoHeaderLine(AS_SB_TABLE_KEY, 1, VCFHeaderLineType.String, "Allele-specific forward/reverse read counts for strand bias tests"));
addInfoLine(new VCFInfoHeaderLine(GC_CONTENT_KEY, 1, VCFHeaderLineType.Float, "GC content around the variant (see docs for window size details)"));
addInfoLine(new VCFInfoHeaderLine(NOCALL_CHROM_KEY, 1, VCFHeaderLineType.Integer, "Number of no-called samples"));
addInfoLine(new VCFInfoHeaderLine(GQ_MEAN_KEY, 1, VCFHeaderLineType.Float, "Mean of all GQ values"));
@ -126,16 +130,28 @@ public class GATKVCFHeaderLines {
addInfoLine(new VCFInfoHeaderLine(HOMOPOLYMER_RUN_KEY, 1, VCFHeaderLineType.Integer, "Largest Contiguous Homopolymer Run of Variant Allele In Either Direction"));
addInfoLine(new VCFInfoHeaderLine(INBREEDING_COEFFICIENT_KEY, 1, VCFHeaderLineType.Float, "Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation"));
addInfoLine(new VCFInfoHeaderLine(EXCESS_HET_KEY, 1, VCFHeaderLineType.Float, "Phred-scaled p-value for exact test of excess heterozygosity"));
addInfoLine(new VCFInfoHeaderLine(AS_HETEROZYGOSITY_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "allele specific heterozygosity as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation; relate to inbreeding coefficient"));
addInfoLine(new VCFInfoHeaderLine(LIKELIHOOD_RANK_SUM_KEY, 1, VCFHeaderLineType.Float, "Z-score from Wilcoxon rank sum test of Alt Vs. Ref haplotype likelihoods"));
addInfoLine(new VCFInfoHeaderLine(MAP_QUAL_RANK_SUM_KEY, 1, VCFHeaderLineType.Float, "Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities"));
addInfoLine(new VCFInfoHeaderLine(AS_MAP_QUAL_RANK_SUM_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "allele specific Z-score From Wilcoxon rank sum test of each Alt vs. Ref read mapping qualities"));
addInfoLine(new VCFInfoHeaderLine(RAW_RMS_MAPPING_QUALITY_KEY, 1, VCFHeaderLineType.Float, "Raw data for RMS Mapping Quality"));
addInfoLine(new VCFInfoHeaderLine(AS_RAW_RMS_MAPPING_QUALITY_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "Allele-specfic raw data for RMS Mapping Quality"));
addInfoLine(new VCFInfoHeaderLine(AS_RMS_MAPPING_QUALITY_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "Allele-specific RMS Mapping Quality"));
addInfoLine(new VCFInfoHeaderLine(RAW_MAP_QUAL_RANK_SUM_KEY, 1, VCFHeaderLineType.Float, "Raw data for Mapping Quality Rank Sum"));
addInfoLine(new VCFInfoHeaderLine(AS_RAW_MAP_QUAL_RANK_SUM_KEY, 1, VCFHeaderLineType.String, "Allele-specfic raw data for Mapping Quality Rank Sum"));
addInfoLine(new VCFInfoHeaderLine(AS_MAP_QUAL_RANK_SUM_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "Allele-specific Mapping Quality Rank Sum"));
addInfoLine(new VCFInfoHeaderLine(MENDEL_VIOLATION_LR_KEY, 1, VCFHeaderLineType.Float, "Mendelian violation likelihood ratio: L[MV] - L[No MV]"));
addInfoLine(new VCFInfoHeaderLine(HI_CONF_DENOVO_KEY, 1, VCFHeaderLineType.String, "High confidence possible de novo mutation (GQ >= 20 for all trio members)=[comma-delimited list of child samples]"));
addInfoLine(new VCFInfoHeaderLine(LO_CONF_DENOVO_KEY, 1, VCFHeaderLineType.String, "Low confidence possible de novo mutation (GQ >= 10 for child, GQ > 0 for parents)=[comma-delimited list of child samples]"));
addInfoLine(new VCFInfoHeaderLine(QUAL_BY_DEPTH_KEY, 1, VCFHeaderLineType.Float, "Variant Confidence/Quality by Depth"));
addInfoLine(new VCFInfoHeaderLine(READ_POS_RANK_SUM_KEY, 1, VCFHeaderLineType.Float, "Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias"));
addInfoLine(new VCFInfoHeaderLine(AS_READ_POS_RANK_SUM_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "allele specific Z-score from Wilcoxon rank sum test of each Alt vs. Ref read position bias"));
addInfoLine(new VCFInfoHeaderLine(AS_RAW_READ_POS_RANK_SUM_KEY, 1, VCFHeaderLineType.String, "allele specific raw data for rank sum test of read position bias"));
addInfoLine(new VCFInfoHeaderLine(SAMPLE_LIST_KEY, VCFHeaderLineCount.UNBOUNDED, VCFHeaderLineType.String, "List of polymorphic samples"));
addInfoLine(new VCFInfoHeaderLine(SPANNING_DELETIONS_KEY, 1, VCFHeaderLineType.Float, "Fraction of Reads Containing Spanning Deletions"));
addInfoLine(new VCFInfoHeaderLine(STRAND_ODDS_RATIO_KEY, 1, VCFHeaderLineType.Float, "Symmetric Odds Ratio of 2x2 contingency table to detect strand bias"));
addInfoLine(new VCFInfoHeaderLine(AS_STRAND_ODDS_RATIO_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "Allele specific strand Odds Ratio of 2x|Alts| contingency table to detect allele specific strand bias"));
addInfoLine(new VCFInfoHeaderLine(STR_PRESENT_KEY, 0, VCFHeaderLineType.Flag, "Variant is a short tandem repeat"));
addInfoLine(new VCFInfoHeaderLine(REPEAT_UNIT_KEY, 1, VCFHeaderLineType.String, "Tandem repeat unit (bases)"));
addInfoLine(new VCFInfoHeaderLine(REPEATS_PER_ALLELE_KEY, VCFHeaderLineCount.UNBOUNDED, VCFHeaderLineType.Integer, "Number of times tandem repeat unit is repeated, for each allele (including reference)"));