Merge pull request #658 from broadinstitute/ldg_PbyTwithPriors
Updated CalculateGenotypePosteriors to compute genotype posteriors using...
This commit is contained in:
commit
2df2a153e6
|
|
@ -0,0 +1,170 @@
|
|||
/*
|
||||
* By downloading the PROGRAM you agree to the following terms of use:
|
||||
*
|
||||
* BROAD INSTITUTE - SOFTWARE LICENSE AGREEMENT - FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
|
||||
*
|
||||
* This Agreement is made between the Broad Institute, Inc. with a principal address at 7 Cambridge Center, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
|
||||
*
|
||||
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
|
||||
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
|
||||
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
|
||||
*
|
||||
* 1. DEFINITIONS
|
||||
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK2 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute/GATK on the EFFECTIVE DATE.
|
||||
*
|
||||
* 2. LICENSE
|
||||
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM.
|
||||
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
|
||||
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
|
||||
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
|
||||
*
|
||||
* 3. OWNERSHIP OF INTELLECTUAL PROPERTY
|
||||
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
|
||||
* Copyright 2012 Broad Institute, Inc.
|
||||
* Notice of attribution: The GATK2 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
|
||||
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
|
||||
*
|
||||
* 4. INDEMNIFICATION
|
||||
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
|
||||
*
|
||||
* 5. NO REPRESENTATIONS OR WARRANTIES
|
||||
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
|
||||
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
|
||||
*
|
||||
* 6. ASSIGNMENT
|
||||
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
|
||||
*
|
||||
* 7. MISCELLANEOUS
|
||||
* 7.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
|
||||
* 7.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
|
||||
* 7.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
|
||||
* 7.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
|
||||
* 7.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
|
||||
* 7.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
|
||||
* 7.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
|
||||
*/
|
||||
|
||||
package org.broadinstitute.gatk.tools.walkers.annotator;
|
||||
|
||||
import org.broadinstitute.gatk.engine.contexts.AlignmentContext;
|
||||
import org.broadinstitute.gatk.engine.contexts.ReferenceContext;
|
||||
import org.broadinstitute.gatk.engine.refdata.RefMetaDataTracker;
|
||||
import org.broadinstitute.gatk.engine.samples.Sample;
|
||||
import org.broadinstitute.gatk.engine.samples.Trio;
|
||||
import org.broadinstitute.gatk.engine.walkers.Walker;
|
||||
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotatorCompatible;
|
||||
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.ExperimentalAnnotation;
|
||||
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.InfoFieldAnnotation;
|
||||
import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.RodRequiringAnnotation;
|
||||
import org.broadinstitute.gatk.utils.genotyper.PerReadAlleleLikelihoodMap;
|
||||
import org.broadinstitute.gatk.utils.MendelianViolation;
|
||||
import htsjdk.variant.vcf.VCFHeaderLineType;
|
||||
import htsjdk.variant.vcf.VCFInfoHeaderLine;
|
||||
import org.broadinstitute.gatk.utils.exceptions.UserException;
|
||||
import htsjdk.variant.variantcontext.VariantContext;
|
||||
|
||||
import java.util.*;
|
||||
|
||||
/**
|
||||
* Tags variants with called genotypes that support the existence of a de novo mutation in at least one of the given families
|
||||
*
|
||||
* <p>Given a variant context, this tool uses the called genotypes (ideally after having been refined using PhaseByTransmission and CalculateGenotypePosteriors)
|
||||
* to identify possible de novo mutations and the sample in which they occur. </p>
|
||||
*
|
||||
* <h3>Caveats</h3>
|
||||
*
|
||||
* <p>This tool assumes that the organism is diploid.</p>
|
||||
*
|
||||
* <p>Note that this annotation requires a valid ped file.</p>
|
||||
*
|
||||
* <p>Only reports possible de novos for children where genotype is not filtered (which is most appropriate if parent likelihoods
|
||||
* have already been factored in using PhaseByTransmission).</p>
|
||||
*
|
||||
* <p>When multiple trios are present, the annotation is simply the maximum
|
||||
* of the likelihood ratios, rather than the strict 1-Prod(1-p_i) calculation, as this can scale poorly for uncertain
|
||||
* sites and many trios.</p>
|
||||
*
|
||||
* <p>This annotation can only be used from the Variant Annotator.
|
||||
* If you attempt to use it from the UnifiedGenotyper, the run will fail with an error message to that effect.
|
||||
* If you attempt to use it from the HaplotypeCaller, the run will complete successfully but the annotation will not be added to any variants.</p>
|
||||
*/
|
||||
|
||||
public class PossibleDeNovo extends InfoFieldAnnotation implements RodRequiringAnnotation, ExperimentalAnnotation {
|
||||
|
||||
private MendelianViolation mendelianViolation = null;
|
||||
public static final String HI_CONF_DENOVO_KEY = "hiConfDeNovo";
|
||||
public static final String LO_CONF_DENOVO_KEY = "loConfDeNovo";
|
||||
private final int GQ_threshold = 20;
|
||||
private Set<Trio> trios;
|
||||
|
||||
public Map<String, Object> annotate(final RefMetaDataTracker tracker,
|
||||
final AnnotatorCompatible walker,
|
||||
final ReferenceContext ref,
|
||||
final Map<String, AlignmentContext> stratifiedContexts,
|
||||
final VariantContext vc,
|
||||
final Map<String, PerReadAlleleLikelihoodMap> stratifiedPerReadAlleleLikelihoodMap) {
|
||||
if ( mendelianViolation == null ) {
|
||||
trios = ((Walker) walker).getSampleDB().getTrios();
|
||||
if ( trios.size() > 0 ) {
|
||||
mendelianViolation = new MendelianViolation(((VariantAnnotator)walker).minGenotypeQualityP );
|
||||
}
|
||||
else {
|
||||
throw new UserException("Mendelian violation annotation can only be used from the Variant Annotator, and must be provided a valid PED file (-ped) from the command line.");
|
||||
}
|
||||
}
|
||||
|
||||
final Map<String,Object> attributeMap = new HashMap<String,Object>(1);
|
||||
boolean isHighConfDeNovo = false;
|
||||
boolean isLowConfDeNovo = false;
|
||||
final List<String> highConfDeNovoChildren = new ArrayList<String>();
|
||||
final List<String> lowConfDeNovoChildren = new ArrayList<String>();
|
||||
for ( final Trio trio : trios ) {
|
||||
if ( contextHasTrioLikelihoods(vc,trio) ) {
|
||||
if (mendelianViolation.isViolation(trio.getMother(),trio.getFather(),trio.getChild(),vc)) {
|
||||
if (mendelianViolation.getParentsRefRefChildHet() > 0) {
|
||||
if (vc.getGenotype(trio.getChildID()).getGQ() > GQ_threshold && vc.getGenotype(trio.getMaternalID()).getGQ() > GQ_threshold && vc.getGenotype(trio.getPaternalID()).getGQ() > GQ_threshold) {
|
||||
highConfDeNovoChildren.add(trio.getChildID());
|
||||
isHighConfDeNovo = true;
|
||||
}
|
||||
else {
|
||||
lowConfDeNovoChildren.add(trio.getChildID());
|
||||
isLowConfDeNovo = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if ( isHighConfDeNovo || isLowConfDeNovo ) {
|
||||
for(final String child : highConfDeNovoChildren) {
|
||||
attributeMap.put(HI_CONF_DENOVO_KEY,child);
|
||||
}
|
||||
for(final String child : lowConfDeNovoChildren) {
|
||||
attributeMap.put(LO_CONF_DENOVO_KEY,child);
|
||||
}
|
||||
}
|
||||
return attributeMap;
|
||||
}
|
||||
|
||||
// return the descriptions used for the VCF INFO meta field
|
||||
public List<String> getKeyNames() { return Arrays.asList(HI_CONF_DENOVO_KEY,LO_CONF_DENOVO_KEY); }
|
||||
|
||||
public List<VCFInfoHeaderLine> getDescriptions() { return Arrays.asList(new VCFInfoHeaderLine(HI_CONF_DENOVO_KEY, 1, VCFHeaderLineType.String, "High confidence possible de novo mutation (GQ > "+GQ_threshold+"): sample name"),
|
||||
new VCFInfoHeaderLine(LO_CONF_DENOVO_KEY, 1, VCFHeaderLineType.String, "Low confidence possible de novo mutation: sample name")); }
|
||||
|
||||
|
||||
private boolean contextHasTrioLikelihoods(VariantContext context, Trio trio) {
|
||||
for ( String sample : Arrays.asList(trio.getMaternalID(),trio.getPaternalID(),trio.getChildID()) ) {
|
||||
if (trio.getMaternalID().isEmpty() || trio.getPaternalID().isEmpty() || trio.getChildID().isEmpty())
|
||||
return false;
|
||||
if ( ! context.hasGenotype(sample) )
|
||||
return false;
|
||||
if ( ! context.getGenotype(sample).hasLikelihoods() )
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
|
@ -46,12 +46,15 @@
|
|||
|
||||
package org.broadinstitute.gatk.tools.walkers.variantutils;
|
||||
|
||||
|
||||
import org.broadinstitute.gatk.utils.commandline.*;
|
||||
import org.broadinstitute.gatk.engine.CommandLineGATK;
|
||||
import org.broadinstitute.gatk.engine.arguments.StandardVariantContextInputArgumentCollection;
|
||||
import org.broadinstitute.gatk.engine.contexts.AlignmentContext;
|
||||
import org.broadinstitute.gatk.engine.contexts.ReferenceContext;
|
||||
import org.broadinstitute.gatk.engine.refdata.RefMetaDataTracker;
|
||||
import org.broadinstitute.gatk.engine.samples.Sample;
|
||||
import org.broadinstitute.gatk.engine.samples.Trio;
|
||||
import org.broadinstitute.gatk.engine.walkers.RodWalker;
|
||||
import org.broadinstitute.gatk.utils.SampleUtils;
|
||||
import org.broadinstitute.gatk.utils.exceptions.UserException;
|
||||
|
|
@ -60,7 +63,7 @@ import org.broadinstitute.gatk.utils.help.HelpConstants;
|
|||
import org.broadinstitute.gatk.utils.variant.GATKVCFUtils;
|
||||
import org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils;
|
||||
import org.broadinstitute.gatk.utils.variant.HomoSapiensConstants;
|
||||
import htsjdk.variant.variantcontext.VariantContext;
|
||||
import htsjdk.variant.variantcontext.*;
|
||||
import htsjdk.variant.variantcontext.writer.VariantContextWriter;
|
||||
import htsjdk.variant.vcf.*;
|
||||
|
||||
|
|
@ -90,7 +93,11 @@ import java.util.*;
|
|||
*
|
||||
* <h3>Input</h3>
|
||||
* <p>
|
||||
* A VCF with genotype likelihoods, and optionally genotypes, AC/AN fields, or MLEAC/AN fields
|
||||
* <ul>
|
||||
* <li>A VCF with genotype likelihoods, and optionally genotypes, AC/AN fields, or MLEAC/AN fields</li>
|
||||
* <li>(Optional) A PED pedigree file containing the description of the individuals relationships.</li>
|
||||
* </ul>
|
||||
*
|
||||
* </p>
|
||||
*
|
||||
* <p>
|
||||
|
|
@ -103,9 +110,10 @@ import java.util.*;
|
|||
* <h3>Output</h3>
|
||||
* <p>
|
||||
* A new VCF with:
|
||||
* 1) Genotype posteriors added to the genotype fields ("GP")
|
||||
* 1) Genotype posteriors added to the genotype fields ("PP")
|
||||
* 2) Genotypes and GQ assigned according to these posteriors
|
||||
* 3) Per-site genotype priors added to the INFO field ("PG")
|
||||
* 4) (Optional) Per-site, per-trio transmission probabilities given as Phred-scaled probability of all genotypes in the trio being correct, added to the genotype fields ("TP")
|
||||
* </p>
|
||||
*
|
||||
* <h3>Notes</h3>
|
||||
|
|
@ -123,7 +131,7 @@ import java.util.*;
|
|||
* -R ref.fasta \
|
||||
* -T CalculateGenotypePosteriors \
|
||||
* -V NA12878.wgs.HC.vcf \
|
||||
* -VV 1000G_EUR.genotypes.combined.vcf \
|
||||
* -supporting 1000G_EUR.genotypes.combined.vcf \
|
||||
* -o NA12878.wgs.HC.posteriors.vcf \
|
||||
*
|
||||
* Refine the genotypes of a large panel based on the discovered allele frequency
|
||||
|
|
@ -147,10 +155,19 @@ import java.util.*;
|
|||
* java -Xmx2g -jar GenomeAnalysisTK.jar \
|
||||
* -R ref.fasta \
|
||||
* -T CalculateGenotypePosteriors \
|
||||
* -VV external.panel.vcf \
|
||||
* -supporting external.panel.vcf \
|
||||
* -V input.vcf \
|
||||
* -o output.withPosteriors.vcf
|
||||
* --numRefSamplesIfNoCall 100
|
||||
*
|
||||
* Apply only family priors to a callset
|
||||
* java -Xmx2g -jar GenomeAnalysisTK.jar \
|
||||
* -R ref.fasta \
|
||||
* -T CalculateGenotypePosteriors \
|
||||
* -V input.vcf \
|
||||
* --skipPopulationPriors
|
||||
* -ped family.ped
|
||||
* -o output.withPosteriors.vcf
|
||||
*
|
||||
* </pre>
|
||||
*
|
||||
|
|
@ -168,7 +185,7 @@ public class CalculateGenotypePosteriors extends RodWalker<Integer,Integer> {
|
|||
* Supporting external panels. Allele counts from these panels (taken from AC,AN or MLEAC,AN or raw genotypes) will
|
||||
* be used to inform the frequency distribution underying the genotype priors.
|
||||
*/
|
||||
@Input(fullName="supporting", shortName = "VV", doc="Other callsets to use in generating genotype posteriors", required=false)
|
||||
@Input(fullName="supporting", shortName = "supporting", doc="Other callsets to use in generating genotype posteriors", required=false)
|
||||
public List<RodBinding<VariantContext>> supportVariants = new ArrayList<RodBinding<VariantContext>>();
|
||||
|
||||
/**
|
||||
|
|
@ -180,6 +197,14 @@ public class CalculateGenotypePosteriors extends RodWalker<Integer,Integer> {
|
|||
@Argument(fullName="globalPrior",shortName="G",doc="The global Dirichlet prior parameters for the allele frequency",required=false)
|
||||
public double globalPrior = HomoSapiensConstants.SNP_HETEROZYGOSITY;
|
||||
|
||||
/**
|
||||
* The mutation prior -- i.e. the probability that a new mutation occurs. Sensitivity analysis on known de novo
|
||||
* mutations suggests a default value of 10^-6.
|
||||
*
|
||||
*/
|
||||
@Argument(fullName="deNovoPrior",shortName="DNP",doc="The de novo mutation prior",required=false)
|
||||
public double deNovoPrior = 1e-6;
|
||||
|
||||
/**
|
||||
* When a variant is not seen in a panel, whether to infer (and with what effective strength) that only reference
|
||||
* alleles were ascertained at that site. E.g. "If not seen in 1000Genomes, treat it as AC=0, AN=2000". This is
|
||||
|
|
@ -212,15 +237,43 @@ public class CalculateGenotypePosteriors extends RodWalker<Integer,Integer> {
|
|||
@Argument(fullName="calculateMissingPriors",shortName="calcMissing",doc="Use discovered allele frequency in the callset for variants that do no appear in the external callset", required=false)
|
||||
public boolean calcMissing = false;
|
||||
|
||||
/**
|
||||
* Skip application of population-based priors
|
||||
*/
|
||||
@Argument(fullName="skipPopulationPriors",shortName="skipPop",doc="Skip application of population-based priors", required=false)
|
||||
public boolean skipPopulationPriors = false;
|
||||
|
||||
/**
|
||||
* Skip application of family-based priors. Note: if pedigree file is absent, family-based priors will be skipped.
|
||||
*/
|
||||
@Argument(fullName="skipFamilyPriors",shortName="skipFam",doc="Skip application of family-based priors", required=false)
|
||||
public boolean skipFamilyPriors = false;
|
||||
|
||||
@Output(doc="File to which variants should be written")
|
||||
protected VariantContextWriter vcfWriter = null;
|
||||
|
||||
private final String TRANSMISSION_PROBABILITY_TAG_NAME = "TP";
|
||||
private final String PHRED_SCALED_POSTERIORS_KEY = "PP";
|
||||
|
||||
private FamilyLikelihoodsUtils famUtils = new FamilyLikelihoodsUtils();
|
||||
|
||||
public void initialize() {
|
||||
// Get list of samples to include in the output
|
||||
final List<String> rodNames = Arrays.asList(variantCollection.variants.getName());
|
||||
|
||||
final Map<String,VCFHeader> vcfRods = GATKVCFUtils.getVCFHeadersFromRods(getToolkit(), rodNames);
|
||||
|
||||
final Set<String> vcfSamples = SampleUtils.getSampleList(vcfRods, GATKVariantContextUtils.GenotypeMergeType.REQUIRE_UNIQUE);
|
||||
|
||||
//Get the trios from the families passed as ped
|
||||
if (!skipFamilyPriors){
|
||||
final Set<Trio> trios = getSampleDB().getTrios();
|
||||
if(trios.size()<1) {
|
||||
logger.info("No PED file passed or no *non-skipped* trios found in PED file. Skipping family priors.");
|
||||
skipFamilyPriors = true;
|
||||
}
|
||||
}
|
||||
|
||||
if ( vcfRods.size() > 1 )
|
||||
throw new IllegalStateException("Somehow more than one variant was bound?");
|
||||
|
||||
|
|
@ -241,15 +294,18 @@ public class CalculateGenotypePosteriors extends RodWalker<Integer,Integer> {
|
|||
}
|
||||
}
|
||||
|
||||
final TreeSet<String> vcfSamples = new TreeSet<>(SampleUtils.getSampleList(vcfRods, GATKVariantContextUtils.GenotypeMergeType.REQUIRE_UNIQUE));
|
||||
|
||||
// Initialize VCF header
|
||||
final Set<VCFHeaderLine> headerLines = VCFUtils.smartMergeHeaders(vcfRods.values(), true);
|
||||
headerLines.add(new VCFFormatHeaderLine(VCFConstants.GENOTYPE_POSTERIORS_KEY, VCFHeaderLineCount.G, VCFHeaderLineType.Integer, "Posterior Genotype Likelihoods"));
|
||||
headerLines.add(new VCFFormatHeaderLine(PHRED_SCALED_POSTERIORS_KEY, VCFHeaderLineCount.G, VCFHeaderLineType.Integer, "Phred-scaled Posterior Genotype Probabilities"));
|
||||
headerLines.add(new VCFInfoHeaderLine("PG", VCFHeaderLineCount.G, VCFHeaderLineType.Integer, "Genotype Likelihood Prior"));
|
||||
if (!skipFamilyPriors)
|
||||
headerLines.add(new VCFFormatHeaderLine(TRANSMISSION_PROBABILITY_TAG_NAME, 1, VCFHeaderLineType.Integer, "Phred score of the genotype combination and phase given that the genotypes are correct"));
|
||||
headerLines.add(new VCFHeaderLine("source", "CalculateGenotypePosteriors"));
|
||||
|
||||
vcfWriter.writeHeader(new VCFHeader(headerLines, vcfSamples));
|
||||
|
||||
Map<String,Set<Sample>> families = this.getSampleDB().getFamilies(vcfSamples);
|
||||
famUtils.initialize(deNovoPrior, vcfSamples, families);
|
||||
}
|
||||
|
||||
public Integer reduceInit() { return 0; }
|
||||
|
|
@ -265,12 +321,30 @@ public class CalculateGenotypePosteriors extends RodWalker<Integer,Integer> {
|
|||
|
||||
final int missing = supportVariants.size() - otherVCs.size();
|
||||
|
||||
for ( VariantContext vc : vcs ) {
|
||||
vcfWriter.add(PosteriorLikelihoodsUtils.calculatePosteriorGLs(vc, otherVCs, missing * numRefIfMissing, globalPrior, !ignoreInputSamples, defaultToAC, calcMissing));
|
||||
}
|
||||
for ( final VariantContext vc : vcs ) {
|
||||
VariantContext vc_familyPriors,vc_bothPriors;
|
||||
|
||||
//do family priors first (if applicable)
|
||||
final VariantContextBuilder builder = new VariantContextBuilder(vc);
|
||||
//only compute family priors for biallelelic sites
|
||||
if (!skipFamilyPriors && vc.isBiallelic()){
|
||||
GenotypesContext gc = famUtils.calculatePosteriorGLs(vc);
|
||||
builder.genotypes(gc);
|
||||
}
|
||||
vc_familyPriors = builder.make();
|
||||
|
||||
if (!skipPopulationPriors)
|
||||
vc_bothPriors = PosteriorLikelihoodsUtils.calculatePosteriorGLs(vc_familyPriors, otherVCs, missing * numRefIfMissing, globalPrior, !ignoreInputSamples, defaultToAC, calcMissing);
|
||||
else {
|
||||
final VariantContextBuilder builder2 = new VariantContextBuilder(vc_familyPriors);
|
||||
vc_bothPriors = builder2.make();
|
||||
}
|
||||
vcfWriter.add(vc_bothPriors);
|
||||
}
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
public Integer reduce(Integer l, Integer r) { return r + l; }
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,680 @@
|
|||
/*
|
||||
* By downloading the PROGRAM you agree to the following terms of use:
|
||||
*
|
||||
* BROAD INSTITUTE - SOFTWARE LICENSE AGREEMENT - FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
|
||||
*
|
||||
* This Agreement is made between the Broad Institute, Inc. with a principal address at 7 Cambridge Center, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
|
||||
*
|
||||
* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
|
||||
* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
|
||||
* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
|
||||
*
|
||||
* 1. DEFINITIONS
|
||||
* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK2 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute/GATK on the EFFECTIVE DATE.
|
||||
*
|
||||
* 2. LICENSE
|
||||
* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM.
|
||||
* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
|
||||
* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
|
||||
* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
|
||||
*
|
||||
* 3. OWNERSHIP OF INTELLECTUAL PROPERTY
|
||||
* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
|
||||
* Copyright 2012 Broad Institute, Inc.
|
||||
* Notice of attribution: The GATK2 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
|
||||
* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
|
||||
*
|
||||
* 4. INDEMNIFICATION
|
||||
* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
|
||||
*
|
||||
* 5. NO REPRESENTATIONS OR WARRANTIES
|
||||
* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
|
||||
* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
|
||||
*
|
||||
* 6. ASSIGNMENT
|
||||
* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
|
||||
*
|
||||
* 7. MISCELLANEOUS
|
||||
* 7.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
|
||||
* 7.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
|
||||
* 7.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
|
||||
* 7.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
|
||||
* 7.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
|
||||
* 7.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
|
||||
* 7.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
|
||||
*/
|
||||
|
||||
package org.broadinstitute.gatk.tools.walkers.variantutils;
|
||||
|
||||
import org.apache.log4j.Logger;
|
||||
import org.broadinstitute.gatk.engine.GenomeAnalysisEngine;
|
||||
import org.broadinstitute.gatk.engine.samples.Sample;
|
||||
import org.broadinstitute.gatk.utils.MathUtils;
|
||||
import org.broadinstitute.gatk.utils.QualityUtils;
|
||||
import org.broadinstitute.gatk.utils.Utils;
|
||||
import org.broadinstitute.gatk.utils.exceptions.UserException;
|
||||
import org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils;
|
||||
import htsjdk.variant.utils.GeneralUtils;
|
||||
import htsjdk.variant.variantcontext.*;
|
||||
|
||||
import java.util.*;
|
||||
|
||||
/**
|
||||
* FamilyLikelihoodsUtils code is based on PhaseByTransmission with added posterior probability calculations
|
||||
*/
|
||||
|
||||
public class FamilyLikelihoodsUtils {
|
||||
private static Logger logger = Logger.getLogger(FamilyLikelihoodsUtils.class);
|
||||
|
||||
//Matrix of priors for all genotype combinations
|
||||
final private EnumMap<GenotypeType,EnumMap<GenotypeType,EnumMap<GenotypeType,Integer>>> mvCountMatrix =
|
||||
new EnumMap<GenotypeType,EnumMap<GenotypeType,EnumMap<GenotypeType,Integer>>>(GenotypeType.class);
|
||||
|
||||
//Matrix of allele transmission
|
||||
final private EnumMap<GenotypeType,EnumMap<GenotypeType,EnumMap<GenotypeType,TrioGenotypes>>> transmissionMatrix =
|
||||
new EnumMap<GenotypeType,EnumMap<GenotypeType,EnumMap<GenotypeType,TrioGenotypes>>>(GenotypeType.class);
|
||||
|
||||
final double[] configurationLikelihoodsMatrix = {0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
|
||||
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
|
||||
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0}; //27 is # of trio genotype combos, initialize to zero
|
||||
|
||||
ArrayList<Sample> trios = new ArrayList<Sample>();
|
||||
|
||||
//Random number generator
|
||||
final private Random rand = new GenomeAnalysisEngine().getRandomGenerator();
|
||||
|
||||
private final String TRANSMISSION_PROBABILITY_TAG_NAME = "TP";
|
||||
private final String PHRED_SCALED_POSTERIORS_KEY = "PP";
|
||||
|
||||
public final double NO_TRANSMISSION_PROB = -1.0;
|
||||
|
||||
private double deNovoPrior = 1e-8;
|
||||
|
||||
private enum FamilyMember {
|
||||
MOTHER,
|
||||
FATHER,
|
||||
CHILD
|
||||
}
|
||||
|
||||
//Stores a conceptual trio or parent/child pair genotype combination
|
||||
//This combination can then be "applied" to a given trio or pair using the getUpdatedGenotypes method.
|
||||
private class TrioGenotypes {
|
||||
|
||||
//Create 2 fake alleles
|
||||
//The actual bases will never be used but the Genotypes created using the alleles will be.
|
||||
private final Allele REF = Allele.create("A",true);
|
||||
private final Allele VAR = Allele.create("A",false);
|
||||
private final Allele NO_CALL = Allele.create(".",false);
|
||||
private final String DUMMY_NAME = "DummySample";
|
||||
private EnumMap<GenotypeType,EnumMap<GenotypeType,EnumMap<GenotypeType,Integer>>> MVcountMatrix;
|
||||
|
||||
private final EnumMap<FamilyMember,Genotype> familyGenotypes = new EnumMap<FamilyMember, Genotype>(FamilyMember.class);
|
||||
|
||||
/* Constructor: Creates a conceptual trio genotype combination from the given genotypes.
|
||||
*/
|
||||
public TrioGenotypes(GenotypeType mother, GenotypeType father, GenotypeType child){
|
||||
familyGenotypes.put(FamilyMember.MOTHER, makeGenotype(mother));
|
||||
familyGenotypes.put(FamilyMember.FATHER, makeGenotype(father));
|
||||
familyGenotypes.put(FamilyMember.CHILD, makeGenotype(child));
|
||||
}
|
||||
|
||||
private ArrayList<Allele> getAlleles(GenotypeType genotype){
|
||||
final ArrayList<Allele> alleles = new ArrayList<Allele>(2);
|
||||
if(genotype == GenotypeType.HOM_REF){
|
||||
alleles.add(REF);
|
||||
alleles.add(REF);
|
||||
}
|
||||
else if(genotype == GenotypeType.HET){
|
||||
alleles.add(REF);
|
||||
alleles.add(VAR);
|
||||
}
|
||||
else if(genotype == GenotypeType.HOM_VAR){
|
||||
alleles.add(VAR);
|
||||
alleles.add(VAR);
|
||||
}
|
||||
else{
|
||||
return null;
|
||||
}
|
||||
return alleles;
|
||||
}
|
||||
|
||||
public void setMVcountMatrix(EnumMap<GenotypeType,EnumMap<GenotypeType,EnumMap<GenotypeType,Integer>>> inputMat) {
|
||||
MVcountMatrix = inputMat;
|
||||
}
|
||||
|
||||
private boolean hasCalledGT(GenotypeType genotype){
|
||||
return genotype == GenotypeType.HOM_REF || genotype == GenotypeType.HET || genotype == GenotypeType.HOM_VAR;
|
||||
}
|
||||
|
||||
//TODO: this was stupid stuff for phasing -- let's take it out
|
||||
private Genotype makeGenotype(final GenotypeType type) {
|
||||
return makeGenotype(getAlleles(type));
|
||||
}
|
||||
|
||||
private Genotype makeGenotype(final List<Allele> alleles) {
|
||||
final GenotypeBuilder gb = new GenotypeBuilder(DUMMY_NAME, alleles);
|
||||
return gb.make();
|
||||
}
|
||||
|
||||
/**
|
||||
* Applies the trio genotype combination to the given trio.
|
||||
* @param ref: Reference allele
|
||||
* @param alt: Alternate allele
|
||||
* @param motherGenotype: Genotype of the mother in this trio genotype combination
|
||||
* @param fatherGenotype: Genotype of the father in this trio genotype combination
|
||||
* @param childGenotype: Genotype of the child in this trio genotype combination
|
||||
* @param transmissionProb: Probability for this trio genotype combination to be correct (pass NO_TRANSMISSION_PROB if unavailable)
|
||||
* @param configurationLikelihoodsMatrix: (Non-normalized) likelihoods for each trio genotype combination, for use in generating new PLs
|
||||
* @param updatedGenotypes: An ArrayList<Genotype> to which the newly updated genotypes are added in the following order: Mother, Father, Child
|
||||
*/
|
||||
public void getUpdatedGenotypes(final Allele ref, final Allele alt, final Genotype motherGenotype, final Genotype fatherGenotype, final Genotype childGenotype, final double transmissionProb, double[] configurationLikelihoodsMatrix, final ArrayList<Genotype> updatedGenotypes){
|
||||
//default to flat priors in case input genotypes are not called
|
||||
double[] motherPosteriors = {1,1,1};
|
||||
double[] fatherPosteriors = {1,1,1};
|
||||
double[] childPosteriors = {1,1,1};
|
||||
|
||||
//genotypes here can be no call
|
||||
boolean fatherIsCalled = fatherGenotype != null && hasCalledGT(fatherGenotype.getType());
|
||||
boolean motherIsCalled = motherGenotype != null && hasCalledGT(motherGenotype.getType());
|
||||
boolean childIsCalled = childGenotype != null && hasCalledGT(childGenotype.getType());
|
||||
|
||||
if (fatherIsCalled && childIsCalled) {
|
||||
motherPosteriors = getPosteriors(FamilyMember.MOTHER);
|
||||
}
|
||||
updatedGenotypes.add(getUpdatedGenotype(ref, alt, motherGenotype, transmissionProb, motherPosteriors));
|
||||
|
||||
if (motherIsCalled && childIsCalled) {
|
||||
fatherPosteriors = getPosteriors(FamilyMember.FATHER);
|
||||
}
|
||||
updatedGenotypes.add(getUpdatedGenotype(ref, alt, fatherGenotype, transmissionProb, fatherPosteriors));
|
||||
|
||||
if (motherIsCalled && fatherIsCalled) {
|
||||
childPosteriors = getPosteriors(FamilyMember.CHILD);
|
||||
}
|
||||
updatedGenotypes.add(getUpdatedGenotype(ref, alt, childGenotype, transmissionProb, childPosteriors));
|
||||
}
|
||||
|
||||
private Genotype getUpdatedGenotype(Allele refAllele, Allele altAllele, Genotype genotype, double transmissionProb, double[] normalizedPosteriors){
|
||||
|
||||
int phredScoreTransmission = -1;
|
||||
if(transmissionProb != NO_TRANSMISSION_PROB){
|
||||
double dphredScoreTransmission = QualityUtils.phredScaleLog10ErrorRate(Math.log10(1 - (transmissionProb)));
|
||||
phredScoreTransmission = dphredScoreTransmission < Byte.MAX_VALUE ? (byte)dphredScoreTransmission : Byte.MAX_VALUE;
|
||||
}
|
||||
//Handle null, missing and unavailable genotypes
|
||||
//Note that only cases where a null/missing/unavailable genotype was passed in the first place can lead to a null/missing/unavailable
|
||||
//genotype so it is safe to return the original genotype in this case.
|
||||
//In addition, if the genotype configuration confidence is 0, then return the original genotypes.
|
||||
if(phredScoreTransmission ==0 || genotype == null || !hasCalledGT(genotype.getType()))
|
||||
return genotype;
|
||||
|
||||
//Add the transmission probability
|
||||
final Map<String, Object> genotypeAttributes = new HashMap<String, Object>();
|
||||
genotypeAttributes.putAll(genotype.getExtendedAttributes());
|
||||
if(transmissionProb>NO_TRANSMISSION_PROB)
|
||||
genotypeAttributes.put(TRANSMISSION_PROBABILITY_TAG_NAME, phredScoreTransmission);
|
||||
|
||||
final ArrayList<Allele> usedAlleles = new ArrayList<Allele>(2);
|
||||
usedAlleles.add(refAllele);
|
||||
usedAlleles.add(altAllele);
|
||||
|
||||
final GenotypeBuilder builder = new GenotypeBuilder(genotype);
|
||||
|
||||
final double[] log10Posteriors = MathUtils.toLog10(normalizedPosteriors);
|
||||
|
||||
//note that there will there be times when posteriors don't agree with genotype predicted by configuration likelihoods
|
||||
GATKVariantContextUtils.updateGenotypeAfterSubsetting(usedAlleles, builder,
|
||||
GATKVariantContextUtils.GenotypeAssignmentMethod.USE_PLS_TO_ASSIGN, log10Posteriors, usedAlleles);
|
||||
|
||||
|
||||
|
||||
builder.attribute(PHRED_SCALED_POSTERIORS_KEY,
|
||||
Utils.listFromPrimitives(GenotypeLikelihoods.fromLog10Likelihoods(log10Posteriors).getAsPLs()));
|
||||
builder.attributes(genotypeAttributes);
|
||||
return builder.make();
|
||||
}
|
||||
|
||||
//marginalize over the configurationLikelihoodsMatrix and normalize to get the posteriors
|
||||
private double[] getPosteriors(FamilyMember recalcInd) {
|
||||
double marginalOverChangedHR, marginalOverChangedHET, marginalOverChangedHV;
|
||||
marginalOverChangedHR = marginalOverChangedHET = marginalOverChangedHV = 0;
|
||||
final double[] recalcPosteriors = new double[3];
|
||||
|
||||
GenotypeType[] calledTypes = {GenotypeType.HOM_REF, GenotypeType.HET, GenotypeType.HOM_VAR};
|
||||
|
||||
switch (recalcInd) {
|
||||
case MOTHER:
|
||||
for(final GenotypeType father : calledTypes) {
|
||||
for(final GenotypeType child : calledTypes) {
|
||||
GenotypeType mother;
|
||||
mother = GenotypeType.HOM_REF;
|
||||
marginalOverChangedHR += configurationLikelihoodsMatrix[getLikelihoodIndex(mother,father,child,false)];
|
||||
mother = GenotypeType.HET;
|
||||
marginalOverChangedHET += configurationLikelihoodsMatrix[getLikelihoodIndex(mother,father,child,false)];
|
||||
mother = GenotypeType.HOM_VAR;
|
||||
marginalOverChangedHV += configurationLikelihoodsMatrix[getLikelihoodIndex(mother,father,child,false)];
|
||||
}
|
||||
}
|
||||
break;
|
||||
case FATHER:
|
||||
for(final GenotypeType mother : calledTypes){
|
||||
for (final GenotypeType child : calledTypes){
|
||||
GenotypeType father;
|
||||
father = GenotypeType.HOM_REF;
|
||||
marginalOverChangedHR += configurationLikelihoodsMatrix[getLikelihoodIndex(mother,father,child,false)];
|
||||
father = GenotypeType.HET;
|
||||
marginalOverChangedHET += configurationLikelihoodsMatrix[getLikelihoodIndex(mother,father,child,false)];
|
||||
father = GenotypeType.HOM_VAR;
|
||||
marginalOverChangedHV += configurationLikelihoodsMatrix[getLikelihoodIndex(mother,father,child,false)];
|
||||
}
|
||||
}
|
||||
break;
|
||||
case CHILD:
|
||||
for(final GenotypeType mother : calledTypes){
|
||||
for (final GenotypeType father: calledTypes){
|
||||
GenotypeType child;
|
||||
child = GenotypeType.HOM_REF;
|
||||
marginalOverChangedHR += configurationLikelihoodsMatrix[getLikelihoodIndex(mother,father,child,false)];
|
||||
child = GenotypeType.HET;
|
||||
marginalOverChangedHET += configurationLikelihoodsMatrix[getLikelihoodIndex(mother,father,child,false)];
|
||||
child = GenotypeType.HOM_VAR;
|
||||
marginalOverChangedHV += configurationLikelihoodsMatrix[getLikelihoodIndex(mother,father,child,false)];
|
||||
}
|
||||
}
|
||||
break;
|
||||
default:
|
||||
throw new UserException(String.format("%d does not indicate a valid trio individual -- use 0 for mother, 1 for father, 2 for child",recalcInd));
|
||||
}
|
||||
recalcPosteriors[0] = marginalOverChangedHR;
|
||||
recalcPosteriors[1] = marginalOverChangedHET;
|
||||
recalcPosteriors[2] = marginalOverChangedHV;
|
||||
|
||||
final double[] normalizedPosteriors = MathUtils.normalizeFromRealSpace(recalcPosteriors);
|
||||
|
||||
return normalizedPosteriors;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
public void initialize(double DNprior, Set<String> vcfSamples, Map<String,Set<Sample>> families){
|
||||
this.deNovoPrior = DNprior;
|
||||
buildMatrices();
|
||||
trios = setTrios(vcfSamples, families);
|
||||
}
|
||||
|
||||
public GenotypesContext calculatePosteriorGLs(VariantContext vc){
|
||||
final GenotypesContext genotypesContext = GenotypesContext.copy(vc.getGenotypes());
|
||||
|
||||
for (final Sample sample : trios) {
|
||||
Genotype mother = vc.getGenotype(sample.getMaternalID());
|
||||
Genotype father = vc.getGenotype(sample.getPaternalID());
|
||||
Genotype child = vc.getGenotype(sample.getID());
|
||||
|
||||
//Keep only trios and parent/child pairs
|
||||
if(mother == null && father == null || child == null) {
|
||||
logger.warn("null genos in var "+vc.toStringDecodeGenotypes());
|
||||
continue;
|
||||
}
|
||||
|
||||
final ArrayList<Genotype> trioGenotypes = new ArrayList<Genotype>(3);
|
||||
final int mvCount = updateFamilyGenotypes(vc.getReference(), vc.getAltAlleleWithHighestAlleleCount(), mother, father, child, trioGenotypes);
|
||||
|
||||
Genotype updatedMother = trioGenotypes.get(0);
|
||||
Genotype updatedFather = trioGenotypes.get(1);
|
||||
Genotype updatedChild = trioGenotypes.get(2);
|
||||
|
||||
genotypesContext.replace(updatedChild);
|
||||
genotypesContext.replace(updatedFather);
|
||||
genotypesContext.replace(updatedMother);
|
||||
}
|
||||
|
||||
return genotypesContext;
|
||||
}
|
||||
|
||||
/**
|
||||
* Select trios and parent/child pairs only
|
||||
*/
|
||||
private ArrayList<Sample> setTrios(Set<String> vcfSamples, Map<String,Set<Sample>> families){
|
||||
|
||||
Set<Sample> family;
|
||||
ArrayList<Sample> parents;
|
||||
final ArrayList<Sample> trios = new ArrayList<Sample>();
|
||||
for(final Map.Entry<String,Set<Sample>> familyEntry : families.entrySet()){
|
||||
family = familyEntry.getValue();
|
||||
|
||||
// Since getFamilies(vcfSamples) above still returns parents of samples in the VCF even if those parents are not in the VCF, need to subset down here:
|
||||
final Set<Sample> familyMembersInVCF = new TreeSet<Sample>();
|
||||
for(final Sample familyMember : family){
|
||||
if (vcfSamples.contains(familyMember.getID())) {
|
||||
familyMembersInVCF.add(familyMember);
|
||||
}
|
||||
}
|
||||
family = familyMembersInVCF;
|
||||
|
||||
if(family.size() == 3){
|
||||
for(final Sample familyMember : family){
|
||||
parents = familyMember.getParents();
|
||||
if(parents.size()>0){
|
||||
if(family.containsAll(parents))
|
||||
trios.add(familyMember);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
return trios;
|
||||
}
|
||||
|
||||
//Create the transmission matrices
|
||||
//TODO: pass in the real genotypes so we have that info
|
||||
private void buildMatrices(){
|
||||
for(final GenotypeType mother : GenotypeType.values()){
|
||||
mvCountMatrix.put(mother,new EnumMap<GenotypeType,EnumMap<GenotypeType,Integer>>(GenotypeType.class));
|
||||
transmissionMatrix.put(mother,new EnumMap<GenotypeType,EnumMap<GenotypeType,TrioGenotypes>>(GenotypeType.class));
|
||||
for(final GenotypeType father : GenotypeType.values()){
|
||||
mvCountMatrix.get(mother).put(father,new EnumMap<GenotypeType, Integer>(GenotypeType.class));
|
||||
transmissionMatrix.get(mother).put(father,new EnumMap<GenotypeType,TrioGenotypes>(GenotypeType.class));
|
||||
for(final GenotypeType child : GenotypeType.values()){
|
||||
mvCountMatrix.get(mother).get(father).put(child, getCombinationMVCount(mother, father, child));
|
||||
transmissionMatrix.get(mother).get(father).put(child,new TrioGenotypes(mother,father,child));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
//Returns the number of Mendelian Violations for a given genotype combination.
|
||||
//If one of the parents' genotypes is missing, it will consider it as a parent/child pair
|
||||
//If the child genotype or both parents genotypes are missing, 0 is returned.
|
||||
private int getCombinationMVCount(GenotypeType mother, GenotypeType father, GenotypeType child){
|
||||
|
||||
//Child is no call => No MV
|
||||
if(child == GenotypeType.NO_CALL || child == GenotypeType.UNAVAILABLE)
|
||||
return 0;
|
||||
//Add parents with genotypes for the evaluation
|
||||
final ArrayList<GenotypeType> parents = new ArrayList<GenotypeType>();
|
||||
if (!(mother == GenotypeType.NO_CALL || mother == GenotypeType.UNAVAILABLE))
|
||||
parents.add(mother);
|
||||
if (!(father == GenotypeType.NO_CALL || father == GenotypeType.UNAVAILABLE))
|
||||
parents.add(father);
|
||||
|
||||
//Both parents no calls => No MV
|
||||
if (parents.isEmpty())
|
||||
return 0;
|
||||
|
||||
//If at least one parent had a genotype, then count the number of ref and alt alleles that can be passed
|
||||
int parentsNumRefAlleles = 0;
|
||||
int parentsNumAltAlleles = 0;
|
||||
|
||||
for(final GenotypeType parent : parents){
|
||||
if(parent == GenotypeType.HOM_REF){
|
||||
parentsNumRefAlleles++;
|
||||
}
|
||||
else if(parent == GenotypeType.HET){
|
||||
parentsNumRefAlleles++;
|
||||
parentsNumAltAlleles++;
|
||||
}
|
||||
else if(parent == GenotypeType.HOM_VAR){
|
||||
parentsNumAltAlleles++;
|
||||
}
|
||||
}
|
||||
|
||||
//Case Child is HomRef
|
||||
if(child == GenotypeType.HOM_REF){
|
||||
if(parentsNumRefAlleles == parents.size())
|
||||
return 0;
|
||||
else return (parents.size()-parentsNumRefAlleles);
|
||||
}
|
||||
|
||||
//Case child is HomVar
|
||||
if(child == GenotypeType.HOM_VAR){
|
||||
if(parentsNumAltAlleles == parents.size())
|
||||
return 0;
|
||||
else return parents.size()-parentsNumAltAlleles;
|
||||
}
|
||||
|
||||
//Case child is Het
|
||||
if(child == GenotypeType.HET && ((parentsNumRefAlleles > 0 && parentsNumAltAlleles > 0) || parents.size()<2))
|
||||
return 0;
|
||||
|
||||
//MV
|
||||
return 1;
|
||||
}
|
||||
|
||||
/**
|
||||
* Updates the genotypes of the given trio. If one of the parents is null, it is considered a parent/child pair.
|
||||
* @param ref: Reference allele
|
||||
* @param alt: Alternative allele
|
||||
* @param mother: Mother's genotype from vc input
|
||||
* @param father: Father's genotype from vc input
|
||||
* @param child: Child's genotype from vc input
|
||||
* @param finalGenotypes: An ArrayList<Genotype> containing the updated genotypes
|
||||
* @return
|
||||
*/
|
||||
private int updateFamilyGenotypes(Allele ref, Allele alt, Genotype mother, Genotype father, Genotype child, ArrayList<Genotype> finalGenotypes) {
|
||||
|
||||
//Check whether it is a pair or trio
|
||||
//Always assign the first parent as the parent having genotype information in pairs
|
||||
//Always assign the mother as the first parent in trios
|
||||
int parentsCalled = 0;
|
||||
Map<GenotypeType,Double> firstParentLikelihoods;
|
||||
Map<GenotypeType,Double> secondParentLikelihoods;
|
||||
final ArrayList<GenotypeType> bestFirstParentGenotype = new ArrayList<GenotypeType>();
|
||||
final ArrayList<GenotypeType> bestSecondParentGenotype = new ArrayList<GenotypeType>();
|
||||
final ArrayList<GenotypeType> bestChildGenotype = new ArrayList<GenotypeType>();
|
||||
GenotypeType pairSecondParentGenotype = null;
|
||||
boolean parentsAreFlipped = false; //usually mother comes first, like for indexing of transmissionMatrix
|
||||
final int INVALID_INDEX = -1;
|
||||
|
||||
//if only one parent is called, make uncalled parent the secondParent
|
||||
if(mother == null || !mother.isCalled()){
|
||||
firstParentLikelihoods = getLikelihoodsAsMapSafeNull(father);
|
||||
secondParentLikelihoods = getLikelihoodsAsMapSafeNull(mother);
|
||||
bestFirstParentGenotype.add(getTypeSafeNull(father));
|
||||
bestSecondParentGenotype.add(getTypeSafeNull(mother));
|
||||
pairSecondParentGenotype = mother == null ? GenotypeType.UNAVAILABLE : mother.getType();
|
||||
parentsAreFlipped = true;
|
||||
if(father != null && father.isCalled())
|
||||
parentsCalled = 1;
|
||||
}
|
||||
else{
|
||||
firstParentLikelihoods = getLikelihoodsAsMapSafeNull(mother);
|
||||
secondParentLikelihoods = getLikelihoodsAsMapSafeNull(father);
|
||||
bestFirstParentGenotype.add(getTypeSafeNull(mother));
|
||||
bestSecondParentGenotype.add(getTypeSafeNull(father));
|
||||
if(father == null || !father.isCalled()){
|
||||
parentsCalled = 1;
|
||||
pairSecondParentGenotype = father == null ? GenotypeType.UNAVAILABLE : father.getType();
|
||||
}else{
|
||||
parentsCalled = 2;
|
||||
}
|
||||
}
|
||||
Map<GenotypeType,Double> childLikelihoods = getLikelihoodsAsMapSafeNull(child);
|
||||
bestChildGenotype.add(getTypeSafeNull(child));
|
||||
|
||||
//Prior vars
|
||||
double bestConfigurationLikelihood = 0.0;
|
||||
double norm = 0.0;
|
||||
int configuration_index =0;
|
||||
final ArrayList<Integer> bestMVCount = new ArrayList<Integer>();
|
||||
bestMVCount.add(0);
|
||||
|
||||
//Get the most likely combination
|
||||
//Only check for most likely combination if at least a parent and the child have genotypes
|
||||
int matInd;
|
||||
if(child.isCalled() && parentsCalled > 0){
|
||||
int mvCount;
|
||||
int cumulativeMVCount = 0;
|
||||
double configurationLikelihood = 0;
|
||||
for(final Map.Entry<GenotypeType,Double> childGenotype :
|
||||
childLikelihoods.entrySet()){
|
||||
for(final Map.Entry<GenotypeType,Double> firstParentGenotype :
|
||||
firstParentLikelihoods.entrySet()){
|
||||
for(final Map.Entry<GenotypeType,Double> secondParentGenotype :
|
||||
secondParentLikelihoods.entrySet()){
|
||||
mvCount = mvCountMatrix.get(firstParentGenotype.getKey()).get(secondParentGenotype.getKey()).get(childGenotype.getKey());
|
||||
//For parent/child pairs, sum over the possible genotype configurations of the missing parent
|
||||
if(parentsCalled<2){
|
||||
cumulativeMVCount += mvCount;
|
||||
configurationLikelihood += mvCount>0 ? Math.pow(deNovoPrior,mvCount)*firstParentGenotype.getValue()*secondParentGenotype.getValue()*childGenotype.getValue() : (1.0-10*deNovoPrior-deNovoPrior*deNovoPrior)*firstParentGenotype.getValue()*secondParentGenotype.getValue()*childGenotype.getValue();
|
||||
}
|
||||
//Evaluate configurations of trios
|
||||
else{
|
||||
configurationLikelihood = mvCount>0 ? Math.pow(deNovoPrior,mvCount)*firstParentGenotype.getValue()*secondParentGenotype.getValue()*childGenotype.getValue() : (1.0-10*deNovoPrior-deNovoPrior*deNovoPrior)*firstParentGenotype.getValue()*secondParentGenotype.getValue()*childGenotype.getValue();
|
||||
norm += configurationLikelihood;
|
||||
matInd = getLikelihoodIndex(firstParentGenotype.getKey(), secondParentGenotype.getKey(),childGenotype.getKey(),parentsAreFlipped);
|
||||
if (matInd > INVALID_INDEX) //still a slim chance of a MIXED GT
|
||||
configurationLikelihoodsMatrix[matInd] = configurationLikelihood;
|
||||
//Keep this combination if
|
||||
//It has a better likelihood
|
||||
//Or it has the same likelihood but requires less changes from original genotypes
|
||||
if (configurationLikelihood > bestConfigurationLikelihood){
|
||||
bestConfigurationLikelihood = configurationLikelihood;
|
||||
bestMVCount.clear();
|
||||
bestMVCount.add(mvCount);
|
||||
bestFirstParentGenotype.clear();
|
||||
bestFirstParentGenotype.add(firstParentGenotype.getKey());
|
||||
bestSecondParentGenotype.clear();
|
||||
bestSecondParentGenotype.add(secondParentGenotype.getKey());
|
||||
bestChildGenotype.clear();
|
||||
bestChildGenotype.add(childGenotype.getKey());
|
||||
}
|
||||
else if(configurationLikelihood == bestConfigurationLikelihood) {
|
||||
bestFirstParentGenotype.add(firstParentGenotype.getKey());
|
||||
bestSecondParentGenotype.add(secondParentGenotype.getKey());
|
||||
bestChildGenotype.add(childGenotype.getKey());
|
||||
bestMVCount.add(mvCount);
|
||||
}
|
||||
}
|
||||
}
|
||||
//Evaluate configurations of parent/child pairs
|
||||
if(parentsCalled<2){
|
||||
norm += configurationLikelihood;
|
||||
matInd = getLikelihoodIndex(firstParentGenotype.getKey(), GenotypeType.HOM_REF,childGenotype.getKey(), parentsAreFlipped);
|
||||
if (matInd > INVALID_INDEX)
|
||||
configurationLikelihoodsMatrix[matInd] = configurationLikelihood;
|
||||
matInd = getLikelihoodIndex(firstParentGenotype.getKey(), GenotypeType.HET,childGenotype.getKey(),parentsAreFlipped);
|
||||
if (matInd > INVALID_INDEX)
|
||||
configurationLikelihoodsMatrix[matInd] = configurationLikelihood;
|
||||
matInd = getLikelihoodIndex(firstParentGenotype.getKey(), GenotypeType.HOM_VAR,childGenotype.getKey(),parentsAreFlipped);
|
||||
if (matInd > INVALID_INDEX)
|
||||
configurationLikelihoodsMatrix[matInd] = configurationLikelihood;
|
||||
|
||||
//Keep this combination if
|
||||
//It has a better likelihood
|
||||
//Or it has the same likelihood but requires less changes from original genotypes
|
||||
if (configurationLikelihood > bestConfigurationLikelihood){
|
||||
bestConfigurationLikelihood = configurationLikelihood;
|
||||
bestMVCount.clear();
|
||||
bestMVCount.add(cumulativeMVCount/3);
|
||||
bestChildGenotype.clear();
|
||||
bestFirstParentGenotype.clear();
|
||||
bestSecondParentGenotype.clear();
|
||||
bestChildGenotype.add(childGenotype.getKey());
|
||||
bestFirstParentGenotype.add(firstParentGenotype.getKey());
|
||||
bestSecondParentGenotype.add(pairSecondParentGenotype);
|
||||
}
|
||||
else if(configurationLikelihood == bestConfigurationLikelihood) {
|
||||
bestFirstParentGenotype.add(firstParentGenotype.getKey());
|
||||
bestSecondParentGenotype.add(pairSecondParentGenotype);
|
||||
bestChildGenotype.add(childGenotype.getKey());
|
||||
bestMVCount.add(cumulativeMVCount/3);
|
||||
}
|
||||
configurationLikelihood = 0;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
//normalize the best configuration probability
|
||||
bestConfigurationLikelihood = bestConfigurationLikelihood / norm;
|
||||
|
||||
//In case of multiple equally likely combinations, take a random one
|
||||
if(bestFirstParentGenotype.size()>1){
|
||||
configuration_index = rand.nextInt(bestFirstParentGenotype.size()-1);
|
||||
}
|
||||
|
||||
}
|
||||
else{
|
||||
bestConfigurationLikelihood = NO_TRANSMISSION_PROB;
|
||||
}
|
||||
|
||||
TrioGenotypes updatedTrioGenotypes;
|
||||
if(parentsCalled < 2 && mother == null || !mother.isCalled())
|
||||
updatedTrioGenotypes = transmissionMatrix.get(bestSecondParentGenotype.get(configuration_index)).get(bestFirstParentGenotype.get(configuration_index)).get(bestChildGenotype.get(configuration_index));
|
||||
else
|
||||
updatedTrioGenotypes = transmissionMatrix.get(bestFirstParentGenotype.get(configuration_index)).get(bestSecondParentGenotype.get(configuration_index)).get(bestChildGenotype.get(configuration_index));
|
||||
|
||||
//Return the updated genotypes
|
||||
updatedTrioGenotypes.setMVcountMatrix(mvCountMatrix);
|
||||
updatedTrioGenotypes.getUpdatedGenotypes(ref, alt, mother, father, child, bestConfigurationLikelihood, configurationLikelihoodsMatrix, finalGenotypes);
|
||||
return bestMVCount.get(configuration_index);
|
||||
|
||||
}
|
||||
|
||||
//Get a Map of genotype likelihoods, normalized from log10-space.
|
||||
//In case of null, unavailable or no call, all likelihoods are 1/3.
|
||||
private EnumMap<GenotypeType,Double> getLikelihoodsAsMapSafeNull(Genotype genotype){
|
||||
if (genotype != null && genotype.isCalled() && genotype.hasExtendedAttribute(PHRED_SCALED_POSTERIORS_KEY)) {
|
||||
final EnumMap<GenotypeType,Double> likelihoodsMap = new EnumMap<GenotypeType, Double>(GenotypeType.class);
|
||||
Object GPfromVCF = genotype.getExtendedAttribute(PHRED_SCALED_POSTERIORS_KEY);
|
||||
//parse the GPs into a vector of probabilities
|
||||
final String[] likelihoodsAsStringVector = ((String)GPfromVCF).split(",");
|
||||
final double[] likelihoodsAsVector = new double[likelihoodsAsStringVector.length];
|
||||
for ( int i = 0; i < likelihoodsAsStringVector.length; i++ ) {
|
||||
likelihoodsAsVector[i] = Double.parseDouble(likelihoodsAsStringVector[i]) / -10.0;
|
||||
}
|
||||
double[] likelihoods = GeneralUtils.normalizeFromLog10(likelihoodsAsVector);
|
||||
likelihoodsMap.put(GenotypeType.HOM_REF,likelihoods[GenotypeType.HOM_REF.ordinal()-1]);
|
||||
likelihoodsMap.put(GenotypeType.HET,likelihoods[GenotypeType.HET.ordinal()-1]);
|
||||
likelihoodsMap.put(GenotypeType.HOM_VAR, likelihoods[GenotypeType.HOM_VAR.ordinal() - 1]);
|
||||
return likelihoodsMap;
|
||||
}
|
||||
|
||||
if(genotype == null || !genotype.isCalled() || genotype.getLikelihoods() == null){
|
||||
final EnumMap<GenotypeType,Double> likelihoods = new EnumMap<GenotypeType, Double>(GenotypeType.class);
|
||||
likelihoods.put(GenotypeType.HOM_REF,1.0/3.0);
|
||||
likelihoods.put(GenotypeType.HET,1.0/3.0);
|
||||
likelihoods.put(GenotypeType.HOM_VAR,1.0/3.0);
|
||||
return likelihoods;
|
||||
}
|
||||
return genotype.getLikelihoods().getAsMap(true);
|
||||
}
|
||||
|
||||
//Returns the GenotypeType; returns UNAVAILABLE if given null
|
||||
private GenotypeType getTypeSafeNull(Genotype genotype){
|
||||
if(genotype == null)
|
||||
return GenotypeType.UNAVAILABLE;
|
||||
return genotype.getType();
|
||||
}
|
||||
|
||||
private int getLikelihoodIndex(GenotypeType firstParent, GenotypeType secondParent, GenotypeType child, boolean parentsAreFlipped){
|
||||
int childInd = genotypeTypeValue(child);
|
||||
int motherInd;
|
||||
int fatherInd;
|
||||
final int NUM_CALLED_GENOTYPETYPES = 3;
|
||||
final int INVALID = -1;
|
||||
if (parentsAreFlipped)
|
||||
{
|
||||
motherInd = genotypeTypeValue(secondParent);
|
||||
fatherInd = genotypeTypeValue(firstParent);
|
||||
}
|
||||
else {
|
||||
motherInd = genotypeTypeValue(firstParent);
|
||||
fatherInd = genotypeTypeValue(secondParent);
|
||||
}
|
||||
|
||||
|
||||
if (childInd == INVALID || motherInd == INVALID || fatherInd == INVALID) //any of the genotypes are NO_CALL, UNAVAILABLE or MIXED
|
||||
return INVALID;
|
||||
|
||||
//index into array playing the part of a 3x3x3 matrix (where 3=NUM_CALLED_GENOTYPETYPES)
|
||||
return motherInd*NUM_CALLED_GENOTYPETYPES*NUM_CALLED_GENOTYPETYPES + fatherInd*NUM_CALLED_GENOTYPETYPES + childInd;
|
||||
}
|
||||
|
||||
private int genotypeTypeValue(GenotypeType input){
|
||||
if (input == GenotypeType.HOM_REF) return 0;
|
||||
if (input == GenotypeType.HET) return 1;
|
||||
if (input == GenotypeType.HOM_VAR) return 2;
|
||||
return -1;
|
||||
}
|
||||
|
||||
}
|
||||
|
|
@ -57,6 +57,8 @@ import java.util.*;
|
|||
|
||||
public class PosteriorLikelihoodsUtils {
|
||||
|
||||
private static final String PHRED_SCALED_POSTERIORS_KEY = "PP";
|
||||
|
||||
public static VariantContext calculatePosteriorGLs(final VariantContext vc1,
|
||||
final Collection<VariantContext> resources,
|
||||
final int numRefSamplesFromMissingResources,
|
||||
|
|
@ -100,7 +102,36 @@ public class PosteriorLikelihoodsUtils {
|
|||
//parse the likelihoods for each sample's genotype
|
||||
final List<double[]> likelihoods = new ArrayList<>(vc1.getNSamples());
|
||||
for ( final Genotype genotype : vc1.getGenotypes() ) {
|
||||
likelihoods.add(genotype.hasLikelihoods() ? genotype.getLikelihoods().getAsVector() : null );
|
||||
if (!genotype.hasExtendedAttribute(PHRED_SCALED_POSTERIORS_KEY)){
|
||||
likelihoods.add(genotype.hasLikelihoods() ? genotype.getLikelihoods().getAsVector() : null );
|
||||
|
||||
}
|
||||
else {
|
||||
Object PPfromVCF = genotype.getExtendedAttribute(PHRED_SCALED_POSTERIORS_KEY);
|
||||
//parse the PPs into a vector of probabilities
|
||||
if (PPfromVCF instanceof String) {
|
||||
final String PPstring = (String)PPfromVCF;
|
||||
if (PPstring.charAt(0)=='.') //samples not in trios will have PP tag like ".,.,." after family priors are applied
|
||||
likelihoods.add(genotype.hasLikelihoods() ? genotype.getLikelihoods().getAsVector() : null );
|
||||
else {
|
||||
final String[] likelihoodsAsStringVector = PPstring.split(",");
|
||||
double[] likelihoodsAsVector = new double[likelihoodsAsStringVector.length];
|
||||
for ( int i = 0; i < likelihoodsAsStringVector.length; i++ ) {
|
||||
likelihoodsAsVector[i] = Double.parseDouble(likelihoodsAsStringVector[i])/-10.0;
|
||||
}
|
||||
likelihoods.add(likelihoodsAsVector);
|
||||
}
|
||||
}
|
||||
else {
|
||||
int[] likelihoodsAsInts = extractInts(PPfromVCF);
|
||||
double[] likelihoodsAsVector = new double[likelihoodsAsInts.length];
|
||||
for ( int i = 0; i < likelihoodsAsInts.length; i++ ) {
|
||||
likelihoodsAsVector[i] = likelihoodsAsInts[i]/-10.0;
|
||||
}
|
||||
likelihoods.add(likelihoodsAsVector);
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
//TODO: for now just use priors that are SNPs because indel priors will bias SNP calls
|
||||
|
|
@ -111,10 +142,11 @@ public class PosteriorLikelihoodsUtils {
|
|||
final GenotypesContext newContext = GenotypesContext.create();
|
||||
for ( int genoIdx = 0; genoIdx < vc1.getNSamples(); genoIdx ++ ) {
|
||||
final GenotypeBuilder builder = new GenotypeBuilder(vc1.getGenotype(genoIdx));
|
||||
builder.phased(vc1.getGenotype(genoIdx).isPhased());
|
||||
if ( posteriors.get(genoIdx) != null ) {
|
||||
GATKVariantContextUtils.updateGenotypeAfterSubsetting(vc1.getAlleles(), builder,
|
||||
GATKVariantContextUtils.GenotypeAssignmentMethod.USE_PLS_TO_ASSIGN, posteriors.get(genoIdx), vc1.getAlleles());
|
||||
builder.attribute(VCFConstants.GENOTYPE_POSTERIORS_KEY,
|
||||
builder.attribute(PHRED_SCALED_POSTERIORS_KEY,
|
||||
Utils.listFromPrimitives(GenotypeLikelihoods.fromLog10Likelihoods(posteriors.get(genoIdx)).getAsPLs()));
|
||||
}
|
||||
newContext.add(builder.make());
|
||||
|
|
|
|||
|
|
@ -53,6 +53,10 @@ import java.util.Arrays;
|
|||
|
||||
public class CalculateGenotypePosteriorsIntegrationTest extends WalkerTest {
|
||||
|
||||
private static String CEUtrioFamilyFile = privateTestDir + "CEUtrio.ped";
|
||||
private static String CEUtrioTest = privateTestDir + "CEUtrioTest.vcf";
|
||||
private static String CEUtrioPopPriorsTest = privateTestDir + "CEUtrioPopPriorsTest.vcf";
|
||||
|
||||
@Test(enabled = true)
|
||||
public void testUsingDiscoveredAF() {
|
||||
WalkerTestSpec spec = new WalkerTestSpec(
|
||||
|
|
@ -62,7 +66,7 @@ public class CalculateGenotypePosteriorsIntegrationTest extends WalkerTest {
|
|||
" -L 20:10,000,000-10,100,000" +
|
||||
" -V " + validationDataLocation + "1000G.phase3.broad.withGenotypes.chr20.1MB.vcf",
|
||||
1,
|
||||
Arrays.asList("80d0eedddd215df8ab29bde908c73ca4"));
|
||||
Arrays.asList("75c4b1baba3071047e4629c8e81ddea1"));
|
||||
executeTest("testUsingDiscoveredAF", spec);
|
||||
}
|
||||
|
||||
|
|
@ -75,7 +79,7 @@ public class CalculateGenotypePosteriorsIntegrationTest extends WalkerTest {
|
|||
" -L 20:10,000,000-10,100,000" +
|
||||
" -V " + validationDataLocation + "1000G.phase3.broad.withGenotypes.chr20.1MB.vcf",
|
||||
1,
|
||||
Arrays.asList("f7653ef21b859b90d54a71ef4245ec85"));
|
||||
Arrays.asList("5a2d7481fc69b97033f8c3ae9c0bdd20"));
|
||||
executeTest("testMissingPriors", spec);
|
||||
}
|
||||
|
||||
|
|
@ -87,10 +91,24 @@ public class CalculateGenotypePosteriorsIntegrationTest extends WalkerTest {
|
|||
" -R " + b37KGReference +
|
||||
" -L 20:10,000,000-10,100,000" +
|
||||
" -V " + validationDataLocation + "NA12878.Jan2013.haplotypeCaller.subset.indels.vcf" +
|
||||
" -VV " + validationDataLocation + "1000G.phase3.broad.withGenotypes.chr20.1MB.vcf",
|
||||
" -supporting " + validationDataLocation + "1000G.phase3.broad.withGenotypes.chr20.1MB.vcf",
|
||||
1,
|
||||
Arrays.asList("6dd7dcf94bfe99ddcbd477100592db89"));
|
||||
executeTest("testMissingPriors", spec);
|
||||
Arrays.asList("7876c43e9fc13723bd890b8adc5d053d"));
|
||||
executeTest("testInputINDELs", spec);
|
||||
}
|
||||
|
||||
@Test(enabled = true)
|
||||
public void testFamilyPriors() {
|
||||
WalkerTestSpec spec = new WalkerTestSpec(
|
||||
"-T CalculateGenotypePosteriors --no_cmdline_in_header" +
|
||||
" -o %s" +
|
||||
" -R " + b37KGReference +
|
||||
" -ped " + CEUtrioFamilyFile +
|
||||
" -V " + CEUtrioTest +
|
||||
" -supporting " + CEUtrioPopPriorsTest,
|
||||
1,
|
||||
Arrays.asList("a22c81f0609c9f43578054661797395b"));
|
||||
executeTest("testFamilyPriors", spec);
|
||||
}
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -69,6 +69,7 @@ import java.util.List;
|
|||
public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
||||
|
||||
Allele Aref, T, C, G, Cref, ATC, ATCATC;
|
||||
private final String PHRED_SCALED_POSTERIORS_KEY = "PP";
|
||||
|
||||
@BeforeSuite
|
||||
public void setup() {
|
||||
|
|
@ -142,10 +143,10 @@ public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
|||
Assert.assertTrue(test1exp1.hasPL());
|
||||
Genotype test1exp2 = makeGwithPLs("s2",T,T,new double[]{-6.000066, -3.823938, -6.557894e-05});
|
||||
Genotype test1exp3 = makeGwithPLs("s3",Aref,Aref,new double[]{-0.0006510083, -2.824524, -9.000651});
|
||||
Assert.assertEquals("java.util.ArrayList",test1result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY).getClass().getCanonicalName());
|
||||
Assert.assertEquals(arraysEq(test1exp1.getPL(), _mleparse((List<Integer>)test1result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test1exp2.getPL(),_mleparse((List<Integer>)test1result.getGenotype(1).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test1exp3.getPL(),_mleparse((List<Integer>)test1result.getGenotype(2).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals("java.util.ArrayList",test1result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY).getClass().getCanonicalName());
|
||||
Assert.assertEquals(arraysEq(test1exp1.getPL(), _mleparse((List<Integer>)test1result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test1exp2.getPL(),_mleparse((List<Integer>)test1result.getGenotype(1).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test1exp3.getPL(),_mleparse((List<Integer>)test1result.getGenotype(2).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
|
||||
// AA AB BB AC BC CC
|
||||
// AA AC CC AT CT TT
|
||||
|
|
@ -160,10 +161,10 @@ public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
|||
Genotype test2exp2 = makeGwithPLs("s2",Aref,C,new double[]{-3.609957, -0.007723248, -1.785778, -3.007723, -4.660767, -8.785778});
|
||||
Genotype test2exp3 = makeGwithPLs("s3",Aref,Aref,new double[] {-0.06094877, -0.9587151, -2.03677,-1.958715, -3.111759, -5.23677});
|
||||
Genotype test2exp4 = makeGwithPLs("s4",C,T,new double[]{-7.016534, -3.4143, -1.392355, -1.4143, -0.06734388, -1.192355});
|
||||
Assert.assertEquals(arraysEq(test2exp1.getPL(),(int[]) _mleparse((List<Integer>)test2result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test2exp2.getPL(),(int[]) _mleparse((List<Integer>)test2result.getGenotype(1).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test2exp3.getPL(),(int[]) _mleparse((List<Integer>)test2result.getGenotype(2).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test2exp4.getPL(),(int[]) _mleparse((List<Integer>)test2result.getGenotype(3).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test2exp1.getPL(),(int[]) _mleparse((List<Integer>)test2result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test2exp2.getPL(),(int[]) _mleparse((List<Integer>)test2result.getGenotype(1).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test2exp3.getPL(),(int[]) _mleparse((List<Integer>)test2result.getGenotype(2).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test2exp4.getPL(),(int[]) _mleparse((List<Integer>)test2result.getGenotype(3).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
}
|
||||
|
||||
@Test
|
||||
|
|
@ -182,15 +183,15 @@ public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
|||
Genotype test1exp1 = makeGwithPLs("t1",T,T,new double[]{-3.370985, -1.415172, -0.01721766});
|
||||
Genotype test1exp2 = makeGwithPLs("t2",Aref,T,new double[]{-1.763792, -0.007978791, -3.010024});
|
||||
Genotype test1exp3 = makeGwithPLs("t3",Aref,T,new double[]{-2.165587, -0.009773643, -1.811819});
|
||||
Assert.assertEquals(arraysEq(test1exp1.getPL(),_mleparse((List<Integer>) test1result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test1exp2.getPL(),_mleparse((List<Integer>) test1result.getGenotype(1).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test1exp3.getPL(),_mleparse((List<Integer>) test1result.getGenotype(2).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test1exp1.getPL(),_mleparse((List<Integer>) test1result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test1exp2.getPL(),_mleparse((List<Integer>) test1result.getGenotype(1).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test1exp3.getPL(),_mleparse((List<Integer>) test1result.getGenotype(2).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
|
||||
VariantContext testNonOverlapping = makeVC("1", Arrays.asList(Aref,T), makeG("s1",T,T,3,1,0));
|
||||
List<VariantContext> other = Arrays.asList(makeVC("2",Arrays.asList(Aref,C),makeG("s2",C,C,10,2,0)));
|
||||
VariantContext test2result = PosteriorLikelihoodsUtils.calculatePosteriorGLs(testNonOverlapping,other,0,0.001,true,false,true);
|
||||
Genotype test2exp1 = makeGwithPLs("SGV",T,T,new double[]{-4.078345, -3.276502, -0.0002661066});
|
||||
Assert.assertEquals(arraysEq(test2exp1.getPL(),_mleparse((List<Integer>) test2result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY))), "");
|
||||
Assert.assertEquals(arraysEq(test2exp1.getPL(),_mleparse((List<Integer>) test2result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY))), "");
|
||||
}
|
||||
|
||||
@Test
|
||||
|
|
@ -200,7 +201,7 @@ public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
|||
supplTest1.add(new VariantContextBuilder(makeVC("2",Arrays.asList(Aref,T))).attribute(VCFConstants.MLE_ALLELE_COUNT_KEY,500).attribute(VCFConstants.ALLELE_NUMBER_KEY,1000).make());
|
||||
VariantContext test1result = PosteriorLikelihoodsUtils.calculatePosteriorGLs(testOverlappingBase,supplTest1,0,0.001,true,false,true);
|
||||
|
||||
int[] GP = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY));
|
||||
int[] GP = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY));
|
||||
Assert.assertTrue(GP[2] > GP[1]);
|
||||
}
|
||||
|
||||
|
|
@ -211,7 +212,7 @@ public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
|||
supplTest1.add(new VariantContextBuilder(makeVC("2",Arrays.asList(Aref,T))).attribute(VCFConstants.MLE_ALLELE_COUNT_KEY,900).attribute(VCFConstants.ALLELE_NUMBER_KEY,1000).make());
|
||||
VariantContext test1result = PosteriorLikelihoodsUtils.calculatePosteriorGLs(testOverlappingBase,supplTest1,0,0.001,true,false,true);
|
||||
|
||||
int[] GP = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY));
|
||||
int[] GP = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY));
|
||||
Assert.assertTrue(GP[2] < GP[1]);
|
||||
}
|
||||
|
||||
|
|
@ -222,7 +223,7 @@ public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
|||
supplTest1.add(new VariantContextBuilder(makeVC("2",Arrays.asList(Aref,T))).attribute(VCFConstants.MLE_ALLELE_COUNT_KEY,500).attribute(VCFConstants.ALLELE_NUMBER_KEY,1000).make());
|
||||
VariantContext test1result = PosteriorLikelihoodsUtils.calculatePosteriorGLs(testOverlappingBase,supplTest1,0,0.001,true,false,true);
|
||||
|
||||
int[] GP = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY));
|
||||
int[] GP = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY));
|
||||
Assert.assertTrue(GP[0] > GP[1]);
|
||||
}
|
||||
|
||||
|
|
@ -233,7 +234,7 @@ public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
|||
supplTest1.add(new VariantContextBuilder(makeVC("2",Arrays.asList(Aref,T))).attribute(VCFConstants.MLE_ALLELE_COUNT_KEY,100).attribute(VCFConstants.ALLELE_NUMBER_KEY,1000).make());
|
||||
VariantContext test1result = PosteriorLikelihoodsUtils.calculatePosteriorGLs(testOverlappingBase,supplTest1,0,0.001,true,false,true);
|
||||
|
||||
int[] GP = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY));
|
||||
int[] GP = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY));
|
||||
Assert.assertTrue(GP[0] < GP[1]);
|
||||
}
|
||||
|
||||
|
|
@ -298,7 +299,7 @@ public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
|||
VariantContext test1result = PosteriorLikelihoodsUtils.calculatePosteriorGLs(inputIndel,supplTest1,0,0.001,true,false,true);
|
||||
|
||||
System.out.println(test1result);
|
||||
int[] GPs = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY));
|
||||
int[] GPs = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY));
|
||||
int[] PLs = test1result.getGenotype(0).getPL();
|
||||
Assert.assertEquals(PLs,GPs);
|
||||
}
|
||||
|
|
@ -314,7 +315,7 @@ public class PosteriorLikelihoodsUtilsUnitTest extends BaseTest {
|
|||
|
||||
|
||||
System.out.println(test1result);
|
||||
int[] GPs = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(VCFConstants.GENOTYPE_POSTERIORS_KEY));
|
||||
int[] GPs = _mleparse( (List<Integer>)test1result.getGenotype(0).getAnyAttribute(PHRED_SCALED_POSTERIORS_KEY));
|
||||
int[] PLs = test1result.getGenotype(0).getPL();
|
||||
Assert.assertEquals(PLs,GPs);
|
||||
}
|
||||
|
|
|
|||
Loading…
Reference in New Issue