();
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/diagnostics/missing/Metrics.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/diagnostics/missing/Metrics.java
new file mode 100644
index 000000000..5e3da5f4f
--- /dev/null
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/diagnostics/missing/Metrics.java
@@ -0,0 +1,110 @@
+/*
+* By downloading the PROGRAM you agree to the following terms of use:
+*
+* BROAD INSTITUTE - SOFTWARE LICENSE AGREEMENT - FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
+*
+* This Agreement is made between the Broad Institute, Inc. with a principal address at 7 Cambridge Center, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
+*
+* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
+* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
+* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
+*
+* 1. DEFINITIONS
+* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK2 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute/GATK on the EFFECTIVE DATE.
+*
+* 2. LICENSE
+* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM.
+* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
+* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
+* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
+*
+* 3. OWNERSHIP OF INTELLECTUAL PROPERTY
+* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
+* Copyright 2012 Broad Institute, Inc.
+* Notice of attribution: The GATK2 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
+* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
+*
+* 4. INDEMNIFICATION
+* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
+*
+* 5. NO REPRESENTATIONS OR WARRANTIES
+* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
+* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
+*
+* 6. ASSIGNMENT
+* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
+*
+* 7. MISCELLANEOUS
+* 7.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
+* 7.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
+* 7.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
+* 7.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
+* 7.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
+* 7.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
+* 7.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
+*/
+
+package org.broadinstitute.sting.gatk.walkers.diagnostics.missing;
+
+/**
+ * Short one line description of the walker.
+ *
+ *
+ * [Long description of the walker]
+ *
+ *
+ *
+ * Input
+ *
+ * [Description of the Input]
+ *
+ *
+ * Output
+ *
+ * [Description of the Output]
+ *
+ *
+ * Examples
+ *
+ * java
+ * -jar GenomeAnalysisTK.jar
+ * -T [walker name]
+ *
+ *
+ * @author Mauricio Carneiro
+ * @since 5/1/13
+ */
+final class Metrics {
+ private double gccontent;
+ private double baseQual;
+ private double mapQual;
+ private int reads;
+ private int refs;
+
+ void reads(int reads) {this.reads = reads;}
+ void refs(int refs) {this.refs = refs;}
+
+ void gccontent(double gccontent) {this.gccontent = gccontent;}
+ void baseQual(double baseQual) {this.baseQual = baseQual;}
+ void mapQual(double mapQual) {this.mapQual = mapQual;}
+
+ double gccontent() {return refs > 0 ? gccontent/refs : 0.0;}
+ double baseQual() {return reads > 0 ? baseQual/reads : 0.0;}
+ double mapQual() {return reads > 0 ? mapQual/reads : 0.0;}
+
+ /**
+ * Combines two metrics
+ *
+ * @param value the other metric to combine
+ * @return itself, for simple reduce
+ */
+ public Metrics combine(Metrics value) {
+ this.gccontent += value.gccontent;
+ this.baseQual += value.baseQual;
+ this.mapQual += value.mapQual;
+ this.reads += value.reads;
+ this.refs += value.refs;
+
+ return this;
+ }
+}
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/diagnostics/missing/QualifyMissingIntervals.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/diagnostics/missing/QualifyMissingIntervals.java
new file mode 100644
index 000000000..d0db3ef98
--- /dev/null
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/diagnostics/missing/QualifyMissingIntervals.java
@@ -0,0 +1,228 @@
+/*
+* By downloading the PROGRAM you agree to the following terms of use:
+*
+* BROAD INSTITUTE - SOFTWARE LICENSE AGREEMENT - FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
+*
+* This Agreement is made between the Broad Institute, Inc. with a principal address at 7 Cambridge Center, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE).
+*
+* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
+* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
+* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
+*
+* 1. DEFINITIONS
+* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK2 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute/GATK on the EFFECTIVE DATE.
+*
+* 2. LICENSE
+* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM.
+* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
+* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
+* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
+*
+* 3. OWNERSHIP OF INTELLECTUAL PROPERTY
+* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
+* Copyright 2012 Broad Institute, Inc.
+* Notice of attribution: The GATK2 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
+* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
+*
+* 4. INDEMNIFICATION
+* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
+*
+* 5. NO REPRESENTATIONS OR WARRANTIES
+* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
+* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
+*
+* 6. ASSIGNMENT
+* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
+*
+* 7. MISCELLANEOUS
+* 7.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
+* 7.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
+* 7.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
+* 7.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
+* 7.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
+* 7.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
+* 7.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
+*/
+
+package org.broadinstitute.sting.gatk.walkers.diagnostics.missing;
+
+import org.broadinstitute.sting.commandline.Argument;
+import org.broadinstitute.sting.commandline.Output;
+import org.broadinstitute.sting.gatk.CommandLineGATK;
+import org.broadinstitute.sting.gatk.contexts.AlignmentContext;
+import org.broadinstitute.sting.gatk.contexts.ReferenceContext;
+import org.broadinstitute.sting.gatk.refdata.RefMetaDataTracker;
+import org.broadinstitute.sting.gatk.report.GATKReport;
+import org.broadinstitute.sting.gatk.walkers.By;
+import org.broadinstitute.sting.gatk.walkers.DataSource;
+import org.broadinstitute.sting.gatk.walkers.LocusWalker;
+import org.broadinstitute.sting.gatk.walkers.NanoSchedulable;
+import org.broadinstitute.sting.utils.GenomeLoc;
+import org.broadinstitute.sting.utils.GenomeLocParser;
+import org.broadinstitute.sting.utils.GenomeLocSortedSet;
+import org.broadinstitute.sting.utils.collections.Pair;
+import org.broadinstitute.sting.utils.exceptions.UserException;
+import org.broadinstitute.sting.utils.help.DocumentedGATKFeature;
+import org.broadinstitute.sting.utils.help.HelpConstants;
+import org.broadinstitute.sting.utils.pileup.ReadBackedPileup;
+import org.broadinstitute.sting.utils.text.XReadLines;
+
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.PrintStream;
+import java.util.List;
+
+/**
+ * Walks along reference and calculates a few metrics for each interval.
+ *
+ * Metrics:
+ *
+ * - Average Base Quality
+ * - Average Mapping Quality
+ * - GC Content
+ * - Position in the target
+ * - Coding Sequence / Intron
+ * - Length of the uncovered area
+ *
+ *
+ * Input
+ *
+ * A reference file (for GC content), the input bam file (for base and mapping quality calculation), the missing intervals (in the -L), the baits/targets used to sequence (in the -targets) and a bed file with the coding sequence intervals of the genome (in the -cds)
+ *
+ *
+ * Output
+ *
+ * GC content calculations per interval.
+ *
+ *
+ * Example
+ *
+ * java -Xmx2g -jar GenomeAnalysisTK.jar \
+ * -T QualifyMissingIntervals \
+ * -R ref.fasta \
+ * -I input.bam \
+ * -o output.grp \
+ * -L input.intervals \
+ * -cds cds.intervals \
+ * -targets targets.intervals
+ *
+ *
+ */
+@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_QC, extraDocs = {CommandLineGATK.class} )
+@By(DataSource.REFERENCE)
+public final class QualifyMissingIntervals extends LocusWalker implements NanoSchedulable {
+ @Output
+ protected PrintStream out;
+
+ @Argument(shortName = "targets", required = true)
+ public File targetsFile;
+
+ @Argument(shortName = "cds", required = false)
+ public File cdsFile = null;
+
+ GATKReport simpleReport;
+ GenomeLocSortedSet target;
+ GenomeLocSortedSet cds;
+
+ public boolean isReduceByInterval() {
+ return true;
+ }
+
+ public void initialize() {
+ simpleReport = GATKReport.newSimpleReport("QualifyMissingIntervals", "IN", "GC", "BQ", "MQ", "TP", "CD", "LN");
+ final GenomeLocParser parser = getToolkit().getGenomeLocParser();
+ target = new GenomeLocSortedSet(parser);
+ cds = new GenomeLocSortedSet(parser);
+ parseFile(targetsFile, target, parser);
+ if (cdsFile != null)
+ parseFile(cdsFile, cds, parser);
+ }
+
+ public Metrics reduceInit() {
+ return new Metrics();
+ }
+
+ public Metrics map(RefMetaDataTracker tracker, ReferenceContext ref, AlignmentContext context) {
+ if (tracker == null)
+ return null;
+
+ final Metrics metrics = new Metrics();
+ final byte baseIndex = ref.getBase();
+ final ReadBackedPileup pileup = context.getBasePileup();
+ final int nBases = pileup.getNumberOfElements();
+
+ double baseQual = 0.0;
+ for (byte qual : pileup.getQuals()) {
+ baseQual += qual;
+ }
+ double mapQual = 0.0;
+ for (byte qual : pileup.getMappingQuals()) {
+ mapQual += qual;
+ }
+
+ metrics.baseQual(baseQual);
+ metrics.mapQual(mapQual);
+ metrics.gccontent(baseIndex == 'C' || baseIndex == 'G' ? 1.0 : 0.0);
+ metrics.reads(nBases);
+ metrics.refs(1);
+
+ return metrics;
+ }
+
+ @Override
+ public Metrics reduce(Metrics value, Metrics sum) {
+ return sum.combine(value);
+ }
+
+ public void onTraversalDone(List> results) {
+ for (Pair r : results) {
+ GenomeLoc interval = r.getFirst();
+ Metrics metrics = r.getSecond();
+ simpleReport.addRow(
+ interval.toString(),
+ metrics.gccontent(),
+ metrics.baseQual(),
+ metrics.mapQual(),
+ getPositionInTarget(interval),
+ cds.overlaps(interval),
+ interval.size()
+ );
+ }
+ simpleReport.print(out);
+ out.close();
+ }
+
+ private static GenomeLoc parseInterval(String s, GenomeLocParser parser) {
+ if (s.isEmpty()) {
+ return null;
+ }
+ String[] first = s.split(":");
+ if (first.length == 2) {
+ String[] second = first[1].split("\\-");
+ return parser.createGenomeLoc(first[0], Integer.decode(second[0]), Integer.decode(second[1]));
+ } else {
+ throw new UserException.BadInput("Interval doesn't parse correctly: " + s);
+ }
+ }
+
+ private void parseFile(File file, GenomeLocSortedSet set, GenomeLocParser parser) {
+ try {
+ for (String s : new XReadLines(file) ) {
+ GenomeLoc interval = parseInterval(s, parser);
+ if (interval != null)
+ set.add(interval, true);
+ }
+ } catch (FileNotFoundException e) {
+ e.printStackTrace();
+ }
+ }
+
+ private int getPositionInTarget(GenomeLoc interval) {
+ final List hits = target.getOverlapping(interval);
+ int result = 0;
+ for (GenomeLoc hit : hits) {
+ result = interval.getStart() - hit.getStart(); // if there are multiple hits, we'll get the last one.
+ }
+ return result;
+ }
+}
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/IndelGenotypeLikelihoodsCalculationModel.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/IndelGenotypeLikelihoodsCalculationModel.java
index c6e9ea379..0f3f7739d 100644
--- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/IndelGenotypeLikelihoodsCalculationModel.java
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/IndelGenotypeLikelihoodsCalculationModel.java
@@ -76,7 +76,8 @@ public class IndelGenotypeLikelihoodsCalculationModel extends GenotypeLikelihood
private List alleleList = new ArrayList();
- protected IndelGenotypeLikelihoodsCalculationModel(UnifiedArgumentCollection UAC, Logger logger) {
+ protected IndelGenotypeLikelihoodsCalculationModel(final UnifiedArgumentCollection UAC,
+ final Logger logger) {
super(UAC, logger);
pairModel = new PairHMMIndelErrorModel(UAC.INDEL_GAP_OPEN_PENALTY, UAC.INDEL_GAP_CONTINUATION_PENALTY,
UAC.OUTPUT_DEBUG_INDEL_INFO, UAC.pairHMM);
@@ -85,10 +86,11 @@ public class IndelGenotypeLikelihoodsCalculationModel extends GenotypeLikelihood
ignoreSNPAllelesWhenGenotypingIndels = UAC.IGNORE_SNP_ALLELES;
}
- protected static List computeConsensusAlleles(ReferenceContext ref,
- Map contexts,
- AlignmentContextUtils.ReadOrientation contextType,
- GenomeLocParser locParser, UnifiedArgumentCollection UAC) {
+ protected static List computeConsensusAlleles(final ReferenceContext ref,
+ final Map contexts,
+ final AlignmentContextUtils.ReadOrientation contextType,
+ final GenomeLocParser locParser,
+ final UnifiedArgumentCollection UAC) {
ConsensusAlleleCounter counter = new ConsensusAlleleCounter(locParser, true, UAC.MIN_INDEL_COUNT_FOR_GENOTYPING, UAC.MIN_INDEL_FRACTION_PER_SAMPLE);
return counter.computeConsensusAlleles(ref, contexts, contextType);
}
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/SNPGenotypeLikelihoodsCalculationModel.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/SNPGenotypeLikelihoodsCalculationModel.java
index ce5f94478..360f88e51 100644
--- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/SNPGenotypeLikelihoodsCalculationModel.java
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/SNPGenotypeLikelihoodsCalculationModel.java
@@ -147,9 +147,17 @@ public class SNPGenotypeLikelihoodsCalculationModel extends GenotypeLikelihoodsC
// if we only want variants, then we don't need to calculate genotype likelihoods
if ( UAC.OutputMode == UnifiedGenotyperEngine.OUTPUT_MODE.EMIT_VARIANTS_ONLY )
return builder.make();
+ // if user requires all PLs at all sites, add all possible alt alleles
+ else if (UAC.annotateAllSitesWithPLs) {
+ for ( final byte base : BaseUtils.BASES ) {
+ if ( base != refBase )
+ alleles.add(Allele.create(base));
+ }
+ }
- // otherwise, choose any alternate allele (it doesn't really matter)
- alleles.add(Allele.create(BaseUtils.baseIndexToSimpleBase(indexOfRefBase == 0 ? 1 : 0)));
+ else
+ // otherwise, choose any alternate allele (it doesn't really matter)
+ alleles.add(Allele.create(BaseUtils.baseIndexToSimpleBase(indexOfRefBase == 0 ? 1 : 0)));
}
}
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedArgumentCollection.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedArgumentCollection.java
index e346b10b7..f156468cc 100644
--- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedArgumentCollection.java
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedArgumentCollection.java
@@ -52,6 +52,9 @@ import org.broadinstitute.sting.utils.pairhmm.PairHMM;
import org.broadinstitute.sting.utils.variant.GATKVariantContextUtils;
import org.broadinstitute.variant.variantcontext.VariantContext;
+import java.util.Collections;
+import java.util.List;
+
public class UnifiedArgumentCollection extends StandardCallerArgumentCollection {
@Argument(fullName = "genotype_likelihoods_model", shortName = "glm", doc = "Genotype likelihoods calculation model to employ -- SNP is the default option, while INDEL is also available for calling indels and BOTH is available for calling both together", required = false)
@@ -82,7 +85,7 @@ public class UnifiedArgumentCollection extends StandardCallerArgumentCollection
* The PairHMM implementation to use for -glm INDEL genotype likelihood calculations. The various implementations balance a tradeoff of accuracy and runtime.
*/
@Argument(fullName = "pair_hmm_implementation", shortName = "pairHMM", doc = "The PairHMM implementation to use for -glm INDEL genotype likelihood calculations", required = false)
- public PairHMM.HMM_IMPLEMENTATION pairHMM = PairHMM.HMM_IMPLEMENTATION.ORIGINAL;
+ public PairHMM.HMM_IMPLEMENTATION pairHMM = PairHMM.HMM_IMPLEMENTATION.LOGLESS_CACHING;
/**
* The minimum confidence needed in a given base for it to be used in variant calling. Note that the base quality of a base
@@ -95,6 +98,18 @@ public class UnifiedArgumentCollection extends StandardCallerArgumentCollection
@Argument(fullName = "max_deletion_fraction", shortName = "deletions", doc = "Maximum fraction of reads with deletions spanning this locus for it to be callable [to disable, set to < 0 or > 1; default:0.05]", required = false)
public Double MAX_DELETION_FRACTION = 0.05;
+ /**
+ * Advanced, experimental argument: if SNP likelihood model is specified, and if EMIT_ALL_SITES output mode is set, when we set this argument then we will also emit PLs at all sites.
+ * This will give a measure of reference confidence and a measure of which alt alleles are more plausible (if any).
+ * WARNINGS:
+ * - This feature will inflate VCF file size considerably.
+ * - All SNP ALT alleles will be emitted with corresponding 10 PL values.
+ * - An error will be emitted if EMIT_ALL_SITES is not set, or if anything other than diploid SNP model is used
+ */
+ @Advanced
+ @Argument(fullName = "allSitePLs", shortName = "allSitePLs", doc = "Annotate all sites with PLs", required = false)
+ public boolean annotateAllSitesWithPLs = false;
+
// indel-related arguments
/**
* A candidate indel is genotyped (and potentially called) if there are this number of reads with a consensus indel at a site.
@@ -247,7 +262,7 @@ public class UnifiedArgumentCollection extends StandardCallerArgumentCollection
this.EXCLUDE_FILTERED_REFERENCE_SITES = uac.EXCLUDE_FILTERED_REFERENCE_SITES;
this.IGNORE_LANE_INFO = uac.IGNORE_LANE_INFO;
this.pairHMM = uac.pairHMM;
-
+ this.annotateAllSitesWithPLs = uac.annotateAllSitesWithPLs;
// todo- arguments to remove
this.IGNORE_SNP_ALLELES = uac.IGNORE_SNP_ALLELES;
}
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperEngine.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperEngine.java
index 3380efcc9..9f3368cf8 100644
--- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperEngine.java
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperEngine.java
@@ -83,6 +83,9 @@ public class UnifiedGenotyperEngine {
public static final double HUMAN_SNP_HETEROZYGOSITY = 1e-3;
public static final double HUMAN_INDEL_HETEROZYGOSITY = 1e-4;
+ private static final int SNP_MODEL = 0;
+ private static final int INDEL_MODEL = 1;
+
public enum OUTPUT_MODE {
/** produces calls only at variant sites */
EMIT_VARIANTS_ONLY,
@@ -165,6 +168,13 @@ public class UnifiedGenotyperEngine {
filter.add(LOW_QUAL_FILTER_NAME);
determineGLModelsToUse();
+
+ // do argument checking
+ if (UAC.annotateAllSitesWithPLs) {
+ if (!modelsToUse.contains(GenotypeLikelihoodsCalculationModel.Model.SNP))
+ throw new IllegalArgumentException("Invalid genotype likelihood model specification: Only diploid SNP model can be used in conjunction with option allSitePLs");
+
+ }
}
/**
@@ -436,7 +446,8 @@ public class UnifiedGenotyperEngine {
bestGuessIsRef = false;
}
// if in GENOTYPE_GIVEN_ALLELES mode, we still want to allow the use of a poor allele
- else if ( UAC.GenotypingMode == GenotypeLikelihoodsCalculationModel.GENOTYPING_MODE.GENOTYPE_GIVEN_ALLELES ) {
+ else if ( UAC.GenotypingMode == GenotypeLikelihoodsCalculationModel.GENOTYPING_MODE.GENOTYPE_GIVEN_ALLELES ||
+ UAC.annotateAllSitesWithPLs) {
myAlleles.add(alternateAllele);
alleleCountsofMLE.add(AFresult.getAlleleCountAtMLE(alternateAllele));
}
@@ -446,7 +457,7 @@ public class UnifiedGenotyperEngine {
// note the math.abs is necessary because -10 * 0.0 => -0.0 which isn't nice
final double phredScaledConfidence =
- Math.abs(! bestGuessIsRef || UAC.GenotypingMode == GenotypeLikelihoodsCalculationModel.GENOTYPING_MODE.GENOTYPE_GIVEN_ALLELES
+ Math.abs(! bestGuessIsRef || UAC.GenotypingMode == GenotypeLikelihoodsCalculationModel.GENOTYPING_MODE.GENOTYPE_GIVEN_ALLELES || UAC.annotateAllSitesWithPLs
? -10 * AFresult.getLog10PosteriorOfAFEq0()
: -10 * AFresult.getLog10PosteriorOfAFGT0());
@@ -540,11 +551,6 @@ public class UnifiedGenotyperEngine {
builder.attributes(attributes);
VariantContext vcCall = builder.make();
- // if we are subsetting alleles (either because there were too many or because some were not polymorphic)
- // then we may need to trim the alleles (because the original VariantContext may have had to pad at the end).
- if ( myAlleles.size() != vc.getAlleles().size() && !limitedContext ) // limitedContext callers need to handle allele trimming on their own to keep their perReadAlleleLikelihoodMap alleles in sync
- vcCall = GATKVariantContextUtils.reverseTrimAlleles(vcCall);
-
if ( annotationEngine != null && !limitedContext ) { // limitedContext callers need to handle annotations on their own by calling their own annotationEngine
// Note: we want to use the *unfiltered* and *unBAQed* context for the annotations
final ReadBackedPileup pileup = rawContext.getBasePileup();
@@ -553,6 +559,11 @@ public class UnifiedGenotyperEngine {
vcCall = annotationEngine.annotateContext(tracker, refContext, stratifiedContexts, vcCall, perReadAlleleLikelihoodMap);
}
+ // if we are subsetting alleles (either because there were too many or because some were not polymorphic)
+ // then we may need to trim the alleles (because the original VariantContext may have had to pad at the end).
+ if ( myAlleles.size() != vc.getAlleles().size() && !limitedContext ) // limitedContext callers need to handle allele trimming on their own to keep their perReadAlleleLikelihoodMap alleles in sync
+ vcCall = GATKVariantContextUtils.reverseTrimAlleles(vcCall);
+
return new VariantCallContext(vcCall, confidentlyCalled(phredScaledConfidence, PoFGT0));
}
@@ -693,13 +704,13 @@ public class UnifiedGenotyperEngine {
}
private void determineGLModelsToUse() {
-
String modelPrefix = "";
if ( !UAC.GLmodel.name().contains(GPSTRING) && UAC.samplePloidy != GATKVariantContextUtils.DEFAULT_PLOIDY )
modelPrefix = GPSTRING;
- if ( UAC.GLmodel.name().toUpperCase().contains("BOTH") ) {
- modelPrefix += UAC.GLmodel.name().toUpperCase().replaceAll("BOTH","");
+ // GGA mode => must initialize both the SNP and indel models
+ if ( UAC.GenotypingMode == GenotypeLikelihoodsCalculationModel.GENOTYPING_MODE.GENOTYPE_GIVEN_ALLELES ||
+ UAC.GLmodel.name().toUpperCase().contains("BOTH") ) {
modelsToUse.add(GenotypeLikelihoodsCalculationModel.Model.valueOf(modelPrefix+"SNP"));
modelsToUse.add(GenotypeLikelihoodsCalculationModel.Model.valueOf(modelPrefix+"INDEL"));
}
@@ -712,31 +723,24 @@ public class UnifiedGenotyperEngine {
private List getGLModelsToUse(final RefMetaDataTracker tracker,
final ReferenceContext refContext,
final AlignmentContext rawContext) {
-
if ( UAC.GenotypingMode != GenotypeLikelihoodsCalculationModel.GENOTYPING_MODE.GENOTYPE_GIVEN_ALLELES )
return modelsToUse;
+ if ( modelsToUse.size() != 2 )
+ throw new IllegalStateException("GGA mode assumes that we have initialized both the SNP and indel models but found " + modelsToUse);
+
// if we're genotyping given alleles then we need to choose the model corresponding to the variant type requested
- final List GGAmodel = new ArrayList(1);
final VariantContext vcInput = getVCFromAllelesRod(tracker, refContext, rawContext.getLocation(), false, logger, UAC.alleles);
- if ( vcInput == null )
- return GGAmodel; // no work to be done
- if ( vcInput.isSNP() ) {
- // use the SNP model unless the user chose INDEL mode only
- if ( modelsToUse.size() == 2 || modelsToUse.get(0).name().endsWith("SNP") )
- GGAmodel.add(modelsToUse.get(0));
+ if ( vcInput == null ) {
+ return Collections.emptyList(); // no work to be done
+ } else if ( vcInput.isSNP() ) {
+ return Collections.singletonList(modelsToUse.get(SNP_MODEL));
+ } else if ( vcInput.isIndel() || vcInput.isMixed() ) {
+ return Collections.singletonList(modelsToUse.get(INDEL_MODEL));
+ } else {
+ return Collections.emptyList(); // No support for other types yet
}
- else if ( vcInput.isIndel() || vcInput.isMixed() ) {
- // use the INDEL model unless the user chose SNP mode only
- if ( modelsToUse.size() == 2 )
- GGAmodel.add(modelsToUse.get(1));
- else if ( modelsToUse.get(0).name().endsWith("INDEL") )
- GGAmodel.add(modelsToUse.get(0));
- }
- // No support for other types yet
-
- return GGAmodel;
}
/**
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/afcalc/DiploidExactAFCalc.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/afcalc/DiploidExactAFCalc.java
index 170b6e250..2ece18002 100644
--- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/afcalc/DiploidExactAFCalc.java
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/afcalc/DiploidExactAFCalc.java
@@ -106,7 +106,7 @@ public abstract class DiploidExactAFCalc extends ExactAFCalc {
alleles.add(vc.getReference());
alleles.addAll(chooseMostLikelyAlternateAlleles(vc, getMaxAltAlleles()));
builder.alleles(alleles);
- builder.genotypes(GATKVariantContextUtils.subsetDiploidAlleles(vc, alleles, false));
+ builder.genotypes(GATKVariantContextUtils.subsetDiploidAlleles(vc, alleles, GATKVariantContextUtils.GenotypeAssignmentMethod.SET_TO_NO_CALL));
return builder.make();
} else {
return vc;
@@ -352,6 +352,9 @@ public abstract class DiploidExactAFCalc extends ExactAFCalc {
final List allelesToUse,
final boolean assignGenotypes,
final int ploidy) {
- return GATKVariantContextUtils.subsetDiploidAlleles(vc, allelesToUse, assignGenotypes);
+ return allelesToUse.size() == 1
+ ? GATKVariantContextUtils.subsetToRefOnly(vc, ploidy)
+ : GATKVariantContextUtils.subsetDiploidAlleles(vc, allelesToUse,
+ assignGenotypes ? GATKVariantContextUtils.GenotypeAssignmentMethod.USE_PLS_TO_ASSIGN : GATKVariantContextUtils.GenotypeAssignmentMethod.SET_TO_NO_CALL);
}
}
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/DeBruijnAssembler.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/DeBruijnAssembler.java
index 48972dfd5..d876a403b 100644
--- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/DeBruijnAssembler.java
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/DeBruijnAssembler.java
@@ -77,6 +77,7 @@ public class DeBruijnAssembler extends LocalAssemblyEngine {
private final static int NUM_PATHS_PER_GRAPH = 25;
private static final int KMER_OVERLAP = 5; // the additional size of a valid chunk of sequence, used to string together k-mers
private static final int GRAPH_KMER_STEP = 6;
+ private static final int GGA_MODE_ARTIFICIAL_COUNTS = 1000;
private final int minKmer;
private final int onlyBuildKmersOfThisSizeWhenDebuggingGraphAlgorithms;
@@ -92,8 +93,8 @@ public class DeBruijnAssembler extends LocalAssemblyEngine {
}
@Override
- protected List assemble(final List reads, final Haplotype refHaplotype) {
- final List graphs = new LinkedList();
+ protected List assemble(final List reads, final Haplotype refHaplotype, final List activeAlleleHaplotypes ) {
+ final List graphs = new LinkedList<>();
final int maxKmer = ReadUtils.getMaxReadLength(reads) - KMER_OVERLAP - 1;
if( maxKmer < minKmer) {
@@ -106,7 +107,7 @@ public class DeBruijnAssembler extends LocalAssemblyEngine {
continue;
if ( debug ) logger.info("Creating de Bruijn graph for " + kmer + " kmer using " + reads.size() + " reads");
- DeBruijnGraph graph = createGraphFromSequences( reads, kmer, refHaplotype);
+ DeBruijnGraph graph = createGraphFromSequences(reads, kmer, refHaplotype, activeAlleleHaplotypes);
if( graph != null ) { // graphs that fail during creation ( for example, because there are cycles in the reference graph ) will show up here as a null graph object
// do a series of steps to clean up the raw assembly graph to make it analysis-ready
if ( debugGraphTransformations ) graph.printGraph(new File("unpruned.dot"), pruneFactor);
@@ -133,7 +134,7 @@ public class DeBruijnAssembler extends LocalAssemblyEngine {
}
@Requires({"reads != null", "kmerLength > 0", "refHaplotype != null"})
- protected DeBruijnGraph createGraphFromSequences( final List reads, final int kmerLength, final Haplotype refHaplotype ) {
+ protected DeBruijnGraph createGraphFromSequences( final List reads, final int kmerLength, final Haplotype refHaplotype, final List activeAlleleHaplotypes ) {
final DeBruijnGraph graph = new DeBruijnGraph(kmerLength);
final DeBruijnGraphBuilder builder = new DeBruijnGraphBuilder(graph);
@@ -142,6 +143,11 @@ public class DeBruijnAssembler extends LocalAssemblyEngine {
// something went wrong, so abort right now with a null graph
return null;
+ // add the artificial GGA haplotypes to the graph
+ if ( ! addGGAKmersToGraph(builder, activeAlleleHaplotypes) )
+ // something went wrong, so abort right now with a null graph
+ return null;
+
// now go through the graph already seeded with the reference sequence and add the read kmers to it
if ( ! addReadKmersToGraph(builder, reads) )
// some problem was detected adding the reads to the graph, return null to indicate we failed
@@ -151,6 +157,28 @@ public class DeBruijnAssembler extends LocalAssemblyEngine {
return graph;
}
+ /**
+ * Add the high-quality kmers from the artificial GGA haplotypes to the graph
+ *
+ * @param builder a debruijn graph builder to add the read kmers to
+ * @param activeAlleleHaplotypes a list of haplotypes to add to the graph for GGA mode
+ * @return true if we successfully added the read kmers to the graph without corrupting it in some way
+ */
+ protected boolean addGGAKmersToGraph(final DeBruijnGraphBuilder builder, final List activeAlleleHaplotypes) {
+
+ final int kmerLength = builder.getKmerSize();
+
+ for( final Haplotype haplotype : activeAlleleHaplotypes ) {
+ final int end = haplotype.length() - kmerLength;
+ for( int start = 0; start < end; start++ ) {
+ builder.addKmerPairFromSeqToGraph( haplotype.getBases(), start, GGA_MODE_ARTIFICIAL_COUNTS );
+ }
+ }
+
+ // always returns true now, but it's possible that we'd add kmers and decide we don't like the graph in some way
+ return true;
+ }
+
/**
* Add the high-quality kmers from the reads to the graph
*
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/GenotypingEngine.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/GenotypingEngine.java
index 419ea378f..04173b64f 100644
--- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/GenotypingEngine.java
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/GenotypingEngine.java
@@ -49,6 +49,7 @@ package org.broadinstitute.sting.gatk.walkers.haplotypecaller;
import com.google.java.contract.Ensures;
import com.google.java.contract.Requires;
import org.apache.log4j.Logger;
+import org.broadinstitute.sting.gatk.refdata.RefMetaDataTracker;
import org.broadinstitute.sting.gatk.walkers.annotator.VariantAnnotatorEngine;
import org.broadinstitute.sting.gatk.walkers.genotyper.GenotypeLikelihoodsCalculationModel;
import org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine;
@@ -71,7 +72,7 @@ public class GenotypingEngine {
private final boolean DEBUG;
private final boolean USE_FILTERED_READ_MAP_FOR_ANNOTATIONS;
- private final static List noCall = new ArrayList(); // used to noCall all genotypes until the exact model is applied
+ private final static List noCall = new ArrayList<>(); // used to noCall all genotypes until the exact model is applied
private final VariantAnnotatorEngine annotationEngine;
private final MergeVariantsAcrossHaplotypes crossHaplotypeEventMerger;
@@ -146,6 +147,7 @@ public class GenotypingEngine {
final GenomeLoc refLoc,
final GenomeLoc activeRegionWindow,
final GenomeLocParser genomeLocParser,
+ final RefMetaDataTracker tracker,
final List activeAllelesToGenotype ) {
// sanity check input arguments
if (UG_engine == null) throw new IllegalArgumentException("UG_Engine input can't be null, got "+UG_engine);
@@ -162,8 +164,8 @@ public class GenotypingEngine {
final TreeSet startPosKeySet = decomposeHaplotypesIntoVariantContexts(haplotypes, haplotypeReadMap, ref, refLoc, activeAllelesToGenotype);
// Walk along each position in the key set and create each event to be outputted
- final Set calledHaplotypes = new HashSet();
- final List returnCalls = new ArrayList();
+ final Set calledHaplotypes = new HashSet<>();
+ final List returnCalls = new ArrayList<>();
for( final int loc : startPosKeySet ) {
if( loc >= activeRegionWindow.getStart() && loc <= activeRegionWindow.getStop() ) { // genotyping an event inside this active region
final List eventsAtThisLoc = getVCsAtThisLocation(haplotypes, loc, activeAllelesToGenotype);
@@ -183,7 +185,7 @@ public class GenotypingEngine {
if( eventsAtThisLoc.size() != mergedVC.getAlternateAlleles().size() ) {
throw new ReviewedStingException("Record size mismatch! Something went wrong in the merging of alleles.");
}
- final Map mergeMap = new LinkedHashMap();
+ final Map mergeMap = new LinkedHashMap<>();
mergeMap.put(null, mergedVC.getReference()); // the reference event (null) --> the reference allele
for(int iii = 0; iii < mergedVC.getAlternateAlleles().size(); iii++) {
mergeMap.put(eventsAtThisLoc.get(iii), mergedVC.getAlternateAllele(iii)); // BUGBUG: This is assuming that the order of alleles is the same as the priority list given to simpleMerge function
@@ -204,13 +206,12 @@ public class GenotypingEngine {
convertHaplotypeReadMapToAlleleReadMap( haplotypeReadMap, alleleMapper, 0.0 ) );
final Map stratifiedReadMap = filterToOnlyOverlappingReads( genomeLocParser, alleleReadMap_annotations, perSampleFilteredReadList, call );
- VariantContext annotatedCall = call;
- if( annotatedCall.getAlleles().size() != mergedVC.getAlleles().size() ) { // some alleles were removed so reverseTrimming might be necessary!
+ VariantContext annotatedCall = annotationEngine.annotateContextForActiveRegion(tracker, stratifiedReadMap, call);
+
+ if( call.getAlleles().size() != mergedVC.getAlleles().size() ) { // some alleles were removed so reverseTrimming might be necessary!
annotatedCall = GATKVariantContextUtils.reverseTrimAlleles(annotatedCall);
}
- annotatedCall = annotationEngine.annotateContext(stratifiedReadMap, annotatedCall);
-
// maintain the set of all called haplotypes
for ( final Allele calledAllele : call.getAlleles() )
calledHaplotypes.addAll(alleleMapper.get(calledAllele));
@@ -244,7 +245,7 @@ public class GenotypingEngine {
if ( in_GGA_mode ) startPosKeySet.clear();
- cleanUpSymbolicUnassembledEvents( haplotypes );
+ //cleanUpSymbolicUnassembledEvents( haplotypes ); // We don't make symbolic alleles so this isn't needed currently
if ( !in_GGA_mode ) {
// run the event merger if we're not in GGA mode
final boolean mergedAnything = crossHaplotypeEventMerger.merge(haplotypes, haplotypeReadMap, startPosKeySet, ref, refLoc);
@@ -267,7 +268,7 @@ public class GenotypingEngine {
* @return the list of the sources of vcs in the same order
*/
private List makePriorityList(final List vcs) {
- final List priorityList = new LinkedList();
+ final List priorityList = new LinkedList<>();
for ( final VariantContext vc : vcs ) priorityList.add(vc.getSource());
return priorityList;
}
@@ -276,7 +277,7 @@ public class GenotypingEngine {
final int loc,
final List activeAllelesToGenotype) {
// the overlapping events to merge into a common reference view
- final List eventsAtThisLoc = new ArrayList();
+ final List eventsAtThisLoc = new ArrayList<>();
if( activeAllelesToGenotype.isEmpty() ) {
for( final Haplotype h : haplotypes ) {
@@ -292,7 +293,7 @@ public class GenotypingEngine {
if( compVC.getStart() == loc ) {
int alleleCount = 0;
for( final Allele compAltAllele : compVC.getAlternateAlleles() ) {
- List alleleSet = new ArrayList(2);
+ List alleleSet = new ArrayList<>(2);
alleleSet.add(compVC.getReference());
alleleSet.add(compAltAllele);
final String vcSourceName = "Comp" + compCount + "Allele" + alleleCount;
@@ -348,7 +349,7 @@ public class GenotypingEngine {
final Map> perSampleFilteredReadList,
final VariantContext call ) {
- final Map returnMap = new LinkedHashMap();
+ final Map returnMap = new LinkedHashMap<>();
final GenomeLoc callLoc = parser.createGenomeLoc(call);
for( final Map.Entry sample : perSampleReadMap.entrySet() ) {
final PerReadAlleleLikelihoodMap likelihoodMap = new PerReadAlleleLikelihoodMap();
@@ -384,7 +385,7 @@ public class GenotypingEngine {
// TODO - split into input haplotypes and output haplotypes as not to share I/O arguments
@Requires("haplotypes != null")
protected static void cleanUpSymbolicUnassembledEvents( final List haplotypes ) {
- final List haplotypesToRemove = new ArrayList();
+ final List haplotypesToRemove = new ArrayList<>();
for( final Haplotype h : haplotypes ) {
for( final VariantContext vc : h.getEventMap().getVariantContexts() ) {
if( vc.isSymbolic() ) {
@@ -407,7 +408,7 @@ public class GenotypingEngine {
final Map> alleleMapper,
final double downsamplingFraction ) {
- final Map alleleReadMap = new LinkedHashMap();
+ final Map alleleReadMap = new LinkedHashMap<>();
for( final Map.Entry haplotypeReadMapEntry : haplotypeReadMap.entrySet() ) { // for each sample
final PerReadAlleleLikelihoodMap perReadAlleleLikelihoodMap = new PerReadAlleleLikelihoodMap();
for( final Map.Entry> alleleMapperEntry : alleleMapper.entrySet() ) { // for each output allele
@@ -430,7 +431,7 @@ public class GenotypingEngine {
}
protected static Map> createAlleleMapper( final Map mergeMap, final Map> eventMap ) {
- final Map> alleleMapper = new LinkedHashMap>();
+ final Map> alleleMapper = new LinkedHashMap<>();
for( final Map.Entry entry : mergeMap.entrySet() ) {
alleleMapper.put(entry.getValue(), eventMap.get(new Event(entry.getKey())));
}
@@ -441,100 +442,33 @@ public class GenotypingEngine {
@Ensures({"result.size() == eventsAtThisLoc.size() + 1"})
protected static Map> createEventMapper( final int loc, final List eventsAtThisLoc, final List haplotypes ) {
- final Map> eventMapper = new LinkedHashMap>(eventsAtThisLoc.size()+1);
- VariantContext refVC = eventsAtThisLoc.get(0); // the genome loc is the only safe thing to pull out of this VC because ref/alt pairs might change reference basis
- eventMapper.put(new Event(null), new ArrayList());
+ final Map> eventMapper = new LinkedHashMap<>(eventsAtThisLoc.size()+1);
+ final Event refEvent = new Event(null);
+ eventMapper.put(refEvent, new ArrayList());
for( final VariantContext vc : eventsAtThisLoc ) {
eventMapper.put(new Event(vc), new ArrayList());
}
- final List undeterminedHaplotypes = new ArrayList(haplotypes.size());
for( final Haplotype h : haplotypes ) {
- if( h.isArtificialHaplotype() && loc == h.getArtificialAllelePosition() ) {
- final List alleles = new ArrayList(2);
- alleles.add(h.getArtificialRefAllele());
- alleles.add(h.getArtificialAltAllele());
- final Event artificialVC = new Event( (new VariantContextBuilder()).source("artificialHaplotype")
- .alleles(alleles)
- .loc(refVC.getChr(), refVC.getStart(), refVC.getStart() + h.getArtificialRefAllele().length() - 1).make() );
- if( eventMapper.containsKey(artificialVC) ) {
- eventMapper.get(artificialVC).add(h);
- }
- } else if( h.getEventMap().get(loc) == null ) { // no event at this location so let's investigate later
- undeterminedHaplotypes.add(h);
+ if( h.getEventMap().get(loc) == null ) {
+ eventMapper.get(refEvent).add(h);
} else {
- boolean haplotypeIsDetermined = false;
for( final VariantContext vcAtThisLoc : eventsAtThisLoc ) {
if( h.getEventMap().get(loc).hasSameAllelesAs(vcAtThisLoc) ) {
eventMapper.get(new Event(vcAtThisLoc)).add(h);
- haplotypeIsDetermined = true;
break;
}
}
-
- if( !haplotypeIsDetermined )
- undeterminedHaplotypes.add(h);
}
}
- for( final Haplotype h : undeterminedHaplotypes ) {
- Event matchingEvent = new Event(null);
- for( final Map.Entry> eventToTest : eventMapper.entrySet() ) {
- // don't test against the reference allele
- if( eventToTest.getKey().equals(new Event(null)) )
- continue;
-
- // only try to disambiguate for alleles that have had haplotypes previously assigned above
- if( eventToTest.getValue().isEmpty() )
- continue;
-
- final Haplotype artificialHaplotype = eventToTest.getValue().get(0);
- if( isSubSetOf(artificialHaplotype.getEventMap(), h.getEventMap(), true) ) {
- matchingEvent = eventToTest.getKey();
- break;
- }
- }
-
- eventMapper.get(matchingEvent).add(h);
- }
-
return eventMapper;
}
- protected static boolean isSubSetOf(final Map subset, final Map superset, final boolean resolveSupersetToSubset) {
-
- for ( final Map.Entry fromSubset : subset.entrySet() ) {
- final VariantContext fromSuperset = superset.get(fromSubset.getKey());
- if ( fromSuperset == null )
- return false;
-
- List supersetAlleles = fromSuperset.getAlternateAlleles();
- if ( resolveSupersetToSubset )
- supersetAlleles = resolveAlternateAlleles(fromSubset.getValue().getReference(), fromSuperset.getReference(), supersetAlleles);
-
- if ( !supersetAlleles.contains(fromSubset.getValue().getAlternateAllele(0)) )
- return false;
- }
-
- return true;
- }
-
- private static List resolveAlternateAlleles(final Allele targetReference, final Allele actualReference, final List currentAlleles) {
- if ( targetReference.length() <= actualReference.length() )
- return currentAlleles;
-
- final List newAlleles = new ArrayList(currentAlleles.size());
- final byte[] extraBases = Arrays.copyOfRange(targetReference.getBases(), actualReference.length(), targetReference.length());
- for ( final Allele a : currentAlleles ) {
- newAlleles.add(Allele.extend(a, extraBases));
- }
- return newAlleles;
- }
-
@Ensures({"result.size() == haplotypeAllelesForSample.size()"})
protected static List findEventAllelesInSample( final List eventAlleles, final List haplotypeAlleles, final List haplotypeAllelesForSample, final List> alleleMapper, final List haplotypes ) {
if( haplotypeAllelesForSample.contains(Allele.NO_CALL) ) { return noCall; }
- final List eventAllelesForSample = new ArrayList();
+ final List eventAllelesForSample = new ArrayList<>();
for( final Allele a : haplotypeAllelesForSample ) {
final Haplotype haplotype = haplotypes.get(haplotypeAlleles.indexOf(a));
for( int iii = 0; iii < alleleMapper.size(); iii++ ) {
diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
index db184854b..87f1ae75c 100644
--- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
+++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
@@ -47,6 +47,10 @@
package org.broadinstitute.sting.gatk.walkers.haplotypecaller;
import com.google.java.contract.Ensures;
+import net.sf.samtools.Cigar;
+import net.sf.samtools.CigarElement;
+import net.sf.samtools.CigarOperator;
+import net.sf.samtools.SAMFileWriter;
import org.broadinstitute.sting.commandline.*;
import org.broadinstitute.sting.gatk.CommandLineGATK;
import org.broadinstitute.sting.gatk.arguments.DbsnpArgumentCollection;
@@ -76,8 +80,6 @@ import org.broadinstitute.sting.utils.activeregion.ActivityProfileState;
import org.broadinstitute.sting.utils.clipping.ReadClipper;
import org.broadinstitute.sting.utils.exceptions.UserException;
import org.broadinstitute.sting.utils.fasta.CachingIndexedFastaSequenceFile;
-import org.broadinstitute.sting.utils.fragments.FragmentCollection;
-import org.broadinstitute.sting.utils.fragments.FragmentUtils;
import org.broadinstitute.sting.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.sting.utils.haplotype.*;
import org.broadinstitute.sting.utils.haplotypeBAMWriter.HaplotypeBAMWriter;
@@ -219,7 +221,7 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
*/
@Advanced
@Argument(fullName="annotation", shortName="A", doc="One or more specific annotations to apply to variant calls", required=false)
- protected List annotationsToUse = new ArrayList(Arrays.asList(new String[]{"ClippingRankSumTest"}));
+ protected List annotationsToUse = new ArrayList<>(Arrays.asList(new String[]{"ClippingRankSumTest", "DepthPerSampleHC"}));
/**
* Which annotations to exclude from output in the VCF file. Note that this argument has higher priority than the -A or -G arguments,
@@ -262,6 +264,14 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
@Argument(fullName="kmerSize", shortName="kmerSize", doc="Kmer size to use in the read threading assembler", required = false)
protected List kmerSizes = Arrays.asList(10, 25);
+ @Advanced
+ @Argument(fullName="dontIncreaseKmerSizesForCycles", shortName="dontIncreaseKmerSizesForCycles", doc="Should we disable the iterating over kmer sizes when graph cycles are detected?", required = false)
+ protected boolean dontIncreaseKmerSizesForCycles = false;
+
+ @Advanced
+ @Argument(fullName="numPruningSamples", shortName="numPruningSamples", doc="The number of samples that must pass the minPuning factor in order for the path to be kept", required = false)
+ protected int numPruningSamples = 1;
+
/**
* Assembly graph can be quite complex, and could imply a very large number of possible haplotypes. Each haplotype
* considered requires N PairHMM evaluations if there are N reads across all samples. In order to control the
@@ -328,7 +338,7 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
*/
@Advanced
@Argument(fullName="phredScaledGlobalReadMismappingRate", shortName="globalMAPQ", doc="The global assumed mismapping rate for reads", required = false)
- protected int phredScaledGlobalReadMismappingRate = 60;
+ protected int phredScaledGlobalReadMismappingRate = 45;
@Advanced
@Argument(fullName="maxNumHaplotypesInPopulation", shortName="maxNumHaplotypesInPopulation", doc="Maximum number of haplotypes to consider for your population. This number will probably need to be increased when calling organisms with high heterozygosity.", required = false)
@@ -384,6 +394,10 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
@Argument(fullName="dontUseSoftClippedBases", shortName="dontUseSoftClippedBases", doc="If specified, we will not analyze soft clipped bases in the reads", required = false)
protected boolean dontUseSoftClippedBases = false;
+ @Hidden
+ @Argument(fullName="captureAssemblyFailureBAM", shortName="captureAssemblyFailureBAM", doc="If specified, we will write a BAM called assemblyFailure.bam capturing all of the reads that were in the active region when the assembler failed for any reason", required = false)
+ protected boolean captureAssemblyFailureBAM = false;
+
@Hidden
@Argument(fullName="allowCyclesInKmerGraphToGeneratePaths", shortName="allowCyclesInKmerGraphToGeneratePaths", doc="If specified, we will allow cycles in the kmer graphs to generate paths with multiple copies of the path sequenece rather than just the shortest paths", required = false)
protected boolean allowCyclesInKmerGraphToGeneratePaths = false;
@@ -392,6 +406,20 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
@Argument(fullName="noFpga", shortName="noFpga", doc="If provided, disables the use of the FPGA HMM implementation", required = false)
protected boolean noFpga = false;
+ // Parameters to control read error correction
+ @Hidden
+ @Argument(fullName="errorCorrectReads", shortName="errorCorrectReads", doc = "Use an exploratory algorithm to error correct the kmers used during assembly. May cause fundamental problems with the assembly graph itself", required=false)
+ protected boolean errorCorrectReads = false;
+
+ @Hidden
+ @Argument(fullName="kmerLengthForReadErrorCorrection", shortName="kmerLengthForReadErrorCorrection", doc = "Use an exploratory algorithm to error correct the kmers used during assembly. May cause fundamental problems with the assembly graph itself", required=false)
+ protected int kmerLengthForReadErrorCorrection = 25;
+
+ @Hidden
+ @Argument(fullName="minObservationsForKmerToBeSolid", shortName="minObservationsForKmerToBeSolid", doc = "A k-mer must be seen at least these times for it considered to be solid", required=false)
+ protected int minObservationsForKmerToBeSolid = 20;
+
+
// -----------------------------------------------------------------------------------------------
// done with Haplotype caller parameters
// -----------------------------------------------------------------------------------------------
@@ -422,7 +450,8 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
private final static int PADDING_AROUND_OTHERS_FOR_CALLING = 150;
// the maximum extent into the full active region extension that we're willing to go in genotyping our events
- private final static int MAX_GENOTYPING_ACTIVE_REGION_EXTENSION = 25;
+ private final static int MAX_DISCOVERY_ACTIVE_REGION_EXTENSION = 25;
+ private final static int MAX_GGA_ACTIVE_REGION_EXTENSION = 100;
private ActiveRegionTrimmer trimmer = null;
@@ -432,13 +461,11 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
// bases with quality less than or equal to this value are trimmed off the tails of the reads
private static final byte MIN_TAIL_QUALITY = 20;
+ private static final byte MIN_TAIL_QUALITY_WITH_ERROR_CORRECTION = 6;
// the minimum length of a read we'd consider using for genotyping
private final static int MIN_READ_LENGTH = 10;
- private List samplesList = new ArrayList();
- private final static double LOG_ONE_HALF = -Math.log10(2.0);
- private final static double LOG_ONE_THIRD = -Math.log10(3.0);
- private final List allelesToGenotype = new ArrayList();
+ private List samplesList = new ArrayList<>();
private final static Allele FAKE_REF_ALLELE = Allele.create("N", true); // used in isActive function to call into UG Engine. Should never appear anywhere in a VCF file
private final static Allele FAKE_ALT_ALLELE = Allele.create("", false); // used in isActive function to call into UG Engine. Should never appear anywhere in a VCF file
@@ -518,7 +545,7 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
final int maxAllowedPathsForReadThreadingAssembler = Math.max(maxPathsPerSample * nSamples, MIN_PATHS_PER_GRAPH);
assemblyEngine = useDebruijnAssembler
? new DeBruijnAssembler(minKmerForDebruijnAssembler, onlyUseKmerSizeForDebruijnAssembler)
- : new ReadThreadingAssembler(maxAllowedPathsForReadThreadingAssembler, kmerSizes);
+ : new ReadThreadingAssembler(maxAllowedPathsForReadThreadingAssembler, kmerSizes, dontIncreaseKmerSizesForCycles, numPruningSamples);
assemblyEngine.setErrorCorrectKmers(errorCorrectKmers);
assemblyEngine.setPruneFactor(MIN_PRUNE_FACTOR);
@@ -549,11 +576,16 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
genotypingEngine = new GenotypingEngine( DEBUG, annotationEngine, USE_FILTERED_READ_MAP_FOR_ANNOTATIONS, variantMerger );
- if ( bamWriter != null )
+ if ( bamWriter != null ) {
+ // we currently do not support multi-threaded BAM writing, so exception out
+ if ( getToolkit().getTotalNumberOfThreads() > 1 )
+ throw new UserException.BadArgumentValue("bamout", "Currently cannot emit a BAM file from the HaplotypeCaller in multi-threaded mode.");
haplotypeBAMWriter = HaplotypeBAMWriter.create(bamWriterType, bamWriter, getToolkit().getSAMFileHeader());
+ }
trimmer = new ActiveRegionTrimmer(DEBUG, PADDING_AROUND_SNPS_FOR_CALLING, PADDING_AROUND_OTHERS_FOR_CALLING,
- MAX_GENOTYPING_ACTIVE_REGION_EXTENSION, getToolkit().getGenomeLocParser());
+ UAC.GenotypingMode.equals(GenotypeLikelihoodsCalculationModel.GENOTYPING_MODE.GENOTYPE_GIVEN_ALLELES) ? MAX_GGA_ACTIVE_REGION_EXTENSION : MAX_DISCOVERY_ACTIVE_REGION_EXTENSION,
+ getToolkit().getGenomeLocParser());
}
//---------------------------------------------------------------------------------------------------------------
@@ -592,7 +624,6 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
if( UG_engine.getUAC().GenotypingMode == GenotypeLikelihoodsCalculationModel.GENOTYPING_MODE.GENOTYPE_GIVEN_ALLELES ) {
final VariantContext vcFromAllelesRod = UnifiedGenotyperEngine.getVCFromAllelesRod(tracker, ref, ref.getLocus(), false, logger, UG_engine.getUAC().alleles);
if( vcFromAllelesRod != null ) {
- allelesToGenotype.add(vcFromAllelesRod); // save for later for processing during the ActiveRegion's map call. Should be folded into a RefMetaDataTracker object
return new ActivityProfileState(ref.getLocus(), 1.0);
}
}
@@ -605,7 +636,7 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
// if we don't have any data, just abort early
return new ActivityProfileState(ref.getLocus(), 0.0);
- final List noCall = new ArrayList(); // used to noCall all genotypes until the exact model is applied
+ final List noCall = new ArrayList<>(); // used to noCall all genotypes until the exact model is applied
noCall.add(Allele.NO_CALL);
final Map splitContexts = AlignmentContextUtils.splitContextBySampleName(context);
@@ -627,14 +658,14 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
}
}
genotypeLikelihoods[AA] += p.getRepresentativeCount() * QualityUtils.qualToProbLog10(qual);
- genotypeLikelihoods[AB] += p.getRepresentativeCount() * MathUtils.approximateLog10SumLog10( QualityUtils.qualToProbLog10(qual) + LOG_ONE_HALF, QualityUtils.qualToErrorProbLog10(qual) + LOG_ONE_THIRD + LOG_ONE_HALF );
- genotypeLikelihoods[BB] += p.getRepresentativeCount() * QualityUtils.qualToErrorProbLog10(qual) + LOG_ONE_THIRD;
+ genotypeLikelihoods[AB] += p.getRepresentativeCount() * MathUtils.approximateLog10SumLog10( QualityUtils.qualToProbLog10(qual) + MathUtils.LOG_ONE_HALF, QualityUtils.qualToErrorProbLog10(qual) + MathUtils.LOG_ONE_THIRD + MathUtils.LOG_ONE_HALF );
+ genotypeLikelihoods[BB] += p.getRepresentativeCount() * QualityUtils.qualToErrorProbLog10(qual) + MathUtils.LOG_ONE_THIRD;
}
}
genotypes.add( new GenotypeBuilder(sample.getKey()).alleles(noCall).PL(genotypeLikelihoods).make() );
}
- final List alleles = new ArrayList();
+ final List alleles = new ArrayList<>();
alleles.add( FAKE_REF_ALLELE );
alleles.add( FAKE_ALT_ALLELE );
final VariantCallContext vcOut = UG_engine_simple_genotyper.calculateGenotypes(new VariantContextBuilder("HCisActive!", context.getContig(), context.getLocation().getStart(), context.getLocation().getStop(), alleles).genotypes(genotypes).make(), GenotypeLikelihoodsCalculationModel.Model.INDEL);
@@ -660,12 +691,11 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
final List activeAllelesToGenotype = new ArrayList<>();
if( UG_engine.getUAC().GenotypingMode == GenotypeLikelihoodsCalculationModel.GENOTYPING_MODE.GENOTYPE_GIVEN_ALLELES ) {
- for( final VariantContext vc : allelesToGenotype ) {
- if( originalActiveRegion.getLocation().overlapsP( getToolkit().getGenomeLocParser().createGenomeLoc(vc) ) ) {
+ for ( final VariantContext vc : metaDataTracker.getValues(UG_engine.getUAC().alleles) ) {
+ if ( vc.isNotFiltered() ) {
activeAllelesToGenotype.add(vc); // do something with these VCs during GGA mode
}
}
- allelesToGenotype.removeAll( activeAllelesToGenotype );
// No alleles found in this region so nothing to do!
if ( activeAllelesToGenotype.isEmpty() ) { return NO_CALLS; }
} else {
@@ -680,7 +710,7 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
if (dontGenotype) return NO_CALLS; // user requested we not proceed
// filter out reads from genotyping which fail mapping quality based criteria
- final List filteredReads = filterNonPassingReads( assemblyResult.regionForGenotyping );
+ final Collection filteredReads = filterNonPassingReads( assemblyResult.regionForGenotyping );
final Map> perSampleFilteredReadList = splitReadsBySample( filteredReads );
if( assemblyResult.regionForGenotyping.size() == 0 ) { return NO_CALLS; } // no reads remain after filtering so nothing else to do!
@@ -689,23 +719,27 @@ public class HaplotypeCaller extends ActiveRegionWalker, In
//logger.info("Computing read likelihoods with " + assemblyResult.regionForGenotyping.size() + " reads");
final Map stratifiedReadMap = likelihoodCalculationEngine.computeReadLikelihoods( assemblyResult.haplotypes, splitReadsBySample( assemblyResult.regionForGenotyping.getReads() ) );
- // subset down to only the best haplotypes to be genotyped in all samples ( in GGA mode use all discovered haplotypes )
- final List