diff --git a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/diagnostics/BaseCoverageDistribution.java b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/diagnostics/BaseCoverageDistribution.java
deleted file mode 100644
index dadd5a5fc..000000000
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/diagnostics/BaseCoverageDistribution.java
+++ /dev/null
@@ -1,308 +0,0 @@
-/*
-* By downloading the PROGRAM you agree to the following terms of use:
-*
-* BROAD INSTITUTE
-* SOFTWARE LICENSE AGREEMENT
-* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
-*
-* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (“BROAD”) and the LICENSEE and is effective at the date the downloading is completed (“EFFECTIVE DATE”).
-*
-* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
-* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
-* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
-*
-* 1. DEFINITIONS
-* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
-*
-* 2. LICENSE
-* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
-* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
-* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
-* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
-*
-* 3. PHONE-HOME FEATURE
-* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (“PHONE-HOME”) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEE’S user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
-*
-* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
-* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
-* Copyright 2012-2015 Broad Institute, Inc.
-* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
-* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
-*
-* 5. INDEMNIFICATION
-* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
-*
-* 6. NO REPRESENTATIONS OR WARRANTIES
-* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
-* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
-*
-* 7. ASSIGNMENT
-* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
-*
-* 8. MISCELLANEOUS
-* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
-* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
-* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
-* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
-* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
-* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
-* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
-*/
-
-package org.broadinstitute.gatk.tools.walkers.diagnostics;
-
-import org.broadinstitute.gatk.utils.commandline.Argument;
-import org.broadinstitute.gatk.utils.commandline.Output;
-import org.broadinstitute.gatk.engine.CommandLineGATK;
-import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
-import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
-import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
-import org.broadinstitute.gatk.utils.report.GATKReport;
-import org.broadinstitute.gatk.engine.walkers.LocusWalker;
-import org.broadinstitute.gatk.utils.GenomeLoc;
-import org.broadinstitute.gatk.utils.GenomeLocParser;
-import org.broadinstitute.gatk.utils.help.DocumentedGATKFeature;
-import org.broadinstitute.gatk.utils.help.HelpConstants;
-
-import java.io.PrintStream;
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.LinkedList;
-import java.util.Map;
-
-/**
- * Evaluate coverage distribution per base
- *
- *
- * This tool reports the distribution of coverage per base. It includes reads with deletions in the counts unless
- * otherwise specified. Quality filters can be applied before the coverage is calculated.
- *
- *
- *
Input
- *
- * The BAM file and an optional interval list
- *
- *
- *
Output
- *
- * A GATK Report with the coverage distribution per base
- *
- *
- *
- * @author carneiro
- * @since 1/27/13
- */
-@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_QC, extraDocs = {CommandLineGATK.class} )
-public class BaseCoverageDistribution extends LocusWalker, Map>> {
- /**
- * The name of the file to output the GATK Report table. See the FAQs for more information on the GATK Report format.
- */
- @Output(doc = "Output filename")
- private PrintStream out;
-
- /**
- * Whether or not a deletion should be counted towards the coverage of a site
- */
- @Argument(required = false, shortName="del", fullName = "include_deletions", doc ="Include reads with deletions")
- private boolean includeDeletions = true;
-
- /**
- * Whether or not to apply quality filters before calculating coverage distribution. Filtering will use the
- * minimum_mapping_quality and minimum_base_quality parameters below.
- */
- @Argument(required = false, shortName="fd", fullName = "filtered_distribution", doc ="Apply quality filters")
- private boolean calculateFilteredDistribution = false;
-
- /**
- * The minimum mapping quality a read must have to be counted towards the filtered coverage of a site
- */
- @Argument(required = false, shortName="mmq", fullName = "minimum_mapping_quality", doc ="Minimum read mapping quality of a read to pass filters")
- private byte minMappingQuality = 20;
-
- /**
- * The minimum base quality a base must have to be counted towards the filtered coverage of a site
- */
- @Argument(required = false, shortName="mbq", fullName = "minimum_base_quality", doc ="Minimum base quality to pass filters")
- private byte minBaseQuality = 17;
-
- private GenomeLoc previousLocus = null;
- private long uncoveredBases = 0L;
- private final LinkedList intervalList = new LinkedList();
-
- @Override
- public boolean includeReadsWithDeletionAtLoci() {
- return includeDeletions;
- }
-
- @Override
- public void initialize() {
- if (getToolkit().getIntervals() != null)
- intervalList.addAll(getToolkit().getIntervals()); // if the user provided intervals, keep track of them for uncovered bases calculation
- }
-
- @Override
- public ArrayList map(RefMetaDataTracker tracker, ReferenceContext ref, AlignmentContext context) {
- ArrayList result = new ArrayList(2);
- GenomeLoc currentLocus = ref.getLocus();
- tallyUncoveredBases(currentLocus);
- previousLocus = currentLocus;
- result.add(context.getBasePileup().getReads().size()); // I want the reads instead of the base pileup because I want to count deletions.
- if (calculateFilteredDistribution)
- result.add(context.getBasePileup().getBaseAndMappingFilteredPileup(minBaseQuality, minMappingQuality).getReads().size()); // filtered pileup
- else {
- result.add(result.get(0)); // repeat the same value as the unfiltered pileup if filters are not on
- }
- return result;
- }
-
- @Override
- public Map> reduceInit() {
- return new HashMap>(10000);
- }
-
- @Override
- public Map> reduce(ArrayList value, Map> sum) {
- final int unfilteredCoverage = value.get(0);
- final int filteredCoverage = value.get(1);
- incrementSumArray(sum, unfilteredCoverage, 0);
- incrementSumArray(sum, filteredCoverage, 1);
- return sum;
- }
-
- @Override
- public void onTraversalDone(Map> result) {
- tallyUncoveredBasesTillEndOfTraversal();
- GATKReport report;
-
- if (calculateFilteredDistribution) {
- report = GATKReport.newSimpleReport("BaseCoverageDistribution", "Coverage", "Count", "Filtered");
- } else {
- report = GATKReport.newSimpleReport("BaseCoverageDistribution", "Coverage", "Count");
- report.addRow(0, uncoveredBases); // preemptively add the uncovered bases row (since they'll never exist in the Map)
- }
-
- for (Map.Entry> entry : result.entrySet()) {
- final ArrayList values = entry.getValue();
- final int coverage = entry.getKey();
- if (calculateFilteredDistribution) {
- if (coverage == 0) { // special case for the uncovered bases. The filtered pileups may have an entry, but the unfiltered ones won't.
- report.addRow(coverage, uncoveredBases, uncoveredBases + values.get(1));
- } else {
- report.addRow(coverage, values.get(0), values.get(1));
- }
- } else {
- report.addRow(coverage, values.get(0));
- }
- }
- // In case the filtered distribution never had a pileup filtered down to zero coverage, output the overall uncovered bases for both
- if (calculateFilteredDistribution && !result.containsKey(0)) {
- report.addRow(0, uncoveredBases, uncoveredBases);
- }
- report.print(out);
- }
-
- /**
- * Initializes the ArrayList if needed. Returns the initialized element (or previously initialized)
- * this method is used directly by the incrementSumArray.
- *
- * @param sum the map
- * @param coverage the key to the map to extract the array list
- * @return if the ArrayList exists, return it. Otherwise, initialize it with 0 counters.
- */
- private ArrayList initializeSumArray(final Map> sum, final int coverage) {
- ArrayList curr = sum.get(coverage);
- if (curr == null) {
- curr = new ArrayList(2);
- curr.add(0L); // number of bases with this unfiltered coverage
- curr.add(0L); // number of bases with this filtered coverage
- sum.put(coverage, curr);
- }
- return curr;
- }
-
- /**
- * Increments the counter for the given arrayindex (type of coverage : filtered or unfiltered) initializing if necessary
- *
- * @param sum the hash
- * @param coverage the hash key
- * @param arrayIndex which distribution to increment, 0 for unfiltered, 1 for filtered.
- */
- private void incrementSumArray(final Map> sum, final int coverage, final int arrayIndex) {
- final ArrayList currentTally = initializeSumArray(sum, coverage);
- currentTally.set(arrayIndex, currentTally.get(arrayIndex) + 1);
- }
-
- /**
- * Counts all the uncovered loci after the end of traversal.
- *
- * - Modifies the global variable uncoveredBases
- * - Uses global variables: intervalList and previousLocus
- *
- * takes into account that the traversal may have been due over a set of intervals, or over the whole genome.
- */
- private void tallyUncoveredBasesTillEndOfTraversal() {
- GenomeLocParser parser = getToolkit().getGenomeLocParser();
- GenomeLoc lastLocus;
- if (intervalList.isEmpty()) { // whole genome, add up all contigs past previousLocus
- final int lastContigIndex = getToolkit().getSAMFileHeader().getSequenceDictionary().size() - 1;
- final int lastContigLength = getToolkit().getSAMFileHeader().getSequence(lastContigIndex).getSequenceLength();
- final String lastContigName = getToolkit().getSAMFileHeader().getSequence(lastContigIndex).getSequenceName();
- lastLocus = parser.createGenomeLoc(lastContigName, lastContigIndex, lastContigLength, lastContigLength);
- } else {
- GenomeLoc lastInterval = intervalList.getLast();
- lastLocus = parser.createGenomeLoc(lastInterval.getContig(), lastInterval.getContigIndex(), lastInterval.getStop(), lastInterval.getStop());
- }
- tallyUncoveredBases(lastLocus);
- }
-
- /**
- * Counts all the uncovered loci that have been skipped since the last visited locus. This method allows coverage
- * tools to run with @By(DataSource.READS) instead of @By(DataSource.REFERENCE), while still accurately calculating
- * uncovered bases
- *
- * //todo -- make this a generic capability of Coverage and DiagnoseTargets
- *
- * - Modifies the global variable uncoveredBases
- * - Uses global variables: intervalList and previousLocus
- *
- * takes into account that the traversal may have been due over a set of intervals, or over the whole genome.
- *
- * @param currentLocus the locus we are visiting right now
- */
- private void tallyUncoveredBases(GenomeLoc currentLocus) {
- long distance = 0;
- if (previousLocus == null) { // first base visited
- GenomeLocParser parser = getToolkit().getGenomeLocParser();
- if (intervalList.isEmpty()) { // if this is whole genome (no intervals requested), add what we missed.
- final GenomeLoc zeroLoc = parser.createGenomeLoc(getToolkit().getSAMFileHeader().getSequence(0).getSequenceName(), 0, 1, 1);
- distance += currentLocus.distanceAcrossContigs(zeroLoc, getToolkit().getSAMFileHeader());
- } else { // if we are running on an interval list, add all intervals before the current locus to the uncovered bases counter
- while (!intervalList.peek().containsP(currentLocus)) {
- GenomeLoc interval = intervalList.removeFirst();
- distance += interval.size();
- }
- distance += currentLocus.getStart() - intervalList.peek().getStart(); // now this is the interval that contains the current locus. Discount the bases from the beginning.
- }
- } else {
- final GenomeLoc previousInterval = intervalList.peekFirst(); // peekFirst returns null if interval list is empty (WGS).
- distance = currentLocus.distanceAcrossContigs(previousLocus, getToolkit().getSAMFileHeader()) - 1;
- if (previousInterval != null && !previousInterval.containsP(currentLocus)) {
- intervalList.removeFirst(); // we're done with the previous interval
- final GenomeLoc currentInterval = intervalList.peekFirst();
- distance -= currentInterval.distanceAcrossContigs(previousInterval, getToolkit().getSAMFileHeader()) - 1;
- }
- }
-
- uncoveredBases += distance;
- }
-}
\ No newline at end of file
diff --git a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/diagnostics/FindCoveredIntervals.java b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/diagnostics/FindCoveredIntervals.java
deleted file mode 100644
index cd87a845a..000000000
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/diagnostics/FindCoveredIntervals.java
+++ /dev/null
@@ -1,160 +0,0 @@
-/*
-* By downloading the PROGRAM you agree to the following terms of use:
-*
-* BROAD INSTITUTE
-* SOFTWARE LICENSE AGREEMENT
-* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
-*
-* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (“BROAD”) and the LICENSEE and is effective at the date the downloading is completed (“EFFECTIVE DATE”).
-*
-* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
-* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
-* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
-*
-* 1. DEFINITIONS
-* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
-*
-* 2. LICENSE
-* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
-* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
-* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
-* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
-*
-* 3. PHONE-HOME FEATURE
-* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (“PHONE-HOME”) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEE’S user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
-*
-* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
-* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
-* Copyright 2012-2015 Broad Institute, Inc.
-* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
-* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
-*
-* 5. INDEMNIFICATION
-* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
-*
-* 6. NO REPRESENTATIONS OR WARRANTIES
-* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
-* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
-*
-* 7. ASSIGNMENT
-* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
-*
-* 8. MISCELLANEOUS
-* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
-* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
-* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
-* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
-* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
-* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
-* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
-*/
-
-package org.broadinstitute.gatk.tools.walkers.diagnostics;
-
-import org.broadinstitute.gatk.utils.commandline.Argument;
-import org.broadinstitute.gatk.utils.commandline.Output;
-import org.broadinstitute.gatk.engine.CommandLineGATK;
-import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
-import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
-import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
-import org.broadinstitute.gatk.engine.walkers.ActiveRegionTraversalParameters;
-import org.broadinstitute.gatk.engine.walkers.ActiveRegionWalker;
-import org.broadinstitute.gatk.engine.walkers.PartitionBy;
-import org.broadinstitute.gatk.engine.walkers.PartitionType;
-import org.broadinstitute.gatk.utils.GenomeLoc;
-import org.broadinstitute.gatk.utils.activeregion.ActivityProfileState;
-import org.broadinstitute.gatk.utils.help.DocumentedGATKFeature;
-import org.broadinstitute.gatk.utils.help.HelpConstants;
-
-import java.io.PrintStream;
-
-/**
- * Outputs a list of intervals that are covered above a given threshold
- *
- *
The output list can be used as an interval list for other tools. Note that if the -uncovered argument is given, the
- * logic will be inverted and the tool will instead output intervals that fail the coverage threshold.
- *
- */
-@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_QC, extraDocs = {CommandLineGATK.class} )
-@PartitionBy(value = PartitionType.CONTIG)
-@ActiveRegionTraversalParameters(extension = 0, maxRegion = 50000)
-public class FindCoveredIntervals extends ActiveRegionWalker {
- @Output
- private PrintStream out;
-
- @Argument(fullName = "uncovered", shortName = "u", required = false, doc = "output intervals that fail the coverage threshold instead")
- private boolean outputUncovered = false;
-
- @Argument(fullName = "coverage_threshold", shortName = "cov", doc = "The minimum allowable coverage to be considered covered", required = false)
- private int coverageThreshold = 20;
-
- @Argument(fullName = "minBaseQuality", shortName = "minBQ", doc = "The minimum allowable base quality score to be counted for coverage",required = false)
- private int minBaseQuality = 0;
-
- @Argument(fullName = "minMappingQuality", shortName = "minMQ", doc = "The minimum allowable mapping quality score to be counted for coverage",required = false)
- private int minMappingQuality = 0;
-
-
-
-
- @Override
- // Look to see if the region has sufficient coverage
- public ActivityProfileState isActive(final RefMetaDataTracker tracker, final ReferenceContext ref, final AlignmentContext context) {
-
- int depth;
- if(minBaseQuality == 0 && minMappingQuality == 0)
- depth = context.getBasePileup().getBaseFilteredPileup(coverageThreshold).depthOfCoverage();
- else
- depth = context.getBasePileup().getBaseAndMappingFilteredPileup(minBaseQuality,minMappingQuality).depthOfCoverage();
-
- // note the linear probability scale
- return new ActivityProfileState(ref.getLocus(), Math.min(depth / coverageThreshold, 1));
-
- }
-
- @Override
- public GenomeLoc map(final org.broadinstitute.gatk.utils.activeregion.ActiveRegion activeRegion, final RefMetaDataTracker tracker) {
- if ((!outputUncovered && activeRegion.isActive()) || (outputUncovered && !activeRegion.isActive()))
- return activeRegion.getLocation();
-
- return null;
- }
-
- @Override
- public Long reduceInit() {
- return 0L;
- }
-
- @Override
- public Long reduce(final GenomeLoc value, Long reduce) {
- if (value != null) {
- out.println(value.toString());
- reduce++;
- }
- return reduce;
- }
-
- @Override
- public void onTraversalDone(Long reduce) {
- logger.info(String.format("Found %d intervals", reduce));
- }
-}
diff --git a/protected/gatk-tools-protected/src/test/java/org/broadinstitute/gatk/tools/walkers/diagnostics/BaseCoverageDistributionIntegrationTest.java b/protected/gatk-tools-protected/src/test/java/org/broadinstitute/gatk/tools/walkers/diagnostics/BaseCoverageDistributionIntegrationTest.java
deleted file mode 100644
index 916bb68a9..000000000
--- a/protected/gatk-tools-protected/src/test/java/org/broadinstitute/gatk/tools/walkers/diagnostics/BaseCoverageDistributionIntegrationTest.java
+++ /dev/null
@@ -1,95 +0,0 @@
-/*
-* By downloading the PROGRAM you agree to the following terms of use:
-*
-* BROAD INSTITUTE
-* SOFTWARE LICENSE AGREEMENT
-* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
-*
-* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (“BROAD”) and the LICENSEE and is effective at the date the downloading is completed (“EFFECTIVE DATE”).
-*
-* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
-* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
-* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
-*
-* 1. DEFINITIONS
-* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
-*
-* 2. LICENSE
-* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
-* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
-* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
-* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
-*
-* 3. PHONE-HOME FEATURE
-* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (“PHONE-HOME”) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEE’S user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
-*
-* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
-* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
-* Copyright 2012-2015 Broad Institute, Inc.
-* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
-* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
-*
-* 5. INDEMNIFICATION
-* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
-*
-* 6. NO REPRESENTATIONS OR WARRANTIES
-* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
-* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
-*
-* 7. ASSIGNMENT
-* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
-*
-* 8. MISCELLANEOUS
-* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
-* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
-* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
-* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
-* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
-* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
-* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
-*/
-
-package org.broadinstitute.gatk.tools.walkers.diagnostics;
-
-import org.broadinstitute.gatk.engine.walkers.WalkerTest;
-import org.testng.annotations.DataProvider;
-import org.testng.annotations.Test;
-
-import java.util.Arrays;
-
-/**
- * @author Mauricio Carneiro
- * @since 2/6/13
- */
-public class BaseCoverageDistributionIntegrationTest extends WalkerTest {
- final static String REF = hg18Reference;
- final String bam = validationDataLocation + "small_bam_for_countloci.withRG.bam";
-
- @DataProvider(name = "BasicArguments")
- public Object[][] basicArgumentsDataProvider() {
- return new Object[][] {
- // Tests simple counting on one interval with everything in the same contig including tallying of uncovered bases.
- {"testSingleInterval ", "-L chr1:90000-100000", "45368696dc008d1a07fb2b05fbafd1f4"},
- // Tests specially the tallying of uncovered bases across multiple intervals. Makes sure it's only adding the bases present in the intervals requested.
- {"testMultipleIntervals ", "-L chr1:10-20 -L chr1:40-100 -L chr1:10,000-11,000 -L chr1:40,000-60,000 -L chr1:76,000-99,000 ", "45dafe59e5e54451b88c914d6ecbddc6"},
- // Tests adding the entire genome around every covered base as uncovered. Especially tests the tally in the beginning and end of the run, adding up all chromosomes not visited (this test file only has reads on chr1).
- {"testNoIntervals ", "", "c399f780f0b7da6be2614d837c368d1c"},
-
- // the following three tests are equivalent but now include the filtered distribution option. These tests are aimed at the filtered distribution output.
- {"testFilteredSingleInterval ", "-fd -L chr1:90000-100000", "7017cf191bf54e85111972a882e1d5fa"},
- {"testFilteredMultipleIntervals ", "-fd -L chr1:10-20 -L chr1:40-100 -L chr1:10,000-11,000 -L chr1:40,000-60,000 -L chr1:76,000-99,000 ", "75d11cc02210676d6c19939fb0b9ab2e"},
- {"testFilteredNoIntervals ", "-fd ", "e7abfa6c7be493de4557a64f66688148"},
- };
- }
-
- @Test(dataProvider = "BasicArguments", enabled = true)
- private void BaseCoverageDistributionTest(String testName, String args, String md5) {
- String base = String.format("-T BaseCoverageDistribution -R %s -I %s ", REF, bam) + " -o %s ";
- WalkerTestSpec spec = new WalkerTestSpec(base + args, Arrays.asList(md5));
- executeTest(testName, spec);
- }
-
-
-
-
-}
diff --git a/protected/gatk-tools-protected/src/test/java/org/broadinstitute/gatk/tools/walkers/variantutils/LiftoverVariantsIntegrationTest.java b/protected/gatk-tools-protected/src/test/java/org/broadinstitute/gatk/tools/walkers/variantutils/LiftoverVariantsIntegrationTest.java
deleted file mode 100644
index 637e338e4..000000000
--- a/protected/gatk-tools-protected/src/test/java/org/broadinstitute/gatk/tools/walkers/variantutils/LiftoverVariantsIntegrationTest.java
+++ /dev/null
@@ -1,109 +0,0 @@
-/*
-* By downloading the PROGRAM you agree to the following terms of use:
-*
-* BROAD INSTITUTE
-* SOFTWARE LICENSE AGREEMENT
-* FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY
-*
-* This Agreement is made between the Broad Institute, Inc. with a principal address at 415 Main Street, Cambridge, MA 02142 (“BROAD”) and the LICENSEE and is effective at the date the downloading is completed (“EFFECTIVE DATE”).
-*
-* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and
-* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions.
-* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows:
-*
-* 1. DEFINITIONS
-* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK3 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute.org/gatk on the EFFECTIVE DATE.
-*
-* 2. LICENSE
-* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. LICENSEE hereby automatically grants to BROAD a non-exclusive, royalty-free, irrevocable license to any LICENSEE bug fixes or modifications to the PROGRAM with unlimited rights to sublicense and/or distribute. LICENSEE agrees to provide any such modifications and bug fixes to BROAD promptly upon their creation.
-* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement.
-* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement.
-* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM.
-*
-* 3. PHONE-HOME FEATURE
-* LICENSEE expressly acknowledges that the PROGRAM contains an embedded automatic reporting system (“PHONE-HOME”) which is enabled by default upon download. Unless LICENSEE requests disablement of PHONE-HOME, LICENSEE agrees that BROAD may collect limited information transmitted by PHONE-HOME regarding LICENSEE and its use of the PROGRAM. Such information shall include LICENSEE’S user identification, version number of the PROGRAM and tools being run, mode of analysis employed, and any error reports generated during run-time. Collection of such information is used by BROAD solely to monitor usage rates, fulfill reporting requirements to BROAD funding agencies, drive improvements to the PROGRAM, and facilitate adjustments to PROGRAM-related documentation.
-*
-* 4. OWNERSHIP OF INTELLECTUAL PROPERTY
-* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication.
-* Copyright 2012-2015 Broad Institute, Inc.
-* Notice of attribution: The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.
-* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes.
-*
-* 5. INDEMNIFICATION
-* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement.
-*
-* 6. NO REPRESENTATIONS OR WARRANTIES
-* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME.
-* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING.
-*
-* 7. ASSIGNMENT
-* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void.
-*
-* 8. MISCELLANEOUS
-* 8.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
-* 8.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes.
-* 8.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4.
-* 8.4 Notice. Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt.
-* 8.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter.
-* 8.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement.
-* 8.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles.
-*/
-
-package org.broadinstitute.gatk.tools.walkers.variantutils;
-
-import org.broadinstitute.gatk.engine.walkers.WalkerTest;
-import org.broadinstitute.gatk.utils.exceptions.UserException;
-import org.testng.annotations.Test;
-
-import java.util.Arrays;
-
-/**
- * Tests LiftoverVariants
- */
-public class LiftoverVariantsIntegrationTest extends WalkerTest {
-
- @Test
- public void testb36Tohg19() {
- WalkerTestSpec spec = new WalkerTestSpec(
- "-T LiftoverVariants -o %s -R " + b36KGReference + " --variant " + privateTestDir + "yri.trio.gatk_glftrio.intersection.annotated.filtered.chr1.500.noheader.vcf -chain " + validationDataLocation + "b36ToHg19.broad.over.chain -dict /seq/references/Homo_sapiens_assembly19/v0/Homo_sapiens_assembly19.dict",
- 1,
- Arrays.asList("eb38adfffb7913fa761c5c579b885ea2"));
- executeTest("test b36 to hg19", spec);
- }
-
- @Test
- public void testb36Tohg19UnsortedSamples() {
- WalkerTestSpec spec = new WalkerTestSpec(
- "-T LiftoverVariants -o %s -R " + b36KGReference + " --variant " + privateTestDir + "yri.trio.gatk_glftrio.intersection.annotated.filtered.chr1.500.noheader.unsortedSamples.vcf -chain " + validationDataLocation + "b36ToHg19.broad.over.chain -dict /seq/references/Homo_sapiens_assembly19/v0/Homo_sapiens_assembly19.dict",
- 1,
- Arrays.asList("f40e078fd5fb1d4107738a4f2f602902"));
- executeTest("test b36 to hg19, unsorted samples", spec);
- }
-
- @Test
- public void testhg18Tohg19Unsorted() {
- WalkerTestSpec spec = new WalkerTestSpec(
- "-T LiftoverVariants -o %s -R " + hg18Reference + " --variant:vcf " + privateTestDir + "liftover_test.vcf -chain " + validationDataLocation + "hg18ToHg19.broad.over.chain -dict /seq/references/Homo_sapiens_assembly19/v0/Homo_sapiens_assembly19.dict",
- 1,
- Arrays.asList("186a675c9758c1b83a1232399e18d8fe"));
- executeTest("test hg18 to hg19, unsorted", spec);
- }
-
- @Test
- public void testLiftoverFilteringOfIndels() {
- WalkerTestSpec spec = new WalkerTestSpec(
- "-T FilterLiftedVariants -o %s -R " + b37KGReference + " --variant:vcf " + privateTestDir + "liftover_indel_test.vcf --no_cmdline_in_header",
- 1,
- Arrays.asList("303e3df3bf737feaff397b4eeb037ecb"));
- executeTest("test liftover filtering of indels", spec);
- }
-
- @Test
- public void testLiftoverFailsWithNoOutput() {
- WalkerTestSpec spec = new WalkerTestSpec(
- "-T LiftoverVariants -R " + hg18Reference + " --variant:vcf " + privateTestDir + "liftover_test.vcf -chain " + validationDataLocation + "hg18ToHg19.broad.over.chain -dict /seq/references/Homo_sapiens_assembly19/v0/Homo_sapiens_assembly19.dict",
- 0,
- UserException.class);
- executeTest("test liftover fails with no output", spec);
- }
-}
diff --git a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/ListAnnotations.java b/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/ListAnnotations.java
deleted file mode 100644
index bd7b78899..000000000
--- a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/ListAnnotations.java
+++ /dev/null
@@ -1,85 +0,0 @@
-/*
-* Copyright 2012-2015 Broad Institute, Inc.
-*
-* Permission is hereby granted, free of charge, to any person
-* obtaining a copy of this software and associated documentation
-* files (the "Software"), to deal in the Software without
-* restriction, including without limitation the rights to use,
-* copy, modify, merge, publish, distribute, sublicense, and/or sell
-* copies of the Software, and to permit persons to whom the
-* Software is furnished to do so, subject to the following
-* conditions:
-*
-* The above copyright notice and this permission notice shall be
-* included in all copies or substantial portions of the Software.
-*
-* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
-* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
-* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-*/
-
-package org.broadinstitute.gatk.tools;
-
-import org.broadinstitute.gatk.utils.commandline.CommandLineProgram;
-import org.broadinstitute.gatk.utils.exceptions.UserException;
-import org.broadinstitute.gatk.tools.walkers.annotator.interfaces.AnnotationHelpUtils;
-import org.broadinstitute.gatk.utils.help.DocumentedGATKFeature;
-import org.broadinstitute.gatk.utils.help.HelpConstants;
-
-/**
- * Utility program to print a list of available annotations
- *
- *
This is a very simple utility tool that retrieves available annotations for use with tools such as
- * UnifiedGenotyper, HaplotypeCaller and VariantAnnotator.
- *
- *
Important note
- *
This is a command-line utility that bypasses the GATK engine. As a result, the command-line you must use to
- * invoke it is a little different from other GATK tools (see usage below), and it does not accept any of the
- * classic "CommandLineGATK" arguments.
- *
- * @author vdauwera
- * @since 3/14/13
- */
-@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_HELPUTILS )
-public class ListAnnotations extends CommandLineProgram {
-
- /*
- * Print usage information
- *
- * TODO: would be more convenient if we could just call the program by name instead of the full classpath
- */
- private static void printUsage() {
- System.err.println("Usage: java -cp dist/GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.ListAnnotations");
- System.err.println(" Prints a list of available annotations and exits.");
- }
-
- // TODO: override CommandLineProgram bit that offers version, logging etc arguments. We don't need that stuff here and it makes the doc confusing.
-
- @Override
- protected int execute() throws Exception {
-
- AnnotationHelpUtils.listAnnotations();
- return 0;
- }
-
- public static void main(String[] args){
- try {
- ListAnnotations instance = new ListAnnotations();
- start(instance, args);
- System.exit(CommandLineProgram.result);
- } catch ( UserException e ) {
- printUsage();
- exitSystemWithUserError(e);
- } catch ( Exception e ) {
- exitSystemWithError(e);
- }
- }
-}
diff --git a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/diagnostics/CoveredByNSamplesSites.java b/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/diagnostics/CoveredByNSamplesSites.java
deleted file mode 100644
index 432c7d095..000000000
--- a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/diagnostics/CoveredByNSamplesSites.java
+++ /dev/null
@@ -1,154 +0,0 @@
-/*
-* Copyright 2012-2015 Broad Institute, Inc.
-*
-* Permission is hereby granted, free of charge, to any person
-* obtaining a copy of this software and associated documentation
-* files (the "Software"), to deal in the Software without
-* restriction, including without limitation the rights to use,
-* copy, modify, merge, publish, distribute, sublicense, and/or sell
-* copies of the Software, and to permit persons to whom the
-* Software is furnished to do so, subject to the following
-* conditions:
-*
-* The above copyright notice and this permission notice shall be
-* included in all copies or substantial portions of the Software.
-*
-* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
-* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
-* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-*/
-
-package org.broadinstitute.gatk.tools.walkers.diagnostics;
-
-
-import org.broadinstitute.gatk.engine.walkers.By;
-import org.broadinstitute.gatk.engine.walkers.DataSource;
-import org.broadinstitute.gatk.engine.walkers.RodWalker;
-import org.broadinstitute.gatk.engine.walkers.TreeReducible;
-import org.broadinstitute.gatk.utils.commandline.Argument;
-import org.broadinstitute.gatk.utils.commandline.ArgumentCollection;
-import org.broadinstitute.gatk.utils.commandline.Output;
-import org.broadinstitute.gatk.engine.CommandLineGATK;
-import org.broadinstitute.gatk.engine.arguments.StandardVariantContextInputArgumentCollection;
-import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
-import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
-import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
-import org.broadinstitute.gatk.utils.GenomeLoc;
-import org.broadinstitute.gatk.utils.help.DocumentedGATKFeature;
-import org.broadinstitute.gatk.utils.help.HelpConstants;
-import htsjdk.variant.variantcontext.Genotype;
-import htsjdk.variant.variantcontext.GenotypesContext;
-import htsjdk.variant.variantcontext.VariantContext;
-
-
-import java.io.*;
-import java.util.Collection;
-
-/**
- * Report well-covered intervals
- *
- *
- * This tool evaluates whether sites are well-covered or not according to specific coverage quality parameters, and
- * outputs a list of intervals that are considered well-covered, i.e. where most samples have good coverage. This is
- * useful for masking out poorly-covered sites where we cannot expect meaningful results in downstream analyses. See
- * argument defaults for what constitutes "most" samples and "good" coverage.
- *
- *
- *
Input
- *
- * A variant file and optionally, minimum coverage and sample percentage values.
- *
- *
- */
-@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_QC, extraDocs = {CommandLineGATK.class} )
-@By(DataSource.REFERENCE_ORDERED_DATA)
-public class CoveredByNSamplesSites extends RodWalker implements TreeReducible {
-
- @Output(fullName = "OutputIntervals", shortName = "out", doc = "Name of file for output intervals")
- PrintStream outputStream;
-
- @ArgumentCollection
- protected StandardVariantContextInputArgumentCollection variantCollection = new StandardVariantContextInputArgumentCollection();
-
- @Argument(fullName = "minCoverage", shortName = "minCov",doc = "only samples that have coverage bigger than minCoverage will be counted",required = false)
- int minCoverage = 10;
-
- @Argument(fullName = "percentageOfSamples", shortName = "percentage", doc = "only sites where at least percentageOfSamples of the samples have good coverage, will be emitted", required = false)
- double percentageOfSamples = 0.9;
-
- @Override
- public GenomeLoc map(RefMetaDataTracker tracker, ReferenceContext ref, AlignmentContext context) {
- if ( tracker == null )
- return null;
-
- Collection VCs = tracker.getValues(variantCollection.variants, context.getLocation());
- if ( VCs.size() == 0 )
- return null;
-
- boolean emitSite = false;
- for(VariantContext vc : VCs){
- int coveredSamples = 0;
- final GenotypesContext genotypes = vc.getGenotypes();
- final int numOfGenotypes = genotypes.size();
- for(Genotype g : genotypes){
- if(g.getDP() >= minCoverage)
- coveredSamples++;
- }
- if((double)coveredSamples/numOfGenotypes > percentageOfSamples){
- emitSite = true;
- }
- }
- if (emitSite)
- return ref.getLocus();
- else
- return null;
- }
-
- @Override
- public Integer reduceInit() { return 0; }
-
- @Override
- public Integer reduce(GenomeLoc value, Integer sum) {
- if ( value != null ) {
- outputStream.println(value);
- sum++;
- }
- return sum;
- }
-
- @Override
- public Integer treeReduce(Integer lhs, Integer rhs) {
- return lhs + rhs;
- }
-
- /**
- *
- * @param result the number of sites that passed the filter.
- */
- public void onTraversalDone(Integer result) {
- logger.info(result+" sites that have "+(percentageOfSamples*100)+"% of the samples with at least "+minCoverage+" coverage.\n");
- }
-
-
-
-}
diff --git a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/readutils/ReadAdaptorTrimmer.java b/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/readutils/ReadAdaptorTrimmer.java
deleted file mode 100644
index 0944cf675..000000000
--- a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/readutils/ReadAdaptorTrimmer.java
+++ /dev/null
@@ -1,397 +0,0 @@
-/*
-* Copyright 2012-2015 Broad Institute, Inc.
-*
-* Permission is hereby granted, free of charge, to any person
-* obtaining a copy of this software and associated documentation
-* files (the "Software"), to deal in the Software without
-* restriction, including without limitation the rights to use,
-* copy, modify, merge, publish, distribute, sublicense, and/or sell
-* copies of the Software, and to permit persons to whom the
-* Software is furnished to do so, subject to the following
-* conditions:
-*
-* The above copyright notice and this permission notice shall be
-* included in all copies or substantial portions of the Software.
-*
-* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
-* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
-* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-*/
-
-package org.broadinstitute.gatk.tools.walkers.readutils;
-
-import com.google.java.contract.Ensures;
-import com.google.java.contract.Requires;
-import htsjdk.samtools.SAMFileWriter;
-import org.apache.log4j.Logger;
-import org.broadinstitute.gatk.engine.walkers.NanoSchedulable;
-import org.broadinstitute.gatk.engine.walkers.PartitionBy;
-import org.broadinstitute.gatk.engine.walkers.PartitionType;
-import org.broadinstitute.gatk.engine.walkers.ReadWalker;
-import org.broadinstitute.gatk.utils.commandline.Advanced;
-import org.broadinstitute.gatk.utils.commandline.Argument;
-import org.broadinstitute.gatk.utils.commandline.Hidden;
-import org.broadinstitute.gatk.utils.commandline.Output;
-import org.broadinstitute.gatk.engine.CommandLineGATK;
-import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
-import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
-import org.broadinstitute.gatk.utils.BaseUtils;
-import org.broadinstitute.gatk.utils.collections.Pair;
-import org.broadinstitute.gatk.utils.help.DocumentedGATKFeature;
-import org.broadinstitute.gatk.utils.help.HelpConstants;
-import org.broadinstitute.gatk.utils.sam.GATKSAMRecord;
-
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-/**
- * Utility tool to blindly strip base adaptors
- *
- *
This tool is mainly intended to be applied to FASTQ/unaligned BAM pre-processing where libraries
- * have very short inserts, and hence a substantial part of the sequencing data will have adaptor sequence present. By
- * design, tool will only work for Illumina-like library constructs, where the typical library architecture is:
- * [Adaptor 1]-[Genomic Insert]-[Adaptor 2 (index/barcode)]
- *
- *
We assume that when data is paired, one read will span the forward strand and one read will span the reverse strand.
- * Hence, adaptors should be specified as both forward and reverse-complement to ensure they are removed in all cases.
- * By design, as well, "circular" constructions where a read can have an insert, then adaptor, then more genomic insert, are not supported.
- * When an adaptor is detected, all bases downstream from it (i.e. in the 3' direction) will be removed.
- * Adaptor detection is carried out by looking for overlaps between forward and reverse reads in a pair.
- * If a sufficiently high overlap is found, the insert size is computed and if insert size < read lengths adaptor bases are removed from reads.
- *
- *
- *
Advantage over ReadClipper: No previous knowledge of adaptors or library structure is necessary.
- *
- *
Advantages over 3rd party tools like SeqPrep:
- *
- *
Can do BAM streaming instead of having to convert to fastq
- *
No need to merge reads; merging reads can have some advantages, but complicates downstream processing and loses information that can be used,
- * e.g. in variant calling
- *
- *
- *
Input
- *
- * The input read data in BAM format. Read data MUST be in query name ordering as produced, for example with Picard's FastqToBam
- *
- */
-@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_DATA, extraDocs = {CommandLineGATK.class} )
-@PartitionBy(PartitionType.READ)
-public class ReadAdaptorTrimmer extends ReadWalker, SAMFileWriter> implements NanoSchedulable {
- @Output(doc="Write output to this BAM filename instead of STDOUT", required = false)
- SAMFileWriter out;
-
- /**
- * Only prints the first n reads of the file - for short testing
- */
- @Hidden
- @Argument(fullName = "number", shortName = "n", doc="Print the first n reads from the file, discarding the rest", required = false)
- int nReadsToPrint = -1;
-
- /**
- * Argument to control strictness of match between forward and reverse reads - by default, we require 15 matches between them to declare
- * an overlap.
- */
- @Advanced
- @Argument(fullName = "minMatches", shortName = "minMatches", doc="Minimum number of substring matches to detect pair overlaps", required = false)
- int minMatchesForOverlap = 15;
-
-
- /**
- * If true, this argument will make the walker discard unpaired reads instead of erroring out.
- */
- @Advanced
- @Argument(fullName = "removeUnpairedReads", shortName = "removeUnpairedReads", doc="Remove unpaired reads instead of erroring out", required = false)
- boolean cleanUnpairedReads = false;
-
- /**
- * private class members
- */
- private GATKSAMRecord firstReadInPair;
- private TrimStats trimStats = new TrimStats();
-
- static class TrimStats {
- long numReadsProcessed;
- long numReadsWithAdaptorTrimmed;
- long numUnpairedReadsFound;
- }
-
- /**
- * The reads filter function.
- *
- * @param ref the reference bases that correspond to our read, if a reference was provided
- * @param read the read itself, as a GATKSAMRecord
- * @return true if the read passes the filter, false if it doesn't
- */
- public boolean filter(ReferenceContext ref, GATKSAMRecord read) {
- // check if we've reached the output limit
- if ( nReadsToPrint == 0 ) {
- return false; // n == 0 means we've printed all we needed.
- }
- else if (nReadsToPrint > 0) {
- nReadsToPrint--; // n > 0 means there are still reads to be printed.
- }
- return true;
- }
- /**
- * reduceInit is called once before any calls to the map function. We use it here to setup the output
- * bam file, if it was specified on the command line
- *
- * @return SAMFileWriter, set to the BAM output file if the command line option was set, null otherwise
- */
- public SAMFileWriter reduceInit() {
- return out;
- }
-
- public List map( final ReferenceContext ref, final GATKSAMRecord readIn, final RefMetaDataTracker metaDataTracker ) {
-
-
- final List readsToEmit = new ArrayList();
-
-
- // cache first read in pair if flag set.
- if (readIn.getFirstOfPairFlag()) {
- firstReadInPair = GATKSAMRecord.emptyRead(readIn);
- firstReadInPair.setReadString(readIn.getReadString());
- firstReadInPair.setReadName(readIn.getReadName());
- firstReadInPair.setBaseQualities(readIn.getBaseQualities());
- }
- else {
- if (!readIn.getReadName().equals(firstReadInPair.getReadName())) {
- if (cleanUnpairedReads) {
- trimStats.numUnpairedReadsFound++;
- return readsToEmit;
- }
- else // by default require that reads be completely paired
- throw new IllegalStateException("Second read in pair must follow first read in pair: data not ordered?");
- }
-
- final int oldLength1 = firstReadInPair.getReadLength();
- final int oldLength2 = readIn.getReadLength();
- // try to strip any adaptor sequence in read pair
- final Integer result = trimReads(firstReadInPair, readIn, minMatchesForOverlap, logger);
-
- if (logger.isDebugEnabled()) {
- if (result == null)
- logger.debug("No overlap found, insert size cannot be computed");
- else
- logger.debug("Insert size estimate = " + result);
-
- }
-
-
- readsToEmit.add(firstReadInPair);
- readsToEmit.add(readIn);
-
- if (oldLength1 != firstReadInPair.getReadLength())
- trimStats.numReadsWithAdaptorTrimmed++;
- if (oldLength2 != readIn.getReadLength())
- trimStats.numReadsWithAdaptorTrimmed++;
-
- }
-
-
- trimStats.numReadsProcessed++;
- return readsToEmit;
-
- }
-
- /**
- * given a read and a output location, reduce by emitting the read
- *
- * @param readsToEmit the read itself
- * @param output the output source
- * @return the SAMFileWriter, so that the next reduce can emit to the same source
- */
- public SAMFileWriter reduce( final List readsToEmit, final SAMFileWriter output ) {
- for (final GATKSAMRecord read : readsToEmit)
- output.addAlignment(read);
-
- return output;
- }
-
- @Override
- public void onTraversalDone(SAMFileWriter output) {
-
- logger.info("Finished Trimming:");
- logger.info("Number of processed reads: "+ trimStats.numReadsProcessed);
- logger.info("Number of reads with adaptor sequence trimmed: "+ trimStats.numReadsWithAdaptorTrimmed);
- if (cleanUnpairedReads)
- logger.info("Number of unpaired reads thrown out: "+ trimStats.numUnpairedReadsFound);
- }
-
-
- /**
- *
- * Workhorse routines...
- *
- */
- /**
- * Core routine that does most underlying work for walker. Takes two reads and looks for overlaps in them.
- * An overlap is defined as a contiguous chunk of N bases that matches reverse-complement between reads.
- * Currently, the only insert structure that it will look for overlaps is as follows:
- * CASE 1: Insert shorter than read length:
- * 3' XXXXXXXXXXXXXXXX 5' (second read)
- * 5' YYYYYYYYYYYYYYYY 3' (first read)
- * ***********
- *
- * In this case, if X and Y are complements at the 11 positions marked by *, routine will do the following
- * iff minMatchesForOverlap <= 11:
- * a) Cleave adaptor from end of second read (leftmost dangling part in diagram above)
- * b) Cleave adaptor from end of first read (rightmost part in diagram).
- *
- * CASE 2: Insert size >= read length:
- * 3' XXXXXXXXXXXXXXXX 5' (second read)
- * 5' YYYYYYYYYYYYYYYY 3' (first read)
- * ********* (overlap)
- *
- * In this case, no trimming is done and reads are left unchanged
- * @param first (I/O) First read in pair - read contents (bases/quals) can be modified if adaptor is detected
- * @param second (I/O) Second read in pair - read contents (bases/quals) can be modified if adaptor is detected
- * @param minMatchesForOverlap Reads need to match in these # of bases to be joined
- * @return Offset between second and first read.
- * If there's no detectable offset, return Null
- */
- @Requires({"first != null","second != null","minMatchesForOverlap>0"})
- protected static Integer trimReads(final GATKSAMRecord first,
- final GATKSAMRecord second,
- final int minMatchesForOverlap,
- final Logger logger) {
-
- final Integer insertSize = estimateInsertSize(first.getReadBases(), second.getReadBases(),
- minMatchesForOverlap, logger);
-
- if (insertSize == null)
- return insertSize;
- if (insertSize < first.getReadLength()) {
- // trim adaptor sequence from read
- first.setReadBases(Arrays.copyOfRange(first.getReadBases(),0,insertSize));
- first.setBaseQualities(Arrays.copyOfRange(first.getBaseQualities(),0,insertSize));
- }
- if (insertSize < second.getReadLength()) {
- // trim adaptor sequence from read
- second.setReadBases(Arrays.copyOfRange(second.getReadBases(),0,insertSize));
- second.setBaseQualities(Arrays.copyOfRange(second.getBaseQualities(),0,insertSize));
- }
- return insertSize;
- }
-
- /**
- * Brain-dead implementation of an aligner of two sequences, where it's assumed that there might be an overlap
- * from the first into the second. From this, an estimate of insert size is performed and returned
- * Assumes that reads come in reverse direction, so one of the base sequences needs to be reverse-complemented.]
- *
- * @param firstRead Bytes from first read
- * @param secondRead Bytes from second read (reverse direction)
- * @return Estimated insert size based on offset between first and second read.
- * If no overlap can be detected, return null
- */
-
- @Requires({"firstRead != null","secondRead != null","minMatches>0","firstRead.length == secondRead.length"})
- protected static Integer estimateInsertSize(final byte[] firstRead,
- final byte[] secondRead,
- final int minMatches,
- final Logger logger) {
- final byte[] firstBases = firstRead;
- final byte[] secondBases = BaseUtils.simpleReverseComplement(secondRead);
-
- final Pair overlaps = findOverlappingSequence(firstBases, secondBases);
- final int bestOffset = overlaps.first;
- final int maxScore = overlaps.second;
- if ( logger.isDebugEnabled()) {
- String sb="", s1 = new String(firstBases), s2 = new String(secondBases);
- for (int k=0; k < Math.abs(bestOffset); k++) sb+=" ";
- if (maxScore >= minMatches) {
- logger.debug(String.format("Match, Max Score = %d, best offset = %d\n",maxScore, bestOffset));
- if (bestOffset>0)
- s2 = sb+s2;
- else
- s1 = sb+s1;
- }
- else logger.debug("NoMatch:");
- logger.debug("R1:"+s1);
- logger.debug("R2:"+s2);
-
-
- }
-
- if (maxScore < minMatches)
- return null; // no overlap detected
-
- return bestOffset+secondRead.length;
-
-
- }
-
-
- /**
- * Tries to find overlapping sequence between two reads, and computes offset between them
- * For each possible offset, computes matching score, which is = MATCH_SCORE*Num_matches + MISMATCH_SCORE*num_mismatches
- * (like SW with infinite gap penalties).
- * @param first First read bytes
- * @param second Second read bytes
- * @return Pair of integers (x,y). x = best offset between reads, y = corresponding score
- */
- @Requires({"first != null","second != null"})
- @Ensures("result != null")
- protected static Pair findOverlappingSequence(final byte[] first,
- final byte[] second) {
- final int MATCH_SCORE = 1;
- final int MISMATCH_SCORE = -1;
- // try every possible offset - O(N^2) algorithm
-
- // In case of following structure,
- // 111111111
- // 222222222
- // computed offset will be negative (=-5 in this case).
- // If however,
- // 111111111
- // 222222222
- // then offset will be positive (=3 in this case)
- int maxScore = 0, bestOffset =0;
- for (int offset = -second.length; offset < first.length; offset++) {
- int score = 0;
- // compute start index for each array
- int ind1 = (offset<0)?0:offset;
- int ind2 = (offset<0)?-offset:0;
- for (int k=0; k < Math.min(first.length, second.length) ; k++) {
- if (ind1 >= first.length)
- break;
- if (ind2 >= second.length )
- break;
- if (first[ind1] != 'N' && second[ind2] != 'N') {
- if (first[ind1] == second[ind2])
- score += MATCH_SCORE;
- else
- score += MISMATCH_SCORE;
- }
- ind1++;
- ind2++;
- }
- if (score > maxScore) {
- maxScore = score;
- bestOffset = offset;
- }
- }
- return new Pair(bestOffset,maxScore);
- }
-
-}
diff --git a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/variantutils/FilterLiftedVariants.java b/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/variantutils/FilterLiftedVariants.java
deleted file mode 100644
index 9c5e3b4cd..000000000
--- a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/variantutils/FilterLiftedVariants.java
+++ /dev/null
@@ -1,159 +0,0 @@
-/*
-* Copyright 2012-2015 Broad Institute, Inc.
-*
-* Permission is hereby granted, free of charge, to any person
-* obtaining a copy of this software and associated documentation
-* files (the "Software"), to deal in the Software without
-* restriction, including without limitation the rights to use,
-* copy, modify, merge, publish, distribute, sublicense, and/or sell
-* copies of the Software, and to permit persons to whom the
-* Software is furnished to do so, subject to the following
-* conditions:
-*
-* The above copyright notice and this permission notice shall be
-* included in all copies or substantial portions of the Software.
-*
-* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
-* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
-* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-*/
-
-package org.broadinstitute.gatk.tools.walkers.variantutils;
-
-import org.broadinstitute.gatk.engine.walkers.Reference;
-import org.broadinstitute.gatk.engine.walkers.RodWalker;
-import org.broadinstitute.gatk.engine.walkers.Window;
-import org.broadinstitute.gatk.utils.commandline.ArgumentCollection;
-import org.broadinstitute.gatk.utils.commandline.Output;
-import org.broadinstitute.gatk.engine.CommandLineGATK;
-import org.broadinstitute.gatk.engine.arguments.StandardVariantContextInputArgumentCollection;
-import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
-import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
-import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
-import org.broadinstitute.gatk.engine.SampleUtils;
-import org.broadinstitute.gatk.utils.help.HelpConstants;
-import htsjdk.variant.vcf.VCFHeader;
-import htsjdk.variant.vcf.VCFHeaderLine;
-import org.broadinstitute.gatk.engine.GATKVCFUtils;
-import org.broadinstitute.gatk.utils.help.DocumentedGATKFeature;
-import htsjdk.variant.variantcontext.writer.VariantContextWriter;
-import htsjdk.variant.variantcontext.VariantContext;
-
-import java.util.*;
-
-/**
- * Filters a lifted-over VCF file for reference bases that have been changed
- *
- *
"Lifting over" variants means adjusting variant calls from one reference to another. Specifically, the process
- * adjusts the position of the call to match the corresponding position on the target reference. For example, if you
- * have variants called from reads aligned to the hg19 reference, and you want to compare them to calls made based on
- * the b37 reference, you need to liftover one of the callsets to the other reference.
- *
- *
This tool is intended to be the second of two processing steps for the liftover process. The first step is to
- * run LiftoverVariants on your VCF file. The second step is to run FilterLiftedVariants on the output of
- * LiftoverVariants. This will produce valid well-behaved VCF files, where you'll see that the contig names in the
- * header have all been correctly replaced.
- *
- */
-@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_VARMANIP, extraDocs = {CommandLineGATK.class} )
-@Reference(window=@Window(start=0,stop=100))
-public class FilterLiftedVariants extends RodWalker {
-
- @ArgumentCollection
- protected StandardVariantContextInputArgumentCollection variantCollection = new StandardVariantContextInputArgumentCollection();
-
- private static final int MAX_VARIANT_SIZE = 100;
-
- @Output(doc="File to which variants should be written")
- protected VariantContextWriter writer = null;
-
- private long failedLocs = 0, totalLocs = 0;
-
- public void initialize() {
- String trackName = variantCollection.variants.getName();
- Set samples = SampleUtils.getSampleListWithVCFHeader(getToolkit(), Arrays.asList(trackName));
- Map vcfHeaders = GATKVCFUtils.getVCFHeadersFromRods(getToolkit(), Arrays.asList(trackName));
-
- final VCFHeader vcfHeader = new VCFHeader(vcfHeaders.containsKey(trackName) ? vcfHeaders.get(trackName).getMetaDataInSortedOrder() : Collections.emptySet(), samples);
- writer.writeHeader(vcfHeader);
- }
-
- /**
- * Determines whether records should be filtered; if not, writes them to the output
- *
- * @param ref the reference context
- * @param vc the VariantContext to process
- * @return true if the record is not filtered, false otherwise
- */
- protected boolean filterOrWrite(final byte[] ref, final VariantContext vc) {
- if ( ref == null ) throw new IllegalArgumentException("Cannot filter based on a null reference array");
- if ( vc == null ) throw new IllegalArgumentException("Cannot filter a null Variant Context");
-
- totalLocs++;
-
- boolean filter = false;
- final byte[] recordRef = vc.getReference().getBases();
-
- // this can happen for records that get placed at the ends of chromosomes
- if ( recordRef.length > ref.length ) {
- filter = true;
- } else {
- for (int i = 0; i < recordRef.length && i < MAX_VARIANT_SIZE; i++) {
- if ( recordRef[i] != ref[i] ) {
- filter = true;
- break;
- }
- }
- }
-
- if ( filter )
- failedLocs++;
- else
- writer.add(vc);
-
- return !filter;
- }
-
- public Integer map(RefMetaDataTracker tracker, ReferenceContext ref, AlignmentContext context) {
- if ( tracker == null )
- return 0;
-
- final Collection VCs = tracker.getValues(variantCollection.variants, context.getLocation());
- for ( final VariantContext vc : VCs )
- filterOrWrite(ref.getBases(), vc);
-
- return 0;
- }
-
- public Integer reduceInit() { return 0; }
-
- public Integer reduce(Integer value, Integer sum) { return 0; }
-
- public void onTraversalDone(Integer result) {
- System.out.println("Filtered " + failedLocs + " records out of " + totalLocs + " total records.");
- }
-}
diff --git a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/variantutils/LiftoverVariants.java b/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/variantutils/LiftoverVariants.java
deleted file mode 100644
index 4bed885f1..000000000
--- a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/variantutils/LiftoverVariants.java
+++ /dev/null
@@ -1,209 +0,0 @@
-/*
-* Copyright 2012-2015 Broad Institute, Inc.
-*
-* Permission is hereby granted, free of charge, to any person
-* obtaining a copy of this software and associated documentation
-* files (the "Software"), to deal in the Software without
-* restriction, including without limitation the rights to use,
-* copy, modify, merge, publish, distribute, sublicense, and/or sell
-* copies of the Software, and to permit persons to whom the
-* Software is furnished to do so, subject to the following
-* conditions:
-*
-* The above copyright notice and this permission notice shall be
-* included in all copies or substantial portions of the Software.
-*
-* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
-* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
-* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-*/
-
-package org.broadinstitute.gatk.tools.walkers.variantutils;
-
-import htsjdk.samtools.liftover.LiftOver;
-import htsjdk.samtools.util.Interval;
-import htsjdk.samtools.SAMFileHeader;
-import htsjdk.samtools.SAMFileReader;
-import htsjdk.variant.vcf.VCFHeader;
-import htsjdk.variant.vcf.VCFHeaderLine;
-import org.broadinstitute.gatk.utils.commandline.Argument;
-import org.broadinstitute.gatk.utils.commandline.ArgumentCollection;
-import org.broadinstitute.gatk.utils.commandline.Output;
-import org.broadinstitute.gatk.engine.CommandLineGATK;
-import org.broadinstitute.gatk.engine.arguments.StandardVariantContextInputArgumentCollection;
-import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
-import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
-import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
-import org.broadinstitute.gatk.engine.walkers.RodWalker;
-import org.broadinstitute.gatk.engine.SampleUtils;
-import org.broadinstitute.gatk.utils.help.HelpConstants;
-import org.broadinstitute.gatk.engine.GATKVCFUtils;
-import org.broadinstitute.gatk.utils.variant.GATKVCFConstants;
-import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
-import org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils;
-import htsjdk.variant.variantcontext.writer.Options;
-import org.broadinstitute.gatk.utils.exceptions.UserException;
-import org.broadinstitute.gatk.utils.help.DocumentedGATKFeature;
-import htsjdk.variant.variantcontext.VariantContext;
-import htsjdk.variant.variantcontext.VariantContextBuilder;
-import htsjdk.variant.variantcontext.writer.VariantContextWriter;
-import htsjdk.variant.variantcontext.writer.VariantContextWriterFactory;
-
-import java.io.File;
-import java.util.*;
-
-/**
- * Lifts a VCF file over from one build to another
- *
- *
"Lifting over" variants means adjusting variant calls from one reference to another. Specifically, the process
- * adjusts the position of the call to match the corresponding position on the target reference. For example, if you
- * have variants called from reads aligned to the hg19 reference, and you want to compare them to calls made based on
- * the b37 reference, you need to liftover one of the callsets to the other reference.
- *
- *
LiftoverVariants is intended to be the first of two processing steps for the liftover process.
- * The second step is to run FilterLiftedVariants on the output of LiftoverVariants. This will produce valid
- * well-behaved VCF files, where you'll see that the contig names in the header have all been correctly replaced.
- *
- *
Caveat
- *
To be clear, the VCF resulting from the LiftoverVariants run is not guaranteed to be valid according to the official specification. The file could
- * possibly be mis-sorted and the header may not be complete. That is why you need to run FilterLiftedVariants on it.
- *
- *
Input
- *
- * A variant call set to lift over, the sequence dictionary of the new reference build and the appropriate liftover
- * chain file.
- *
- *
- */
-@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_VARMANIP, extraDocs = {CommandLineGATK.class} )
-public class LiftoverVariants extends RodWalker {
-
- @ArgumentCollection
- protected StandardVariantContextInputArgumentCollection variantCollection = new StandardVariantContextInputArgumentCollection();
-
- @Output(doc="File to which variants should be written", required=true, defaultToStdout=false)
- protected File file = null;
- protected VariantContextWriter writer = null;
-
- @Argument(fullName="chain", shortName="chain", doc="Chain file", required=true)
- protected File CHAIN = null;
-
- @Argument(fullName="newSequenceDictionary", shortName="dict", doc="Sequence .dict file for the new build", required=true)
- protected File NEW_SEQ_DICT = null;
-
- @Argument(fullName="recordOriginalLocation", shortName="recordOriginalLocation", doc="Should we record what the original location was in the INFO field?", required=false)
- protected Boolean RECORD_ORIGINAL_LOCATION = false;
-
- private LiftOver liftOver;
-
- private long successfulIntervals = 0, failedIntervals = 0;
-
- public void initialize() {
- try {
- liftOver = new LiftOver(CHAIN);
- } catch (RuntimeException e) {
- throw new UserException.BadInput("there is a problem with the chain file you are using: " + e.getMessage());
- }
-
- liftOver.setLiftOverMinMatch(LiftOver.DEFAULT_LIFTOVER_MINMATCH);
-
- try {
- final SAMFileHeader toHeader = new SAMFileReader(NEW_SEQ_DICT).getFileHeader();
- liftOver.validateToSequences(toHeader.getSequenceDictionary());
- } catch (RuntimeException e) {
- throw new UserException.BadInput("the chain file you are using is not compatible with the reference you are trying to lift over to; please use the appropriate chain file for the given reference");
- }
-
- String trackName = variantCollection.variants.getName();
- Set samples = SampleUtils.getSampleListWithVCFHeader(getToolkit(), Arrays.asList(trackName));
- Map vcfHeaders = GATKVCFUtils.getVCFHeadersFromRods(getToolkit(), Arrays.asList(trackName));
-
- Set metaData = new HashSet<>();
- if ( vcfHeaders.containsKey(trackName) )
- metaData.addAll(vcfHeaders.get(trackName).getMetaDataInSortedOrder());
- if ( RECORD_ORIGINAL_LOCATION ) {
- metaData.add(GATKVCFHeaderLines.getInfoLine(GATKVCFConstants.ORIGINAL_CONTIG_KEY));
- metaData.add(GATKVCFHeaderLines.getInfoLine(GATKVCFConstants.ORIGINAL_START_KEY));
- }
-
-
- final VCFHeader vcfHeader = new VCFHeader(metaData, samples);
- writer = VariantContextWriterFactory.create(file, getMasterSequenceDictionary(), EnumSet.of(Options.ALLOW_MISSING_FIELDS_IN_HEADER));
- writer.writeHeader(vcfHeader);
- }
-
- private void convertAndWrite(VariantContext vc, ReferenceContext ref) {
-
- final Interval fromInterval = new Interval(vc.getChr(), vc.getStart(), vc.getStart(), false, String.format("%s:%d", vc.getChr(), vc.getStart()));
- final int length = vc.getEnd() - vc.getStart();
- final Interval toInterval = liftOver.liftOver(fromInterval);
- VariantContext originalVC = vc;
-
- if ( toInterval != null ) {
- // check whether the strand flips, and if so reverse complement everything
- if ( fromInterval.isPositiveStrand() != toInterval.isPositiveStrand() && vc.isPointEvent() ) {
- vc = GATKVariantContextUtils.reverseComplement(vc);
- }
-
- vc = new VariantContextBuilder(vc).loc(toInterval.getSequence(), toInterval.getStart(), toInterval.getStart() + length).make();
-
- if ( RECORD_ORIGINAL_LOCATION ) {
- vc = new VariantContextBuilder(vc)
- .attribute(GATKVCFConstants.ORIGINAL_CONTIG_KEY, fromInterval.getSequence())
- .attribute(GATKVCFConstants.ORIGINAL_START_KEY, fromInterval.getStart()).make();
- }
-
- if ( originalVC.isSNP() && originalVC.isBiallelic() && GATKVariantContextUtils.getSNPSubstitutionType(originalVC) != GATKVariantContextUtils.getSNPSubstitutionType(vc) ) {
- logger.warn(String.format("VCF at %s / %d => %s / %d is switching substitution type %s/%s to %s/%s",
- originalVC.getChr(), originalVC.getStart(), vc.getChr(), vc.getStart(),
- originalVC.getReference(), originalVC.getAlternateAllele(0), vc.getReference(), vc.getAlternateAllele(0)));
- }
-
- writer.add(vc);
- successfulIntervals++;
- } else {
- failedIntervals++;
- }
- }
-
- public Integer map(RefMetaDataTracker tracker, ReferenceContext ref, AlignmentContext context) {
- if ( tracker == null )
- return 0;
-
- Collection VCs = tracker.getValues(variantCollection.variants, context.getLocation());
- for ( VariantContext vc : VCs )
- convertAndWrite(vc, ref);
-
- return 0;
- }
-
- public Integer reduceInit() { return 0; }
-
- public Integer reduce(Integer value, Integer sum) { return 0; }
-
- public void onTraversalDone(Integer result) {
- System.out.println("Converted " + successfulIntervals + " records; failed to convert " + failedIntervals + " records.");
- writer.close();
- }
-}
diff --git a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/variantutils/VariantValidationAssessor.java b/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/variantutils/VariantValidationAssessor.java
deleted file mode 100644
index fd3aef8e9..000000000
--- a/public/gatk-tools-public/src/main/java/org/broadinstitute/gatk/tools/walkers/variantutils/VariantValidationAssessor.java
+++ /dev/null
@@ -1,304 +0,0 @@
-/*
-* Copyright 2012-2015 Broad Institute, Inc.
-*
-* Permission is hereby granted, free of charge, to any person
-* obtaining a copy of this software and associated documentation
-* files (the "Software"), to deal in the Software without
-* restriction, including without limitation the rights to use,
-* copy, modify, merge, publish, distribute, sublicense, and/or sell
-* copies of the Software, and to permit persons to whom the
-* Software is furnished to do so, subject to the following
-* conditions:
-*
-* The above copyright notice and this permission notice shall be
-* included in all copies or substantial portions of the Software.
-*
-* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
-* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
-* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-*/
-
-package org.broadinstitute.gatk.tools.walkers.variantutils;
-
-import org.broadinstitute.gatk.engine.walkers.Reference;
-import org.broadinstitute.gatk.engine.walkers.RodWalker;
-import org.broadinstitute.gatk.engine.walkers.Window;
-import org.broadinstitute.gatk.utils.commandline.*;
-import org.broadinstitute.gatk.engine.CommandLineGATK;
-import org.broadinstitute.gatk.engine.arguments.StandardVariantContextInputArgumentCollection;
-import org.broadinstitute.gatk.utils.contexts.AlignmentContext;
-import org.broadinstitute.gatk.utils.contexts.ReferenceContext;
-import org.broadinstitute.gatk.utils.refdata.RefMetaDataTracker;
-import org.broadinstitute.gatk.utils.QualityUtils;
-import org.broadinstitute.gatk.engine.SampleUtils;
-import org.broadinstitute.gatk.utils.help.HelpConstants;
-import org.broadinstitute.gatk.engine.GATKVCFUtils;
-import org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils;
-import htsjdk.variant.vcf.*;
-import org.broadinstitute.gatk.utils.help.DocumentedGATKFeature;
-import htsjdk.variant.variantcontext.writer.VariantContextWriter;
-import htsjdk.variant.variantcontext.Allele;
-import htsjdk.variant.variantcontext.VariantContext;
-import htsjdk.variant.variantcontext.VariantContextBuilder;
-
-import java.util.*;
-
-/**
- * Annotate a validation VCF with QC metrics
- *
- *
- * This tool is intended for vetting/assessing validation data (containing genotypes).
- * The tool produces a VCF that is annotated with information pertaining to plate quality control and by
- * default is soft-filtered by high no-call rate or low Hardy-Weinberg probability.
- * If you have .ped files, please first convert them to VCF format.
- *
- *
Input
- *
- * A validation VCF to annotate.
- *
- *
- *
Output
- *
- * An annotated VCF. Additionally, a table like the following will be output:
- *
- *
- * Total number of samples assayed: 185
- * Total number of records processed: 152
- * Number of Hardy-Weinberg violations: 34 (22%)
- * Number of no-call violations: 12 (7%)
- * Number of homozygous variant violations: 0 (0%)
- * Number of records passing all filters: 106 (69%)
- * Number of passing records that are polymorphic: 98 (92%)
- *