diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/bqsr/RecalibrationPerformance.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/bqsr/RecalibrationPerformance.java deleted file mode 100644 index 271617059..000000000 --- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/bqsr/RecalibrationPerformance.java +++ /dev/null @@ -1,141 +0,0 @@ -/* -* By downloading the PROGRAM you agree to the following terms of use: -* -* BROAD INSTITUTE - SOFTWARE LICENSE AGREEMENT - FOR ACADEMIC NON-COMMERCIAL RESEARCH PURPOSES ONLY -* -* This Agreement is made between the Broad Institute, Inc. with a principal address at 7 Cambridge Center, Cambridge, MA 02142 (BROAD) and the LICENSEE and is effective at the date the downloading is completed (EFFECTIVE DATE). -* -* WHEREAS, LICENSEE desires to license the PROGRAM, as defined hereinafter, and BROAD wishes to have this PROGRAM utilized in the public interest, subject only to the royalty-free, nonexclusive, nontransferable license rights of the United States Government pursuant to 48 CFR 52.227-14; and -* WHEREAS, LICENSEE desires to license the PROGRAM and BROAD desires to grant a license on the following terms and conditions. -* NOW, THEREFORE, in consideration of the promises and covenants made herein, the parties hereto agree as follows: -* -* 1. DEFINITIONS -* 1.1 PROGRAM shall mean copyright in the object code and source code known as GATK2 and related documentation, if any, as they exist on the EFFECTIVE DATE and can be downloaded from http://www.broadinstitute/GATK on the EFFECTIVE DATE. -* -* 2. LICENSE -* 2.1 Grant. Subject to the terms of this Agreement, BROAD hereby grants to LICENSEE, solely for academic non-commercial research purposes, a non-exclusive, non-transferable license to: (a) download, execute and display the PROGRAM and (b) create bug fixes and modify the PROGRAM. 
-* The LICENSEE may apply the PROGRAM in a pipeline to data owned by users other than the LICENSEE and provide these users the results of the PROGRAM provided LICENSEE does so for academic non-commercial purposes only. For clarification purposes, academic sponsored research is not a commercial use under the terms of this Agreement. -* 2.2 No Sublicensing or Additional Rights. LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD. LICENSEE shall ensure that all of its users agree to the terms of this Agreement. LICENSEE further agrees that it shall not put the PROGRAM on a network, server, or other similar technology that may be accessed by anyone other than the LICENSEE and its employees and users who have agreed to the terms of this agreement. -* 2.3 License Limitations. Nothing in this Agreement shall be construed to confer any rights upon LICENSEE by implication, estoppel, or otherwise to any computer software, trademark, intellectual property, or patent rights of BROAD, or of any other entity, except as expressly granted herein. LICENSEE agrees that the PROGRAM, in whole or part, shall not be used for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services. LICENSEE further agrees that the PROGRAM shall not be copied or otherwise adapted in order to circumvent the need for obtaining a license for use of the PROGRAM. -* -* 3. OWNERSHIP OF INTELLECTUAL PROPERTY -* LICENSEE acknowledges that title to the PROGRAM shall remain with BROAD. The PROGRAM is marked with the following BROAD copyright notice and notice of attribution to contributors. LICENSEE shall retain such notice on all copies. LICENSEE agrees to include appropriate attribution if any results obtained from use of the PROGRAM are included in any publication. -* Copyright 2012 Broad Institute, Inc. 
-* Notice of attribution: The GATK2 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc. -* LICENSEE shall not use any trademark or trade name of BROAD, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of BROAD except as states above for attribution purposes. -* -* 4. INDEMNIFICATION -* LICENSEE shall indemnify, defend, and hold harmless BROAD, and their respective officers, faculty, students, employees, associated investigators and agents, and their respective successors, heirs and assigns, (Indemnitees), against any liability, damage, loss, or expense (including reasonable attorneys fees and expenses) incurred by or imposed upon any of the Indemnitees in connection with any claims, suits, actions, demands or judgments arising out of any theory of liability (including, without limitation, actions in the form of tort, warranty, or strict liability and regardless of whether such action has any factual basis) pursuant to any right or license granted under this Agreement. -* -* 5. NO REPRESENTATIONS OR WARRANTIES -* THE PROGRAM IS DELIVERED AS IS. BROAD MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE PROGRAM OR THE COPYRIGHT, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, WHETHER OR NOT DISCOVERABLE. BROAD EXTENDS NO WARRANTIES OF ANY KIND AS TO PROGRAM CONFORMITY WITH WHATEVER USER MANUALS OR OTHER LITERATURE MAY BE ISSUED FROM TIME TO TIME. 
-* IN NO EVENT SHALL BROAD OR ITS RESPECTIVE DIRECTORS, OFFICERS, EMPLOYEES, AFFILIATED INVESTIGATORS AND AFFILIATES BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ECONOMIC DAMAGES OR INJURY TO PROPERTY AND LOST PROFITS, REGARDLESS OF WHETHER BROAD SHALL BE ADVISED, SHALL HAVE OTHER REASON TO KNOW, OR IN FACT SHALL KNOW OF THE POSSIBILITY OF THE FOREGOING. -* -* 6. ASSIGNMENT -* This Agreement is personal to LICENSEE and any rights or obligations assigned by LICENSEE without the prior written consent of BROAD shall be null and void. -* -* 7. MISCELLANEOUS -* 7.1 Export Control. LICENSEE gives assurance that it will comply with all United States export control laws and regulations controlling the export of the PROGRAM, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries. -* 7.2 Termination. LICENSEE shall have the right to terminate this Agreement for any reason upon prior written notice to BROAD. If LICENSEE breaches any provision hereunder, and fails to cure such breach within thirty (30) days, BROAD may terminate this Agreement immediately. Upon termination, LICENSEE shall provide BROAD with written assurance that the original and all copies of the PROGRAM have been destroyed, except that, upon prior written authorization from BROAD, LICENSEE may retain a copy for archive purposes. -* 7.3 Survival. The following provisions shall survive the expiration or termination of this Agreement: Articles 1, 3, 4, 5 and Sections 2.2, 2.3, 7.3, and 7.4. -* 7.4 Notice. 
Any notices under this Agreement shall be in writing, shall specifically refer to this Agreement, and shall be sent by hand, recognized national overnight courier, confirmed facsimile transmission, confirmed electronic mail, or registered or certified mail, postage prepaid, return receipt requested. All notices under this Agreement shall be deemed effective upon receipt. -* 7.5 Amendment and Waiver; Entire Agreement. This Agreement may be amended, supplemented, or otherwise modified only by means of a written instrument signed by all parties. Any waiver of any rights or failure to act in a specific instance shall relate only to such instance and shall not be construed as an agreement to waive any rights or fail to act in any other instance, whether or not similar. This Agreement constitutes the entire agreement among the parties with respect to its subject matter and supersedes prior agreements or understandings between the parties relating to its subject matter. -* 7.6 Binding Effect; Headings. This Agreement shall be binding upon and inure to the benefit of the parties and their respective permitted successors and assigns. All headings are for convenience only and shall not affect the meaning of any provision of this Agreement. -* 7.7 Governing Law. This Agreement shall be construed, governed, interpreted and applied in accordance with the internal laws of the Commonwealth of Massachusetts, U.S.A., without regard to conflict of laws principles. 
-*/ - -package org.broadinstitute.sting.gatk.walkers.bqsr; - -import org.broadinstitute.sting.commandline.*; -import org.broadinstitute.sting.gatk.CommandLineGATK; -import org.broadinstitute.sting.gatk.contexts.AlignmentContext; -import org.broadinstitute.sting.gatk.contexts.ReferenceContext; -import org.broadinstitute.sting.gatk.filters.*; -import org.broadinstitute.sting.gatk.refdata.RefMetaDataTracker; -import org.broadinstitute.sting.gatk.report.GATKReport; -import org.broadinstitute.sting.gatk.report.GATKReportTable; -import org.broadinstitute.sting.gatk.walkers.*; -import org.broadinstitute.sting.utils.exceptions.ReviewedStingException; -import org.broadinstitute.sting.utils.help.DocumentedGATKFeature; -import org.broadinstitute.sting.utils.help.HelpConstants; -import org.broadinstitute.sting.utils.recalibration.*; - -import java.io.*; - -/** - * Evaluate the performance of the base recalibration process - * - *

This tool aims to evaluate the results of the Base Quality Score Recalibration (BQSR) process.

- * - *

Caveat

- *

This tool is currently experimental. We do not provide documentation nor support for its operation.

- * - */ -@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_QC, extraDocs = {CommandLineGATK.class} ) -@ReadFilters({MappingQualityZeroFilter.class, MappingQualityUnavailableFilter.class, UnmappedReadFilter.class, NotPrimaryAlignmentFilter.class, DuplicateReadFilter.class, FailsVendorQualityCheckFilter.class}) -@PartitionBy(PartitionType.READ) -public class RecalibrationPerformance extends RodWalker implements NanoSchedulable { - - @Output - public PrintStream out; - - @Input(fullName="recal", shortName="recal", required=false, doc="The input covariates table file") - public File RECAL_FILE = null; - - public void initialize() { - out.println("Cycle\tQrep\tQemp\tIsJoint\tObservations\tErrors"); - - final GATKReport report = new GATKReport(RECAL_FILE); - final GATKReportTable table = report.getTable(RecalUtils.ALL_COVARIATES_REPORT_TABLE_TITLE); - for ( int row = 0; row < table.getNumRows(); row++ ) { - - final int nObservations = (int)asDouble(table.get(row, RecalUtils.NUMBER_OBSERVATIONS_COLUMN_NAME)); - final int nErrors = (int)Math.round(asDouble(table.get(row, RecalUtils.NUMBER_ERRORS_COLUMN_NAME))); - final double empiricalQuality = asDouble(table.get(row, RecalUtils.EMPIRICAL_QUALITY_COLUMN_NAME)); - - final byte QReported = Byte.parseByte((String) table.get(row, RecalUtils.QUALITY_SCORE_COLUMN_NAME)); - - final double jointEstimateQemp = RecalDatum.bayesianEstimateOfEmpiricalQuality(nObservations, nErrors, QReported); - - //if ( Math.abs((int)(jointEstimateQemp - empiricalQuality)) > 1 ) - // System.out.println(String.format("Qreported = %f, nObservations = %f, nErrors = %f, point Qemp = %f, joint Qemp = %f", estimatedQReported, nObservations, nErrors, empiricalQuality, jointEstimateQemp)); - - if ( table.get(row, RecalUtils.COVARIATE_NAME_COLUMN_NAME).equals("Cycle") && - table.get(row, RecalUtils.EVENT_TYPE_COLUMN_NAME).equals("M") && - table.get(row, RecalUtils.READGROUP_COLUMN_NAME).equals("20FUKAAXX100202.6") && - (QReported == 6 || QReported 
== 10 || QReported == 20 || QReported == 30 || QReported == 45) ) { - out.println(String.format("%s\t%d\t%d\t%s\t%d\t%d", table.get(row, RecalUtils.COVARIATE_VALUE_COLUMN_NAME), QReported, Math.round(empiricalQuality), "False", (int)nObservations, (int)nErrors)); - out.println(String.format("%s\t%d\t%d\t%s\t%d\t%d", table.get(row, RecalUtils.COVARIATE_VALUE_COLUMN_NAME), QReported, (int)jointEstimateQemp, "True", (int)nObservations, (int)nErrors)); - } - } - - } - - @Override - public boolean isDone() { - return true; - } - - private double asDouble(final Object o) { - if ( o instanceof Double ) - return (Double)o; - else if ( o instanceof Integer ) - return (Integer)o; - else if ( o instanceof Long ) - return (Long)o; - else - throw new ReviewedStingException("Object " + o + " is expected to be either a double, long or integer but its not either: " + o.getClass()); - } - - @Override - public Integer map(RefMetaDataTracker tracker, ReferenceContext ref, AlignmentContext context) { return 0; } - - @Override - public Integer reduceInit() { return 0; } - - @Override - public Integer reduce(Integer counter, Integer sum) { return 0; } - - @Override - public void onTraversalDone(Integer sum) {} -} \ No newline at end of file diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java index 82015d153..0bedf9062 100644 --- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java +++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java @@ -280,8 +280,14 @@ public class HaplotypeCaller extends ActiveRegionWalker, In // general advanced arguments to control haplotype caller behavior // ----------------------------------------------------------------------------------------------- + /** + * The reference confidence mode makes it possible to emit a per-bp 
or summarized confidence estimate for a site being strictly homozygous-reference. + * See http://www.broadinstitute.org/gatk/guide/article?id=2940 for more details of how this works. + * Note that if you set -ERC GVCF, you also need to set -variant_index_type LINEAR and -variant_index_parameter 128000 (with those exact values!). + * This requirement is a temporary workaround for an issue with index compression. + */ @Advanced - @Argument(fullName="emitRefConfidence", shortName="ERC", doc="Emit experimental reference confidence scores", required = false) + @Argument(fullName="emitRefConfidence", shortName="ERC", doc="Mode for emitting experimental reference confidence scores", required = false) protected ReferenceConfidenceMode emitReferenceConfidence = ReferenceConfidenceMode.NONE; public enum ReferenceConfidenceMode { diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantRecalibrator.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantRecalibrator.java index d43dc4a12..c5e2b8183 100644 --- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantRecalibrator.java +++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantRecalibrator.java @@ -165,10 +165,10 @@ public class VariantRecalibrator extends RodWalker> resource = Collections.emptyList(); diff --git a/protected/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantRecalibratorArgumentCollection.java b/protected/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantRecalibratorArgumentCollection.java index b501655f8..81067e695 100644 --- a/protected/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantRecalibratorArgumentCollection.java +++ b/protected/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantRecalibratorArgumentCollection.java @@ -48,6 +48,7 @@ package 
org.broadinstitute.sting.gatk.walkers.variantrecalibration; import org.broadinstitute.sting.commandline.Advanced; import org.broadinstitute.sting.commandline.Argument; +import org.broadinstitute.sting.commandline.Hidden; import org.broadinstitute.sting.utils.exceptions.ReviewedStingException; /** @@ -117,4 +118,19 @@ public class VariantRecalibratorArgumentCollection { @Advanced @Argument(fullName="badLodCutoff", shortName="badLodCutoff", doc="The LOD score below which to be used when building the Gaussian mixture model of bad variants.", required=false) public double BAD_LOD_CUTOFF = -5.0; + + ///////////////////////////// + // Deprecated Arguments + // Keeping them here is meant to provide users with error messages that are more informative than "arg not defined" when they use an argument that has been put out of service + ///////////////////////////// + + @Hidden + @Deprecated + @Argument(fullName="percentBadVariants", shortName="percentBad", doc="This argument is no longer used in GATK versions 2.7 and newer. Please see the online documentation for the latest usage recommendations.", required=false) + public double PERCENT_BAD_VARIANTS = 0.03; + + @Hidden + @Deprecated + @Argument(fullName="numBadVariants", shortName="numBad", doc="This argument is no longer used in GATK versions 2.8 and newer. Please see the online documentation for the latest usage recommendations.", required=false) + public int NUM_BAD_VARIANTS = 1000; } diff --git a/public/java/src/org/broadinstitute/sting/commandline/CommandLineProgram.java b/public/java/src/org/broadinstitute/sting/commandline/CommandLineProgram.java index f00bd0ad6..8c7e11f35 100644 --- a/public/java/src/org/broadinstitute/sting/commandline/CommandLineProgram.java +++ b/public/java/src/org/broadinstitute/sting/commandline/CommandLineProgram.java @@ -43,26 +43,29 @@ public abstract class CommandLineProgram { /** The command-line program and the arguments it returned. 
*/ public ParsingEngine parser = null; - /** the default log level */ - @Argument(fullName = "logging_level", - shortName = "l", - doc = "Set the minimum level of logging, i.e. setting INFO get's you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging.", - required = false) + /** + * Setting INFO gets you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging, and so on. + */ + @Argument(fullName = "logging_level", shortName = "l", doc = "Set the minimum level of logging", required = false) protected String logging_level = "INFO"; - - /** where to send the output of our logger */ - @Output(fullName = "log_to_file", - shortName = "log", - doc = "Set the logging location", - required = false) + /** + * File to save the logging output. + */ + @Output(fullName = "log_to_file", shortName = "log", doc = "Set the logging location", required = false) protected String toFile = null; - /** this is used to indicate if they've asked for help */ - @Argument(fullName = "help", shortName = "h", doc = "Generate this help message", required = false) + /** + * This will produce a help message in the terminal with general usage information, listing available arguments + * as well as tool-specific information if applicable. + */ + @Argument(fullName = "help", shortName = "h", doc = "Generate the help message", required = false) public Boolean help = false; - /** This is used to indicate if they've asked for the version information */ + /** + * Use this to check the version number of the GATK executable you are invoking. Note that the version number is + * always included in the output at the start of every run as well as any error message. 
+ */ @Argument(fullName = "version", shortName = "version", doc ="Output version information", required = false) public Boolean version = false; diff --git a/public/java/src/org/broadinstitute/sting/commandline/IntervalArgumentCollection.java b/public/java/src/org/broadinstitute/sting/commandline/IntervalArgumentCollection.java index b491c9f3d..d2a1735fb 100644 --- a/public/java/src/org/broadinstitute/sting/commandline/IntervalArgumentCollection.java +++ b/public/java/src/org/broadinstitute/sting/commandline/IntervalArgumentCollection.java @@ -33,38 +33,53 @@ import java.util.List; public class IntervalArgumentCollection { /** - * Using this option one can instruct the GATK engine to traverse over only part of the genome. This argument can be specified multiple times. - * One may use samtools-style intervals either explicitly (e.g. -L chr1 or -L chr1:100-200) or listed in a file (e.g. -L myFile.intervals). - * Additionally, one may specify a rod file to traverse over the positions for which there is a record in the file (e.g. -L file.vcf). - * To specify the completely unmapped reads in the BAM file (i.e. those without a reference contig) use -L unmapped. + * Use this option to perform the analysis over only part of the genome. This argument can be specified multiple times. + * You can use samtools-style intervals either explicitly on the command line (e.g. -L chr1 or -L chr1:100-200) or + * by loading in a file containing a list of intervals (e.g. -L myFile.intervals). + * + * Additionally, you can also specify a ROD file (such as a VCF file) in order to perform the analysis at specific + * positions based on the records present in the file (e.g. -L file.vcf). + * + * Finally, you can also use this to perform the analysis on the reads that are completely unmapped in the BAM file + * (i.e. those without a reference contig) by specifying -L unmapped. */ - @Input(fullName = "intervals", shortName = "L", doc = "One or more genomic intervals over which to operate. 
Can be explicitly specified on the command line or in a file (including a rod file)", required = false) + @Input(fullName = "intervals", shortName = "L", doc = "One or more genomic intervals over which to operate", required = false) public List> intervals = null; /** - * Using this option one can instruct the GATK engine NOT to traverse over certain parts of the genome. This argument can be specified multiple times. - * One may use samtools-style intervals either explicitly (e.g. -XL chr1 or -XL chr1:100-200) or listed in a file (e.g. -XL myFile.intervals). - * Additionally, one may specify a rod file to skip over the positions for which there is a record in the file (e.g. -XL file.vcf). - */ - @Input(fullName = "excludeIntervals", shortName = "XL", doc = "One or more genomic intervals to exclude from processing. Can be explicitly specified on the command line or in a file (including a rod file)", required = false) + * Use this option to exclude certain parts of the genome from the analysis (like -L, but the opposite). + * This argument can be specified multiple times. You can use samtools-style intervals either explicitly on the + * command line (e.g. -XL chr1 or -XL chr1:100-200) or by loading in a file containing a list of intervals + * (e.g. -XL myFile.intervals). + * + * Additionally, you can also specify a ROD file (such as a VCF file) in order to exclude specific + * positions from the analysis based on the records present in the file (e.g. -XL file.vcf). + * */ + @Input(fullName = "excludeIntervals", shortName = "XL", doc = "One or more genomic intervals to exclude from processing", required = false) public List> excludeIntervals = null; /** - * How should the intervals specified by multiple -L or -XL arguments be combined? Using this argument one can, for example, traverse over all of the positions - * for which there is a record in a VCF but just in chromosome 20 (-L chr20 -L file.vcf -isr INTERSECTION). 
+ * By default, the program will take the UNION of all intervals specified using -L and/or -XL. However, you can + * change this setting, for example if you want to take the INTERSECTION of the sets instead. E.g. to perform the + * analysis on positions for which there is a record in a VCF, but restrict this to just those on chromosome 20, + * you would do -L chr20 -L file.vcf -isr INTERSECTION. */ - @Argument(fullName = "interval_set_rule", shortName = "isr", doc = "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs", required = false) + @Argument(fullName = "interval_set_rule", shortName = "isr", doc = "Set merging approach to use for combining interval inputs", required = false) public IntervalSetRule intervalSetRule = IntervalSetRule.UNION; /** - * Should abutting (but not overlapping) intervals be treated as separate intervals? + * By default, the program merges abutting intervals (i.e. intervals that are directly side-by-side but do not + * actually overlap) into a single continuous interval. However you can change this behavior if you want them to be + * treated as separate intervals instead. */ - @Argument(fullName = "interval_merging", shortName = "im", doc = "Indicates the interval merging rule we should use for abutting intervals", required = false) + @Argument(fullName = "interval_merging", shortName = "im", doc = "Interval merging rule for abutting intervals", required = false) public IntervalMergingRule intervalMerging = IntervalMergingRule.ALL; /** - * For example, '-L chr1:100' with a padding value of 20 would turn into '-L chr1:80-120'. + * Use this to add padding to the intervals specified using -L and/or -XL. For example, '-L chr1:100' with a + * padding value of 20 would turn into '-L chr1:80-120'. This is typically used to add padding around exons when + * analyzing exomes. The general Broad exome calling pipeline uses 100 bp padding by default. 
*/ - @Argument(fullName = "interval_padding", shortName = "ip", doc = "Indicates how many basepairs of padding to include around each of the intervals specified with the -L/--intervals argument", required = false) + @Argument(fullName = "interval_padding", shortName = "ip", doc = "Amount of padding (in bp) to add to each interval", required = false, minValue = 0) public int intervalPadding = 0; } diff --git a/public/java/src/org/broadinstitute/sting/gatk/CommandLineGATK.java b/public/java/src/org/broadinstitute/sting/gatk/CommandLineGATK.java index 5fc0ccd3e..728fee5c8 100644 --- a/public/java/src/org/broadinstitute/sting/gatk/CommandLineGATK.java +++ b/public/java/src/org/broadinstitute/sting/gatk/CommandLineGATK.java @@ -44,15 +44,31 @@ import java.util.*; /** * All command line parameters accepted by all tools in the GATK. * - * The GATK engine itself. Manages map/reduce data access and runs walkers. + *

Info for general users

* - * We run command line GATK programs using this class. It gets the command line args, parses them, and hands the - * gatk all the parsed out information. Pretty much anything dealing with the underlying system should go here, - * the gatk engine should deal with any data related information. + *

This is a list of options and parameters that are generally available to all tools in the GATK.

+ * + *

There may be a few restrictions, which are indicated in individual argument descriptions. For example, the -BQSR + * argument is only meant to be used with a subset of tools, and the -pedigree argument will only be effectively used + * by a subset of tools as well. Some arguments conflict with others, and some conversely are dependent on others. This + * is all indicated in the detailed argument descriptions, so be sure to read those in their entirety rather than just + * skimming the one-line summary in the table.

+ * + *

Info for developers

+ * + *

This class is the GATK engine itself, which manages map/reduce data access and runs walkers.

+ * + *

We run command line GATK programs using this class. It gets the command line args, parses them, and hands the + * GATK engine all the parsed information. Pretty much anything dealing with the underlying system should go here; + * the GATK engine should deal with any data-related information.

*/ @DocumentedGATKFeature(groupName = HelpConstants.DOCS_CAT_ENGINE) public class CommandLineGATK extends CommandLineExecutable { - @Argument(fullName = "analysis_type", shortName = "T", doc = "Type of analysis to run") + /** + * A complete list of tools (sometimes also called walkers because they "walk" through the data to perform analyses) + * is available in the online documentation. + */ + @Argument(fullName = "analysis_type", shortName = "T", doc = "Name of the tool to run") private String analysisName = null; // our argument collection, the collection of command line args we accept diff --git a/public/java/src/org/broadinstitute/sting/gatk/arguments/GATKArgumentCollection.java b/public/java/src/org/broadinstitute/sting/gatk/arguments/GATKArgumentCollection.java index 08f892f97..2bbc5482b 100644 --- a/public/java/src/org/broadinstitute/sting/gatk/arguments/GATKArgumentCollection.java +++ b/public/java/src/org/broadinstitute/sting/gatk/arguments/GATKArgumentCollection.java @@ -50,19 +50,20 @@ import java.util.concurrent.TimeUnit; */ public class GATKArgumentCollection { - /* our version number */ - private float versionNumber = 1; - private String description = "GATK Arguments"; - /** the constructor */ public GATKArgumentCollection() { } // parameters and their defaults - @Input(fullName = "input_file", shortName = "I", doc = "SAM or BAM file(s)", required = false) + /** + * An input file containing sequence data mapped to a reference, in SAM or BAM format, or a text file containing a + * list of input files (with extension .list). Note that the GATK requires an accompanying index for each SAM or + * BAM file. Please see our online documentation for more details on input formatting requirements. 
+ */ + @Input(fullName = "input_file", shortName = "I", doc = "Input file containing sequence data (SAM or BAM)", required = false) public List samFiles = new ArrayList(); - @Argument(fullName = "read_buffer_size", shortName = "rbs", doc="Number of reads per SAM file to buffer in memory", required = false) + @Argument(fullName = "read_buffer_size", shortName = "rbs", doc="Number of reads per SAM file to buffer in memory", required = false, minValue = 0) public Integer readBufferSize = null; // -------------------------------------------------------------------------------------------------------------- @@ -71,21 +72,30 @@ public class GATKArgumentCollection { // // -------------------------------------------------------------------------------------------------------------- - @Argument(fullName = "phone_home", shortName = "et", doc="What kind of GATK run report should we generate? AWS is the default, can be NO_ET so nothing is posted to the run repository. Please see " + UserException.PHONE_HOME_DOCS_URL + " for details.", required = false) + /** + * By default, GATK generates a run report that is uploaded to a cloud-based service. This report contains basic + * non-identifying statistics (which tool was used, whether the run was successful etc.) that help us for debugging + * and development. You can use this option to turn off reporting if your run environment is not connected to the + * internet or if your data is subject to stringent confidentiality clauses (e.g. clinical patient data). + * To do so you will need to request a key using the online request form on our website. + */ + @Argument(fullName = "phone_home", shortName = "et", doc="Run reporting mode", required = false) public GATKRunReport.PhoneHomeOption phoneHomeType = GATKRunReport.PhoneHomeOption.AWS; - - @Argument(fullName = "gatk_key", shortName = "K", doc="GATK Key file. Required if running with -et NO_ET. 
Please see " + UserException.PHONE_HOME_DOCS_URL + " for details.", required = false) + /** + * Please see the online documentation FAQs for more details on the key system and how to request a key. + */ + @Argument(fullName = "gatk_key", shortName = "K", doc="GATK key file required to run with -et NO_ET", required = false) public File gatkKeyFile = null; /** - * The GATKRunReport supports (as of GATK 2.2) tagging GATK runs with an arbitrary String tag that can be + * The GATKRunReport supports (as of GATK 2.2) tagging GATK runs with an arbitrary tag that can be * used to group together runs during later analysis. One use of this capability is to tag runs as GATK * performance tests, so that the performance of the GATK over time can be assessed from the logs directly. * * Note that the tags do not conform to any ontology, so you are free to use any tags that you might find * meaningful. */ - @Argument(fullName = "tag", shortName = "tag", doc="Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis", required = false) + @Argument(fullName = "tag", shortName = "tag", doc="Tag to identify this GATK run as part of a group of runs", required = false) public String tag = "NA"; // -------------------------------------------------------------------------------------------------------------- @@ -94,26 +104,48 @@ public class GATKArgumentCollection { // // -------------------------------------------------------------------------------------------------------------- - @Argument(fullName = "read_filter", shortName = "rf", doc = "Specify filtration criteria to apply to each read individually", required = false) - public List readFilters = new ArrayList(); + /** + * Reads that fail the specified filters will not be used in the analysis. Multiple filters can be specified separately, + * e.g. you can do -rf MalformedRead -rf BadCigar and so on. Available read filters are listed in the online tool + * documentation. 
Note that the read filter name format is e.g. MalformedReadFilter, but at the command line the filter + * name should be given without the Filter suffix; e.g. -rf MalformedRead (NOT -rf MalformedReadFilter, which is not + * recognized by the program). Note also that some read filters are applied by default for some analysis tools; this + * is specified in each tool's documentation. The default filters cannot be disabled. + */ + @Argument(fullName = "read_filter", shortName = "rf", doc = "Filters to apply to reads before analysis", required = false) + public final List readFilters = new ArrayList(); @ArgumentCollection public IntervalArgumentCollection intervalArguments = new IntervalArgumentCollection(); - + /** + * The reference genome against which the sequence data was mapped. The GATK requires an index file and a dictionary + * file accompanying the reference (please see the online documentation FAQs for more details on these files). Although + * this argument is indicated as being optional, almost all GATK tools require a reference in order to run. + * Note also that while GATK can in theory process genomes from any organism with any number of chromosomes or contigs, + * it is not designed to process draft genome assemblies and performance will decrease as the number of contigs in + * the reference increases. We strongly discourage the use of unfinished genome assemblies containing more than a few + * hundred contigs. Contig numbers in the thousands will most probably cause memory-related crashes. 
+ */ @Input(fullName = "reference_sequence", shortName = "R", doc = "Reference sequence file", required = false) public File referenceFile = null; - - @Argument(fullName = "nonDeterministicRandomSeed", shortName = "ndrs", doc = "Makes the GATK behave non deterministically, that is, the random numbers generated will be different in every run", required = false) + /** + * If this flag is enabled, the random numbers generated will be different in every run, causing GATK to behave non-deterministically. + */ + @Argument(fullName = "nonDeterministicRandomSeed", shortName = "ndrs", doc = "Use a non-deterministic random seed", required = false) public boolean nonDeterministicRandomSeed = false; - + /** + * To be used in the testing framework where dynamic parallelism can result in differing numbers of calls to the random generator. + */ @Hidden - @Argument(fullName = "disableDithering",doc="Completely eliminates randomized dithering from rank sum tests. To be used in the testing framework where dynamic parallelism can result in differing numbers of calls to the random generator.") + @Argument(fullName = "disableDithering",doc="Completely eliminates randomized dithering from rank sum tests.") public boolean disableDithering = false; - - @Argument(fullName = "maxRuntime", shortName = "maxRuntime", doc="If provided, that GATK will stop execution cleanly as soon after maxRuntime has been exceeded, truncating the run but not exiting with a failure. By default the value is interpreted in minutes, but this can be changed by maxRuntimeUnits", required = false) + /** + * This will truncate the run without exiting with a failure. By default the value is interpreted in minutes, but this can be changed with the maxRuntimeUnits argument. 
+ */ + @Argument(fullName = "maxRuntime", shortName = "maxRuntime", doc="Stop execution cleanly as soon as maxRuntime has been reached", required = false) public long maxRuntime = GenomeAnalysisEngine.NO_RUNTIME_LIMIT; - @Argument(fullName = "maxRuntimeUnits", shortName = "maxRuntimeUnits", doc="The TimeUnit for maxRuntime", required = false) + @Argument(fullName = "maxRuntimeUnits", shortName = "maxRuntimeUnits", doc="Unit of time used by maxRuntime", required = false) public TimeUnit maxRuntimeUnits = TimeUnit.MINUTES; // -------------------------------------------------------------------------------------------------------------- @@ -122,32 +154,47 @@ public class GATKArgumentCollection { // // -------------------------------------------------------------------------------------------------------------- /** - * Reads will be selected randomly to be removed from the pile based on the method described here. + * There are several ways to downsample reads, i.e. to remove reads from the pile of reads that will be used for analysis. + * See the documentation of the individual downsampling options for details on how they work. Note that many GATK tools + * specify a default downsampling type and target, but this behavior can be overridden from the command line using the + * downsampling arguments. */ - @Argument(fullName = "downsampling_type", shortName="dt", doc="Type of reads downsampling to employ at a given locus", required = false) + @Argument(fullName = "downsampling_type", shortName="dt", doc="Type of read downsampling to employ at a given locus", required = false) public DownsampleType downsamplingType = null; - - @Argument(fullName = "downsample_to_fraction", shortName = "dfrac", doc = "Fraction [0.0-1.0] of reads to downsample to", required = false) + /** + * Reads will be downsampled so the specified fraction remains; e.g. if you specify -dfrac 0.25, three-quarters of + * the reads will be removed, and the remaining one quarter will be used in the analysis. 
This method of downsampling + * is truly unbiased and random. It is typically used to simulate the effect of generating different amounts of + * sequence data for a given sample. For example, you can use this in a pilot experiment to evaluate how much target + * coverage you need to aim for in order to obtain enough coverage in all loci of interest. + */ + @Argument(fullName = "downsample_to_fraction", shortName = "dfrac", doc = "Fraction of reads to downsample to", required = false, minValue = 0.0, maxValue = 1.0) public Double downsampleFraction = null; /** - * For locus-based traversals (LocusWalkers and ActiveRegionWalkers), downsample_to_coverage controls the - * maximum depth of coverage at each locus. For read-based traversals (ReadWalkers), it controls the - * maximum number of reads sharing the same alignment start position. For ReadWalkers you will typically need to use - * much lower dcov values than you would with LocusWalkers to see an effect. Note that this downsampling option does - * not produce an unbiased random sampling from all available reads at each locus: instead, the primary goal of the - * to-coverage downsampler is to maintain an even representation of reads from all alignment start positions when - * removing excess coverage. For a truly unbiased random sampling of reads, use -dfrac instead. Also note - * that the coverage target is an approximate goal that is not guaranteed to be met exactly: the downsampling - * algorithm will under some circumstances retain slightly more or less coverage than requested. + * The principle of this downsampling type is to downsample reads to a given capping threshold coverage. Its purpose is to + * get rid of excessive coverage, because above a certain depth, having additional data is not informative and imposes + * unreasonable computational costs. The downsampling process takes two different forms depending on the type of + * analysis it is used with. 
+ * + * For locus-based traversals (LocusWalkers like UnifiedGenotyper and ActiveRegionWalkers like HaplotypeCaller), + * downsample_to_coverage controls the maximum depth of coverage at each locus. For read-based traversals + * (ReadWalkers like BaseRecalibrator), it controls the maximum number of reads sharing the same alignment start + * position. For ReadWalkers you will typically need to use much lower dcov values than you would with LocusWalkers + * to see an effect. Note that this downsampling option does not produce an unbiased random sampling from all available + * reads at each locus: instead, the primary goal of the to-coverage downsampler is to maintain an even representation + * of reads from all alignment start positions when removing excess coverage. For a truly unbiased random sampling of + * reads, use -dfrac instead. Also note that the coverage target is an approximate goal that is not guaranteed to be + * met exactly: the downsampling algorithm will under some circumstances retain slightly more or less coverage than + * requested. */ @Argument(fullName = "downsample_to_coverage", shortName = "dcov", - doc = "Coverage [integer] to downsample to per locus (for locus walkers) or per alignment start position (for read walkers)", - required = false) + doc = "Target coverage threshold for downsampling to coverage", + required = false, minValue = 0) public Integer downsampleCoverage = null; /** - * Gets the downsampling method explicitly specified by the user. If the user didn't specify + * Gets the downsampling method explicitly specified by the user. If the user didn't specify * a default downsampling mechanism, return the default. * @return The explicitly specified downsampling mechanism, or the default if none exists. 
*/ @@ -178,8 +225,10 @@ public class GATKArgumentCollection { // -------------------------------------------------------------------------------------------------------------- @Argument(fullName = "baq", shortName="baq", doc="Type of BAQ calculation to apply in the engine", required = false) public BAQ.CalculationMode BAQMode = BAQ.CalculationMode.OFF; - - @Argument(fullName = "baqGapOpenPenalty", shortName="baqGOP", doc="BAQ gap open penalty (Phred Scaled). Default value is 40. 30 is perhaps better for whole genome call sets", required = false) + /** + * Phred-scaled gap open penalty for BAQ calculation. Although the default value is 40, a value of 30 may be better for whole genome call sets. + */ + @Argument(fullName = "baqGapOpenPenalty", shortName="baqGOP", doc="BAQ gap open penalty", required = false, minValue = 0) public double BAQGOP = BAQ.DEFAULT_GOP; // -------------------------------------------------------------------------------------------------------------- @@ -189,19 +238,33 @@ public class GATKArgumentCollection { // -------------------------------------------------------------------------------------------------------------- /** - * Q0 == ASCII 33 according to the SAM specification, whereas Illumina encoding starts at Q64. The idea here is - * simple: we just iterate over all reads and subtract 31 from every quality score. + * By default the GATK assumes that base quality scores start at Q0 == ASCII 33 according to the SAM specification. + * However, encoding in some datasets (especially older Illumina ones) starts at Q64. This argument will fix the + * encodings on the fly (as the data is read in) by subtracting 31 from every quality score. Note that this argument should + * NEVER be used by default; you should only use it when you have confirmed that the quality scores in your data are + * not in the correct encoding. 
*/ @Argument(fullName = "fix_misencoded_quality_scores", shortName="fixMisencodedQuals", doc="Fix mis-encoded base quality scores", required = false) public boolean FIX_MISENCODED_QUALS = false; - - @Argument(fullName = "allow_potentially_misencoded_quality_scores", shortName="allowPotentiallyMisencodedQuals", doc="Do not fail when encountering base qualities that are too high and that seemingly indicate a problem with the base quality encoding of the BAM file", required = false) + /** + * This flag tells GATK to ignore warnings when encountering base qualities that are too high and that seemingly + * indicate a problem with the base quality encoding of the BAM file. You should only use this if you really know + * what you are doing; otherwise you could seriously mess up your data and ruin your analysis. + */ + @Argument(fullName = "allow_potentially_misencoded_quality_scores", shortName="allowPotentiallyMisencodedQuals", doc="Ignore warnings about base quality score encoding", required = false) public boolean ALLOW_POTENTIALLY_MISENCODED_QUALS = false; - - @Argument(fullName="useOriginalQualities", shortName = "OQ", doc = "If set, use the original base quality scores from the OQ tag when present instead of the standard scores", required=false) + /** + * This flag tells GATK to use the original base qualities (that were in the data before BQSR/recalibration) which + * are stored in the OQ tag, if they are present, rather than use the post-recalibration quality scores. If no OQ + * tag is present for a read, the standard qual score will be used. 
+ */ + @Argument(fullName="useOriginalQualities", shortName = "OQ", doc = "Use the base quality scores from the OQ tag", required=false) public Boolean useOriginalBaseQualities = false; - - @Argument(fullName="defaultBaseQualities", shortName = "DBQ", doc = "If reads are missing some or all base quality scores, this value will be used for all base quality scores", required=false) + /** + * If reads are missing some or all base quality scores, this value will be used for all base quality scores. + * By default this is set to -1 to disable default base quality assignment. + */ + @Argument(fullName="defaultBaseQualities", shortName = "DBQ", doc = "Assign a default base quality", required=false, minValue = 0, maxValue = Byte.MAX_VALUE) public byte defaultBaseQualities = -1; // -------------------------------------------------------------------------------------------------------------- @@ -213,9 +276,9 @@ public class GATKArgumentCollection { /** * The file name for the GATK performance log output, or null if you don't want to generate the * detailed performance logging table. This table is suitable for importing into R or any - * other analysis software that can read tsv files + * other analysis software that can read tsv files. */ - @Argument(fullName = "performanceLog", shortName="PF", doc="If provided, a GATK runtime performance log will be written to this file", required = false) + @Argument(fullName = "performanceLog", shortName="PF", doc="Write GATK runtime performance log to this file", required = false) public File performanceLog = null; // -------------------------------------------------------------------------------------------------------------- @@ -225,10 +288,11 @@ public class GATKArgumentCollection { // -------------------------------------------------------------------------------------------------------------- /** - * Enables on-the-fly recalibrate of base qualities. The covariates tables are produced by the BaseQualityScoreRecalibrator tool. 
- * Please be aware that one should only run recalibration with the covariates file created on the same input bam(s). + * Enables on-the-fly recalibration of base qualities, intended primarily for use with BaseRecalibrator and PrintReads + * (see Best Practices workflow documentation). The covariates tables are produced by the BaseRecalibrator tool. + * Please be aware that you should only run recalibration with the covariates file created on the same input bam(s). */ - @Input(fullName="BQSR", shortName="BQSR", required=false, doc="The input covariates table file which enables on-the-fly base quality score recalibration (intended for use with BaseRecalibrator and PrintReads)") + @Input(fullName="BQSR", shortName="BQSR", required=false, doc="Input covariates table file for on-the-fly base quality score recalibration") public File BQSR_RECAL_FILE = null; /** @@ -243,36 +307,41 @@ public class GATKArgumentCollection { public int quantizationLevels = 0; /** - * Turns off printing of the base insertion and base deletion tags when using the -BQSR argument and only the base substitution qualities will be produced. + * Turns off printing of the base insertion and base deletion tags when using the -BQSR argument. Only the base substitution qualities will be produced. */ - @Argument(fullName="disable_indel_quals", shortName = "DIQ", doc = "If true, disables printing of base insertion and base deletion tags (with -BQSR)", required=false) + @Argument(fullName="disable_indel_quals", shortName = "DIQ", doc = "Disable printing of base insertion and deletion tags (with -BQSR)", required=false) public boolean disableIndelQuals = false; /** - * By default, the OQ tag in not emitted when using the -BQSR argument. + * By default, the OQ tag is not emitted when using the -BQSR argument. Use this flag to include OQ tags in the output BAM file. + * Note that this may result in a significant file size increase. 
*/ - @Argument(fullName="emit_original_quals", shortName = "EOQ", doc = "If true, enables printing of the OQ tag with the original base qualities (with -BQSR)", required=false) + @Argument(fullName="emit_original_quals", shortName = "EOQ", doc = "Emit the OQ tag with the original base qualities (with -BQSR)", required=false) public boolean emitOriginalQuals = false; /** - * Do not modify quality scores less than this value but rather just write them out unmodified in the recalibrated BAM file. + * This flag tells GATK not to modify quality scores less than this value. Instead they will be written out unmodified in the recalibrated BAM file. * In general it's unsafe to change qualities scores below < 6, since base callers use these values to indicate random or bad bases. * For example, Illumina writes Q2 bases when the machine has really gone wrong. This would be fine in and of itself, * but when you select a subset of these reads based on their ability to align to the reference and their dinucleotide effect, * your Q2 bin can be elevated to Q8 or Q10, leading to issues downstream. */ - @Argument(fullName = "preserve_qscores_less_than", shortName = "preserveQ", doc = "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR)", required = false) + @Argument(fullName = "preserve_qscores_less_than", shortName = "preserveQ", doc = "Don't recalibrate bases with quality scores less than this threshold (with -BQSR)", required = false, minValue = 0, minRecommendedValue = QualityUtils.MIN_USABLE_Q_SCORE) public int PRESERVE_QSCORES_LESS_THAN = QualityUtils.MIN_USABLE_Q_SCORE; - - @Argument(fullName = "globalQScorePrior", shortName = "globalQScorePrior", doc = "The global Qscore Bayesian prior to use in the BQSR. 
If specified, this value will be used as the prior for all mismatch quality scores instead of the actual reported quality score", required = false) + /** + * If specified, this value will be used as the prior for all mismatch quality scores instead of the actual reported quality score. + */ + @Argument(fullName = "globalQScorePrior", shortName = "globalQScorePrior", doc = "Global Qscore Bayesian prior to use for BQSR", required = false) public double globalQScorePrior = -1.0; /** - * For the sake of your data, please only use this option if you know what you are doing. It is absolutely not recommended practice - * to run base quality score recalibration on reduced BAM files. + * It is absolutely not recommended practice to run base quality score recalibration on BAM files that have been + * processed with ReduceReads. By default, the GATK will error out if it detects that you are trying to recalibrate + * a reduced BAM file. However, this flag allows you to disable the warning and proceed anyway. For the sake of your + * data, please only use this option if you really know what you are doing. 
+ */ @Advanced - @Argument(fullName = "allow_bqsr_on_reduced_bams_despite_repeated_warnings", shortName="allowBqsrOnReducedBams", doc="Do not fail when running base quality score recalibration on a reduced BAM file even though we highly recommend against it", required = false) + @Argument(fullName = "allow_bqsr_on_reduced_bams_despite_repeated_warnings", shortName="allowBqsrOnReducedBams", doc="Ignore all warnings about how it's a really bad idea to run BQSR on a reduced BAM file (AT YOUR OWN RISK!)", required = false) public boolean ALLOW_BQSR_ON_REDUCED_BAMS = false; // -------------------------------------------------------------------------------------------------------------- @@ -281,35 +350,45 @@ public class GATKArgumentCollection { // // -------------------------------------------------------------------------------------------------------------- + /** + * Keep in mind that if you set this to LENIENT, we may refuse to provide you with support if anything goes wrong. + */ @Argument(fullName = "validation_strictness", shortName = "S", doc = "How strict should we be with validation", required = false) public SAMFileReader.ValidationStringency strictnessLevel = SAMFileReader.ValidationStringency.SILENT; - - @Argument(fullName = "remove_program_records", shortName = "rpr", doc = "Should we override the Walker's default and remove program records from the SAM header", required = false) + /** + * Some tools keep program records in the SAM header by default. Use this argument to override that behavior and discard program records from the SAM header. 
+ */ + @Argument(fullName = "remove_program_records", shortName = "rpr", doc = "Remove program records from the SAM header", required = false) public boolean removeProgramRecords = false; - - @Argument(fullName = "keep_program_records", shortName = "kpr", doc = "Should we override the Walker's default and keep program records from the SAM header", required = false) + /** + * Some tools discard program records from the SAM header by default. Use this argument to override that behavior and keep program records in the SAM header. + */ + @Argument(fullName = "keep_program_records", shortName = "kpr", doc = "Keep program records in the SAM header", required = false) public boolean keepProgramRecords = false; - + /** + * This option requires that each BAM file listed in the mapping file have only a single sample specified in its header + * (though there may be multiple read groups for that sample). Each line of the mapping file must contain the absolute + * path to a BAM file, followed by whitespace, followed by the new sample name for that BAM file. + */ @Advanced - @Argument(fullName = "sample_rename_mapping_file", shortName = "sample_rename_mapping_file", - doc = "Rename sample IDs on-the-fly at runtime using the provided mapping file. This option requires that " + - "each BAM file listed in the mapping file have only a single sample specified in its header (though there " + - "may be multiple read groups for that sample). Each line of the mapping file must contain the absolute path " + - "to a BAM file, followed by whitespace, followed by the new sample name for that BAM file.", - required = false) + @Argument(fullName = "sample_rename_mapping_file", shortName = "sample_rename_mapping_file", doc = "Rename sample IDs on-the-fly at runtime using the provided mapping file", required = false) public File sampleRenameMappingFile = null; - - @Argument(fullName = "unsafe", shortName = "U", doc = "If set, enables unsafe operations: nothing will be checked at runtime. 
For expert users only who know what they are doing. We do not support usage of this argument.", required = false) + /** + * For expert users only who know what they are doing. We do not support usage of this argument, so we may refuse to help you if you use it and something goes wrong. + */ + @Argument(fullName = "unsafe", shortName = "U", doc = "Enable unsafe operations: nothing will be checked at runtime", required = false) public ValidationExclusion.TYPE unsafe; - + /** + * UNSAFE FOR GENERAL USE (FOR TEST SUITE USE ONLY). Disable both auto-generation of index files and index file locking + * when reading VCFs and other rods and an index isn't present or is out-of-date. The file locking necessary for auto index + * generation to work safely is prone to random failures/hangs on certain platforms, which makes it desirable to disable it + * for situations like test suite runs where the indices are already known to exist, however this option is unsafe in general + * because it allows reading from index files without first acquiring a lock. + */ @Hidden @Advanced @Argument(fullName = "disable_auto_index_creation_and_locking_when_reading_rods", shortName = "disable_auto_index_creation_and_locking_when_reading_rods", - doc = "UNSAFE FOR GENERAL USE (FOR TEST SUITE USE ONLY). Disable both auto-generation of index files and index file locking " + - "when reading VCFs and other rods and an index isn't present or is out-of-date. 
The file locking necessary for auto index " + - "generation to work safely is prone to random failures/hangs on certain platforms, which makes it desirable to disable it " + - "for situations like test suite runs where the indices are already known to exist, however this option is unsafe in general " + - "because it allows reading from index files without first acquiring a lock.", + doc = "Disable both auto-generation of index files and index file locking", required = false) public boolean disableAutoIndexCreationAndLockingWhenReadingRods = false; @@ -320,23 +399,22 @@ public class GATKArgumentCollection { // -------------------------------------------------------------------------------------------------------------- /** - * How many data threads should be allocated to this analysis? Data threads contains N cpu threads per - * data thread, and act as completely data parallel processing, increasing the memory usage of GATK - * by M data threads. Data threads generally scale extremely effectively, up to 24 cores + * Each data thread contains N CPU threads and acts as a completely data-parallel processing unit, increasing + * the memory usage of GATK by M data threads. Data threads generally scale extremely effectively, up to 24 cores. + * See online documentation FAQs for more information. */ - @Argument(fullName = "num_threads", shortName = "nt", doc = "How many data threads should be allocated to running this analysis.", required = false) + @Argument(fullName = "num_threads", shortName = "nt", doc = "Number of data threads to allocate to this analysis", required = false, minValue = 1) public Integer numberOfDataThreads = 1; /** - * How many CPU threads should be allocated per data thread? Each CPU thread operates the map - * cycle independently, but may run into earlier scaling problems with IO than data threads. Has - * the benefit of not requiring X times as much memory per thread as data threads do, but rather - * only a constant overhead. 
+ * Each CPU thread operates the map cycle independently, but may run into earlier scaling problems with IO than + * data threads. Has the benefit of not requiring X times as much memory per thread as data threads do, but rather + * only a constant overhead. See online documentation FAQs for more information. */ - @Argument(fullName="num_cpu_threads_per_data_thread", shortName = "nct", doc="How many CPU threads should be allocated per data thread to running this analysis?", required = false) + @Argument(fullName="num_cpu_threads_per_data_thread", shortName = "nct", doc="Number of CPU threads to allocate per data thread", required = false, minValue = 1) public int numberOfCPUThreadsPerDataThread = 1; - @Argument(fullName="num_io_threads", shortName = "nit", doc="How many of the given threads should be allocated to IO", required = false) + @Argument(fullName="num_io_threads", shortName = "nit", doc="Number of given threads to allocate to IO", required = false, minValue = 0) @Hidden public int numberOfIOThreads = 0; @@ -345,13 +423,15 @@ public class GATKArgumentCollection { * cost (< 0.1%) in runtime because of turning on the JavaBean. This is largely for * debugging purposes. Note that this argument is not compatible with -nt, it only works with -nct. 
*/ - @Argument(fullName = "monitorThreadEfficiency", shortName = "mte", doc = "Enable GATK threading efficiency monitoring", required = false) + @Argument(fullName = "monitorThreadEfficiency", shortName = "mte", doc = "Enable threading efficiency monitoring", required = false) public Boolean monitorThreadEfficiency = false; - @Argument(fullName = "num_bam_file_handles", shortName = "bfh", doc="The total number of BAM file handles to keep open simultaneously", required=false) + @Argument(fullName = "num_bam_file_handles", shortName = "bfh", doc="Total number of BAM file handles to keep open simultaneously", required=false, minValue = 1) public Integer numberOfBAMFileHandles = null; - - @Input(fullName = "read_group_black_list", shortName="rgbl", doc="Filters out read groups matching <TAG>:<STRING> or a .txt file containing the filter strings one per line.", required = false) + /** + * This will filter out read groups matching <TAG>:<STRING> (e.g. SM:sample1) or a .txt file containing the filter strings one per line. + */ + @Input(fullName = "read_group_black_list", shortName="rgbl", doc="Exclude read groups based on tags", required = false) public List readGroupBlackList = null; // -------------------------------------------------------------------------------------------------------------- @@ -433,7 +513,7 @@ public class GATKArgumentCollection { /** * How strict should we be in parsing the PED files? 
*/ - @Argument(fullName="pedigreeValidationType", shortName = "pedValidationType", doc="How strict should we be in validating the pedigree information?",required=false) + @Argument(fullName="pedigreeValidationType", shortName = "pedValidationType", doc="Validation strictness for pedigree information",required=false) public PedigreeValidationType pedigreeValidationType = PedigreeValidationType.STRICT; // -------------------------------------------------------------------------------------------------------------- @@ -441,8 +521,10 @@ public class GATKArgumentCollection { // BAM indexing and sharding arguments // // -------------------------------------------------------------------------------------------------------------- - - @Argument(fullName="allow_intervals_with_unindexed_bam",doc="Allow interval processing with an unsupported BAM. NO INTEGRATION TESTS are available. Use at your own risk.",required=false) + /** + * NO INTEGRATION TESTS are available. Use at your own risk. + */ + @Argument(fullName="allow_intervals_with_unindexed_bam",doc="Allow interval processing with an unsupported BAM",required=false) @Hidden public boolean allowIntervalsWithUnindexedBAM = false; @@ -451,8 +533,10 @@ public class GATKArgumentCollection { // testing BCF2 // // -------------------------------------------------------------------------------------------------------------- - - @Argument(fullName="generateShadowBCF",shortName = "generateShadowBCF",doc="If provided, whenever we create a VCFWriter we will also write out a BCF file alongside it, for testing purposes",required=false) + /** + * If provided, whenever we create a VCFWriter we will also write out a BCF file alongside it, for testing purposes. 
+ */ + @Argument(fullName="generateShadowBCF",shortName = "generateShadowBCF",doc="Write a BCF copy of the output VCF",required=false) @Hidden public boolean generateShadowBCF = false; // TODO -- remove all code tagged with TODO -- remove me when argument generateShadowBCF is removed @@ -471,12 +555,13 @@ public class GATKArgumentCollection { * DYNAMIC_SEEK attempts to optimize for minimal seek time by choosing an appropriate strategy and parameter (user-supplied parameter is ignored) * DYNAMIC_SIZE attempts to optimize for minimal index size by choosing an appropriate strategy and parameter (user-supplied parameter is ignored) */ - - @Argument(fullName="variant_index_type",shortName = "variant_index_type",doc="which type of IndexCreator to use for VCF/BCF indices",required=false) + @Argument(fullName="variant_index_type",shortName = "variant_index_type",doc="Type of IndexCreator to use for VCF/BCF indices",required=false) @Advanced public GATKVCFIndexType variant_index_type = GATKVCFUtils.DEFAULT_INDEX_TYPE; - - @Argument(fullName="variant_index_parameter",shortName = "variant_index_parameter",doc="the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator",required=false) + /** + * This is either the bin width or the number of features per bin, depending on the indexing strategy + */ + @Argument(fullName="variant_index_parameter",shortName = "variant_index_parameter",doc="Parameter to pass to the VCF/BCF IndexCreator",required=false) @Advanced public int variant_index_parameter = GATKVCFUtils.DEFAULT_INDEX_PARAMETER; } diff --git a/public/java/src/org/broadinstitute/sting/gatk/executive/MicroScheduler.java b/public/java/src/org/broadinstitute/sting/gatk/executive/MicroScheduler.java index 7077db49c..405c07392 100644 --- a/public/java/src/org/broadinstitute/sting/gatk/executive/MicroScheduler.java +++ b/public/java/src/org/broadinstitute/sting/gatk/executive/MicroScheduler.java @@ -147,7 +147,7 @@ public abstract class MicroScheduler 
implements MicroSchedulerMBean { if ( threadAllocation.getNumDataThreads() > 1 ) { if (walker.isReduceByInterval()) - throw new UserException.BadArgumentValue("nt", String.format("The analysis %s aggregates results by interval. Due to a current limitation of the GATK, analyses of this type do not currently support parallel execution. Please run your analysis without the -nt option.", engine.getWalkerName(walker.getClass()))); + throw new UserException.BadArgumentValue("nt", String.format("This run of %s is set up to aggregate results by interval. Due to a current limitation of the GATK, analyses of this type do not currently support parallel execution. Please run your analysis without the -nt option or check if this tool has an option to disable per-interval calculations.", engine.getWalkerName(walker.getClass()))); if ( ! (walker instanceof TreeReducible) ) { throw badNT("nt", engine, walker); diff --git a/public/java/src/org/broadinstitute/sting/gatk/walkers/coverage/DepthOfCoverage.java b/public/java/src/org/broadinstitute/sting/gatk/walkers/coverage/DepthOfCoverage.java index ca3255097..3a51a9a6a 100644 --- a/public/java/src/org/broadinstitute/sting/gatk/walkers/coverage/DepthOfCoverage.java +++ b/public/java/src/org/broadinstitute/sting/gatk/walkers/coverage/DepthOfCoverage.java @@ -57,10 +57,10 @@ import java.io.PrintStream; import java.util.*; /** - * Toolbox for assessing sequence coverage by a wide array of metrics, partitioned by sample, read group, or library + * Assess sequence coverage by a wide array of metrics, partitioned by sample, read group, or library * *

- * Coverage processes a set of bam files to determine coverage at different levels of partitioning and + * This tool processes a set of bam files to determine coverage at different levels of partitioning and * aggregation. Coverage can be analyzed per locus, per interval, per gene, or in total; can be partitioned by * sample, by read group, by technology, by center, or by library; and can be summarized by mean, median, quartiles, * and/or percentage of bases covered to or beyond a threshold. @@ -73,7 +73,7 @@ import java.util.*; *

*(Optional) A REFSEQ Rod to aggregate coverage to the gene level *

- * (for information about creating the REFSEQ Rod, please consult the RefSeqCodec documentation) + * (for information about creating the REFSEQ Rod, please consult the online documentation) *

*

Output

*

@@ -117,7 +117,7 @@ import java.util.*; // todo -- alter logarithmic scaling to spread out bins more // todo -- allow for user to set linear binning (default is logarithmic) // todo -- formatting --> do something special for end bins in getQuantile(int[] foo), this gets mushed into the end+-1 bins for now -@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_QC, extraDocs = {CommandLineGATK.class} ) +@DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_QC, extraDocs = {CommandLineGATK.class}, gotoDev = HelpConstants.MC) @By(DataSource.REFERENCE) @PartitionBy(PartitionType.NONE) @Downsample(by= DownsampleType.NONE, toCoverage=Integer.MAX_VALUE) @@ -125,53 +125,63 @@ public class DepthOfCoverage extends LocusWalker out; - - @Argument(fullName = "minMappingQuality", shortName = "mmq", doc = "Minimum mapping quality of reads to count towards depth. Defaults to -1.", required = false) + /** + * Reads with mapping quality values lower than this threshold will be skipped. This is set to -1 by default to disable the evaluation and ignore this threshold. + */ + @Argument(fullName = "minMappingQuality", shortName = "mmq", doc = "Minimum mapping quality of reads to count towards depth", required = false, minValue = 0, maxValue = Integer.MAX_VALUE) int minMappingQuality = -1; - @Argument(fullName = "maxMappingQuality", doc = "Maximum mapping quality of reads to count towards depth. Defaults to 2^31-1 (Integer.MAX_VALUE).", required = false) + /** + * Reads with mapping quality values higher than this threshold will be skipped. The default value is the largest number that can be represented as an integer by the program. + */ + @Argument(fullName = "maxMappingQuality", doc = "Maximum mapping quality of reads to count towards depth", required = false, minValue = 0, maxValue = Integer.MAX_VALUE) int maxMappingQuality = Integer.MAX_VALUE; - - @Argument(fullName = "minBaseQuality", shortName = "mbq", doc = "Minimum quality of bases to count towards depth. 
Defaults to -1.", required = false) + /** + * Bases with quality scores lower than this threshold will be skipped. This is set to -1 by default to disable the evaluation and ignore this threshold. + */ + @Argument(fullName = "minBaseQuality", shortName = "mbq", doc = "Minimum quality of bases to count towards depth", required = false, minValue = 0, maxValue = Byte.MAX_VALUE) byte minBaseQuality = -1; - @Argument(fullName = "maxBaseQuality", doc = "Maximum quality of bases to count towards depth. Defaults to 127 (Byte.MAX_VALUE).", required = false) + /** + * Bases with quality scores higher than this threshold will be skipped. The default value is the largest number that can be represented as a byte. + */ + @Argument(fullName = "maxBaseQuality", doc = "Maximum quality of bases to count towards depth", required = false, minValue = 0, maxValue = Byte.MAX_VALUE) byte maxBaseQuality = Byte.MAX_VALUE; @Argument(fullName = "countType", doc = "How should overlapping reads from the same fragment be handled?", required = false) CoverageUtils.CountPileupType countType = CoverageUtils.CountPileupType.COUNT_READS; /** - * Instead of reporting depth, report the base pileup at each locus + * Instead of reporting depth, the program will report the base pileup at each locus */ - @Argument(fullName = "printBaseCounts", shortName = "baseCounts", doc = "Will add base counts to per-locus output.", required = false) + @Argument(fullName = "printBaseCounts", shortName = "baseCounts", doc = "Add base counts to per-locus output", required = false) boolean printBaseCounts = false; /** - * Do not tabulate locus statistics (# loci covered by sample by coverage) + * Disabling the tabulation of locus statistics (# loci covered by sample by coverage) should speed up processing. 
*/ - @Argument(fullName = "omitLocusTable", shortName = "omitLocusTable", doc = "Will not calculate the per-sample per-depth counts of loci, which should result in speedup", required = false) + @Argument(fullName = "omitLocusTable", shortName = "omitLocusTable", doc = "Do not calculate per-sample per-depth counts of loci", required = false) boolean omitLocusTable = false; /** - * Do not tabulate interval statistics (mean, median, quartiles AND # intervals by sample by coverage) + * Disabling the tabulation of interval statistics (mean, median, quartiles AND # intervals by sample by coverage) should speed up processing. This option is required in order to use -nt parallelism. */ - @Argument(fullName = "omitIntervalStatistics", shortName = "omitIntervals", doc = "Will omit the per-interval statistics section, which should result in speedup", required = false) + @Argument(fullName = "omitIntervalStatistics", shortName = "omitIntervals", doc = "Do not calculate per-interval statistics", required = false) boolean omitIntervals = false; /** - * Do not print the total coverage at every base + * Disabling the tabulation of total coverage at every base should speed up processing. */ - @Argument(fullName = "omitDepthOutputAtEachBase", shortName = "omitBaseOutput", doc = "Will omit the output of the depth of coverage at each base, which should result in speedup", required = false) + @Argument(fullName = "omitDepthOutputAtEachBase", shortName = "omitBaseOutput", doc = "Do not output depth of coverage at each base", required = false) boolean omitDepthOutput = false; /** - * Path to the RefSeq file for use in aggregating coverage statistics over genes + * Specify a RefSeq file for use in aggregating coverage statistics over genes. */ - @Argument(fullName = "calculateCoverageOverGenes", shortName = "geneList", doc = "Calculate the coverage statistics over this list of genes. 
Currently accepts RefSeq.", required = false) + @Argument(fullName = "calculateCoverageOverGenes", shortName = "geneList", doc = "Calculate coverage statistics over this list of genes", required = false) File refSeqGeneList = null; /** - * The format of the output file + * Output file format (e.g. csv, table, rtable); defaults to r-readable table. */ - @Argument(fullName = "outputFormat", doc = "the format of the output file (e.g. csv, table, rtable); defaults to r-readable table", required = false) + @Argument(fullName = "outputFormat", doc = "The format of the output file", required = false) String outputFormat = "rtable"; @@ -180,42 +190,47 @@ public class DepthOfCoverage extends LocusWalker END are counted in the last bin. + * Sets the high-coverage cutoff for granular binning. All loci with depth > STOP are counted in the last bin. */ @Advanced - @Argument(fullName = "stop", doc = "Ending (right endpoint) for granular binning", required = false) + @Argument(fullName = "stop", doc = "Ending (right endpoint) for granular binning", required = false, minValue = 1) int stop = 500; /** * Sets the number of bins for granular binning */ @Advanced - @Argument(fullName = "nBins", doc = "Number of bins to use for granular binning", required = false) + @Argument(fullName = "nBins", doc = "Number of bins to use for granular binning", required = false, minValue = 0, minRecommendedValue = 1) int nBins = 499; /** - * Do not tabulate the sample summary statistics (total, mean, median, quartile coverage per sample) + * This option simply disables writing separate files for per-sample summary statistics (total, mean, median, quartile coverage per sample). These statistics are still calculated internally, so enabling this option will not improve runtime. */ - @Argument(fullName = "omitPerSampleStats", shortName = "omitSampleSummary", doc = "Omits the summary files per-sample. 
These statistics are still calculated, so this argument will not improve runtime.", required = false) + @Argument(fullName = "omitPerSampleStats", shortName = "omitSampleSummary", doc = "Do not output the summary files per-sample", required = false) boolean omitSampleSummary = false; /** - * A way of partitioning reads into groups. Can be sample, readgroup, or library. + * By default, coverage is partitioned by sample, but it can be any combination of sample, readgroup and/or library. */ - @Argument(fullName = "partitionType", shortName = "pt", doc = "Partition type for depth of coverage. Defaults to sample. Can be any combination of sample, readgroup, library.", required = false) + @Argument(fullName = "partitionType", shortName = "pt", doc = "Partition type for depth of coverage", required = false) Set partitionTypes = EnumSet.of(DoCOutputType.Partition.sample); /** @@ -230,10 +245,10 @@ public class DepthOfCoverage extends LocusWalker= CT for each sample) + * For summary file outputs, report the percentage of bases covered to an amount equal to or greater than this number (e.g. % bases >= CT for each sample). Defaults to 15; can take multiple arguments. */ @Advanced - @Argument(fullName = "summaryCoverageThreshold", shortName = "ct", doc = "for summary file outputs, report the % of bases coverd to >= this number. 
Defaults to 15; can take multiple arguments.", required = false) + @Argument(fullName = "summaryCoverageThreshold", shortName = "ct", doc = "Coverage threshold for summarizing statistics", required = false) int[] coverageThresholds = {15}; String[] OUTPUT_FORMATS = {"table","rtable","csv"}; @@ -425,7 +440,7 @@ public class DepthOfCoverage extends LocusWalker implements TreeReducible, NanoSchedulable { @Input(fullName="exception", shortName = "E", doc="Java class of exception to throw", required=true) public String exceptionToThrow; diff --git a/public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/GenotypeConcordance.java b/public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/GenotypeConcordance.java index 724578a09..8c8961cb5 100755 --- a/public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/GenotypeConcordance.java +++ b/public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/GenotypeConcordance.java @@ -51,23 +51,50 @@ import java.util.*; * *

* GenotypeConcordance takes in two callsets (vcfs) and tabulates the number of sites which overlap and share alleles, - * and for each sample, the genotype-by-genotype counts (for instance, the number of sites at which a sample was - * called homozygous reference in the EVAL callset, but homozygous variant in the COMP callset). It outputs these + * and for each sample, the genotype-by-genotype counts (e.g. the number of sites at which a sample was + * called homozygous-reference in the EVAL callset, but homozygous-variant in the COMP callset). It outputs these * counts as well as convenient proportions (such as the proportion of het calls in the EVAL which were called REF in * the COMP) and metrics (such as NRD and NRS). + *

* *

Input

*

* Genotype concordance requires two callsets (as it does a comparison): an EVAL and a COMP callset, specified via - * the -eval and -comp arguments. - * + * the -eval and -comp arguments. Typically, the EVAL callset is an experimental set you want to evaluate, while the + * COMP callset is a previously existing set used as a standard for comparison (taken to represent "truth"). + *

+ *

* (Optional) Jexl expressions for genotype-level filtering of EVAL or COMP genotypes, specified via the -gfe and * -cfe arguments, respectively. *

* *

Output

- * Genotype Concordance writes a GATK report to the specified file (via -o) , consisting of multiple tables of counts - * and proportions. These tables may be optionally moltenized via the -moltenize argument. That is, the standard table + *

+ * Genotype Concordance writes a GATK report to the specified file (via -o), consisting of multiple tables of counts + * and proportions. These tables are constructed on a per-sample basis, and include counts of EVAL vs COMP genotype states, and the + * number of times the alternate alleles between the EVAL and COMP sample did not match up. + *

+ *
+ *
+ * <h3>Term and metrics definitions</h3>
+ *

+ *

+ * <ul>
+ *     <li>HET: heterozygous</li>
+ *     <li>HOM_REF: homozygous reference</li>
+ *     <li>HOM_VAR: homozygous variant</li>
+ *     <li>MIXED: something like ./1</li>
+ *     <li>ALLELES_MATCH: counts of calls at the same site where the alleles match</li>
+ *     <li>ALLELES_DO_NOT_MATCH: counts of calls at the same location with different alleles, such as the eval set calling a 'G' alternate allele, and the comp set calling a 'T' alternate allele</li>
+ *     <li>EVAL_ONLY: counts of sites present only in the EVAL set, not in the COMP set</li>
+ *     <li>TRUTH_ONLY: counts of sites present only in the COMP set, not in the EVAL set</li>
+ *     <li>Non-Reference_Discrepancy (NRD): genotype concordance excluding concordant reference sites</li>
+ *     <li>Non-Reference_Sensitivity (NRS): sensitivity of the EVAL calls to polymorphic calls in the COMP set, calculated by (# true positive)/(# true polymorphic)</li>
+ *     <li>Overall_Genotype_Concordance: overall concordance calculated by (# concordant genotypes)/(# genotypes)</li>
+ * </ul>
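The NRD and NRS definitions above can be sketched numerically. The following is a minimal illustration only — it is not part of this patch or of the GATK ConcordanceMetrics class, and all class and method names are hypothetical:

```java
/**
 * Illustrative sketch (not GATK code) of the NRD and NRS metrics defined
 * in the GenotypeConcordance documentation. Names are hypothetical.
 */
public class ConcordanceMetricsSketch {

    /** NRD: fraction of discordant genotypes, excluding concordant HOM_REF pairs from the denominator. */
    static double nonReferenceDiscrepancy(long concordantHomRef, long concordantNonRef, long discordant) {
        // concordant HOM_REF sites are deliberately left out of the denominator
        long denom = concordantNonRef + discordant;
        return denom == 0 ? 0.0 : (double) discordant / denom;
    }

    /** NRS: (# true positive) / (# true polymorphic), per the definition above. */
    static double nonReferenceSensitivity(long truePositive, long truePolymorphic) {
        return truePolymorphic == 0 ? 0.0 : (double) truePositive / truePolymorphic;
    }

    public static void main(String[] args) {
        // 90 concordant HOM_REF, 8 concordant non-ref, 2 discordant genotypes
        double nrd = nonReferenceDiscrepancy(90, 8, 2);
        // COMP has 10 polymorphic genotypes, 8 of which EVAL also called non-ref
        double nrs = nonReferenceSensitivity(8, 10);
        System.out.println("NRD=" + nrd + " NRS=" + nrs);
    }
}
```

Note how excluding concordant HOM_REF pairs keeps NRD from being dominated by the overwhelming majority of reference-matching sites.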

+ * + *

Moltenized tables

+ * + *

These tables may be optionally moltenized via the -moltenize argument. That is, the standard table * *

  *  Sample   NO_CALL_HOM_REF  NO_CALL_HET  NO_CALL_HOM_VAR   (...)
@@ -87,30 +114,32 @@ import java.util.*;
  *  (...)
  *  
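Moltenization reshapes each row of the wide table above into one (sample, variable, value) row per cell. A minimal sketch of that transformation — illustrative Java only, not GATK code, with all names hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch (not GATK code) of "moltenizing" one row of a wide
 * per-sample count table into long-format rows. Names are hypothetical.
 */
public class MoltenizeSketch {

    /** Emit one tab-separated "sample variable value" row per cell of the wide row. */
    static String moltenize(String sample, Map<String, Long> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            sb.append(sample).append('\t').append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("NO_CALL_HOM_REF", 0L);
        counts.put("NO_CALL_HET", 2L);
        // one long-format row per table cell
        System.out.print(moltenize("NA12878", counts));
    }
}
```

The long format is what makes the output directly loadable into R (e.g. for ggplot) without further reshaping.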
* + *

Site-level allelic concordance

* - * These tables are constructed on a per-sample basis, and include counts of eval vs comp genotype states, and the - * number of times the alternate alleles between the eval and comp sample did not match up. - * - * In addition, Genotype Concordance produces site-level allelic concordance. For strictly bi-allelic VCFs, - * only the ALLELES_MATCH, EVAL_ONLY, TRUTH_ONLY fields will be populated, but where multi-allelic sites are involved - * counts for EVAL_SUBSET_TRUTH and EVAL_SUPERSET_TRUTH will be generated. - * + *

+ * For strictly bi-allelic VCFs, only the ALLELES_MATCH, EVAL_ONLY, TRUTH_ONLY fields will be populated, + * but where multi-allelic sites are involved counts for EVAL_SUBSET_TRUTH and EVAL_SUPERSET_TRUTH will be generated. + *

+ *

* For example, in the following situation *

  *    eval:  ref - A   alt - C
  *    comp:  ref - A   alt - C,T
  *  
* then the site is tabulated as EVAL_SUBSET_TRUTH. Were the situation reversed, it would be EVAL_SUPERSET_TRUTH. - * However, in the case where eval has both C and T alternate alleles, both must be observed in the genotypes + * However, in the case where EVAL has both C and T alternate alleles, both must be observed in the genotypes * (that is, there must be at least one of (0/1,1/1) and at least one of (0/2,1/2,2/2) in the genotype field). If - * one of the alleles has no observations in the genotype fields of the eval, the site-level concordance is + * one of the alleles has no observations in the genotype fields of the EVAL, the site-level concordance is * tabulated as though that allele were not present in the record. + *
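The subset/superset bookkeeping described above amounts to a simple set comparison over the alternate alleles. A sketch under that reading — illustrative Java, not the walker's actual implementation, with hypothetical names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Illustrative sketch (not GATK code) of site-level allelic concordance
 * classification between EVAL and COMP alternate-allele sets, following
 * the EVAL_SUBSET_TRUTH / EVAL_SUPERSET_TRUTH description above.
 */
public class AlleleConcordanceSketch {

    enum SiteStatus { ALLELES_MATCH, EVAL_SUBSET_TRUTH, EVAL_SUPERSET_TRUTH, ALLELES_DO_NOT_MATCH }

    static SiteStatus classify(Set<String> evalAlts, Set<String> compAlts) {
        if (evalAlts.equals(compAlts)) return SiteStatus.ALLELES_MATCH;
        if (compAlts.containsAll(evalAlts)) return SiteStatus.EVAL_SUBSET_TRUTH;   // eval alt(s) within truth
        if (evalAlts.containsAll(compAlts)) return SiteStatus.EVAL_SUPERSET_TRUTH; // truth alt(s) within eval
        return SiteStatus.ALLELES_DO_NOT_MATCH;
    }

    public static void main(String[] args) {
        // eval alt {C} vs comp alt {C,T}: the eval alleles are a subset of truth
        Set<String> eval = new HashSet<>(Arrays.asList("C"));
        Set<String> comp = new HashSet<>(Arrays.asList("C", "T"));
        System.out.println(classify(eval, comp)); // prints EVAL_SUBSET_TRUTH
    }
}
```

Per the text above, an eval allele with no genotype observations would be dropped from `evalAlts` before this comparison.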

* - *

Monomorphic Records

+ *

Monomorphic Records

+ *

* A site which has an alternate allele, but which is monomorphic in samples, is treated as not having been - * discovered, and will be recorded in the TRUTH_ONLY column (if a record exists in the comp VCF), or not at all - * (if no record exists in the comp VCF). - * + * discovered, and will be recorded in the TRUTH_ONLY column (if a record exists in the COMP set), or not at all + * (if no record exists in the COMP set). + *

+ *

* That is, in the situation *

  *   eval:  ref - A   alt - C   genotypes - 0/0  0/0  0/0 ... 0/0
@@ -121,14 +150,18 @@ import java.util.*;
  *   eval:  ref - A   alt - .   genotypes - 0/0  0/0  0/0 ... 0/0
  *   comp:  ref - A   alt - C   ...         0/0  0/0  ...
  *  
- * - * When a record is present in the comp VCF the *genotypes* for the monomorphic site will still be used to evaluate + *

+ *

+ * When a record is present in the COMP set the *genotypes* for the monomorphic site will still be used to evaluate * per-sample genotype concordance counts. + *

* - *

Filtered Records

+ *

Filtered Records

* Filtered records are treated as though they were not present in the VCF, unless -ignoreSiteFilters is provided, * in which case all records are used. There is currently no way to assess concordance metrics on filtered sites * exclusively. SelectVariants can be used to extract filtered sites, and VariantFiltration used to un-filter them. + * + */ @DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_VARMANIP, extraDocs = {CommandLineGATK.class} ) public class GenotypeConcordance extends RodWalker>,ConcordanceMetrics> { diff --git a/public/java/src/org/broadinstitute/sting/utils/exceptions/DynamicClassResolutionException.java b/public/java/src/org/broadinstitute/sting/utils/exceptions/DynamicClassResolutionException.java index 4d280423e..0f1b473c3 100644 --- a/public/java/src/org/broadinstitute/sting/utils/exceptions/DynamicClassResolutionException.java +++ b/public/java/src/org/broadinstitute/sting/utils/exceptions/DynamicClassResolutionException.java @@ -29,10 +29,6 @@ import java.lang.reflect.InvocationTargetException; /** * Class for handling common failures of dynamic class resolution - * - * User: depristo - * Date: Sep 3, 2010 - * Time: 2:24:09 PM */ public class DynamicClassResolutionException extends UserException { public DynamicClassResolutionException(Class c, Exception ex) { diff --git a/public/java/src/org/broadinstitute/sting/utils/exceptions/UserException.java b/public/java/src/org/broadinstitute/sting/utils/exceptions/UserException.java index 40a730029..4db6e3d69 100644 --- a/public/java/src/org/broadinstitute/sting/utils/exceptions/UserException.java +++ b/public/java/src/org/broadinstitute/sting/utils/exceptions/UserException.java @@ -42,10 +42,6 @@ import java.io.File; * Represents the common user errors detected by Sting / GATK * * Root class for all GATK user errors, as well as the container for errors themselves - * - * User: depristo - * Date: Sep 3, 2010 - * Time: 2:24:09 PM */ @DocumentedGATKFeature( groupName = 
HelpConstants.DOCS_CAT_USRERR, diff --git a/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeature.java b/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeature.java index 0390e32d7..0afcdae02 100644 --- a/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeature.java +++ b/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeature.java @@ -37,7 +37,7 @@ import java.lang.annotation.*; @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) public @interface DocumentedGATKFeature { - /** Should we actually document this feature, even through it's annotated? */ + /** Should we actually document this feature, even though it's annotated? */ public boolean enable() default true; /** The overall group name (walkers, readfilters) this feature is associated with */ public String groupName(); @@ -45,4 +45,6 @@ public @interface DocumentedGATKFeature { public String summary() default ""; /** Are there links to other docs that we should include? CommandLineGATK.class for walkers, for example? */ public Class[] extraDocs() default {}; + /** Who is the go-to developer for operation/documentation issues? */ + public String gotoDev() default "NA"; } diff --git a/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeatureObject.java b/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeatureObject.java index 7d6819f39..ad0959bfe 100644 --- a/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeatureObject.java +++ b/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeatureObject.java @@ -36,19 +36,20 @@ class DocumentedGATKFeatureObject { private final Class classToDoc; /** Are we enabled? 
*/ private final boolean enable; - private final String groupName, summary; + private final String groupName, summary, gotoDev; private final Class[] extraDocs; - public DocumentedGATKFeatureObject(Class classToDoc, final boolean enable, final String groupName, final String summary, final Class[] extraDocs) { + public DocumentedGATKFeatureObject(Class classToDoc, final boolean enable, final String groupName, final String summary, final Class[] extraDocs, final String gotoDev) { this.classToDoc = classToDoc; this.enable = enable; this.groupName = groupName; this.summary = summary; this.extraDocs = extraDocs; + this.gotoDev = gotoDev; } - public DocumentedGATKFeatureObject(Class classToDoc, final String groupName, final String summary) { - this(classToDoc, true, groupName, summary, new Class[]{}); + public DocumentedGATKFeatureObject(Class classToDoc, final String groupName, final String summary, final String gotoDev) { + this(classToDoc, true, groupName, summary, new Class[]{}, gotoDev); } public Class getClassToDoc() { return classToDoc; } @@ -56,4 +57,5 @@ class DocumentedGATKFeatureObject { public String groupName() { return groupName; } public String summary() { return summary; } public Class[] extraDocs() { return extraDocs; } + public String gotoDev() { return gotoDev; } } diff --git a/public/java/src/org/broadinstitute/sting/utils/help/GATKDoclet.java b/public/java/src/org/broadinstitute/sting/utils/help/GATKDoclet.java index 63cb0900a..6468fe51d 100644 --- a/public/java/src/org/broadinstitute/sting/utils/help/GATKDoclet.java +++ b/public/java/src/org/broadinstitute/sting/utils/help/GATKDoclet.java @@ -118,7 +118,8 @@ public class GATKDoclet { static { STATIC_DOCS.add(new DocumentedGATKFeatureObject(FeatureCodec.class, HelpConstants.DOCS_CAT_RODCODECS, - "Tribble codecs for reading reference ordered data (ROD) files such as VCF or BED")); + "Tribble codecs for reading reference ordered data (ROD) files such as VCF or BED", + "NA")); } @@ -332,11 +333,11 @@ 
public class GATKDoclet { if (docClass.isAnnotationPresent(DocumentedGATKFeature.class)) { DocumentedGATKFeature f = docClass.getAnnotation(DocumentedGATKFeature.class); - return new DocumentedGATKFeatureObject(docClass, f.enable(), f.groupName(), f.summary(), f.extraDocs()); + return new DocumentedGATKFeatureObject(docClass, f.enable(), f.groupName(), f.summary(), f.extraDocs(), f.gotoDev()); } else { for (DocumentedGATKFeatureObject staticDocs : STATIC_DOCS) { if (staticDocs.getClassToDoc().isAssignableFrom(docClass)) { - return new DocumentedGATKFeatureObject(docClass, staticDocs.enable(), staticDocs.groupName(), staticDocs.summary(), staticDocs.extraDocs()); + return new DocumentedGATKFeatureObject(docClass, staticDocs.enable(), staticDocs.groupName(), staticDocs.summary(), staticDocs.extraDocs(), staticDocs.gotoDev()); } } return null; @@ -446,6 +447,7 @@ public class GATKDoclet { if (annotation.groupName().endsWith(" Tools")) supercatValue = "tools"; else if (annotation.groupName().endsWith(" Utilities")) supercatValue = "utilities"; else if (annotation.groupName().startsWith("Engine ")) supercatValue = "engine"; + else if (annotation.groupName().endsWith(" (DevZone)")) supercatValue = "dev"; else supercatValue = "other"; root.put("supercat", supercatValue); diff --git a/public/java/src/org/broadinstitute/sting/utils/help/GenericDocumentationHandler.java b/public/java/src/org/broadinstitute/sting/utils/help/GenericDocumentationHandler.java index 893a8349b..06c0e1c26 100644 --- a/public/java/src/org/broadinstitute/sting/utils/help/GenericDocumentationHandler.java +++ b/public/java/src/org/broadinstitute/sting/utils/help/GenericDocumentationHandler.java @@ -123,6 +123,8 @@ public class GenericDocumentationHandler extends DocumentedGATKFeatureHandler { for (Tag tag : toProcess.classDoc.tags()) { root.put(tag.name(), tag.text()); } + + root.put("gotoDev", toProcess.annotation.gotoDev()); } /** @@ -160,17 +162,29 @@ public class GenericDocumentationHandler extends 
DocumentedGATKFeatureHandler { try { // loop over all of the arguments according to the parsing engine for (final ArgumentSource argumentSource : parsingEngine.extractArgumentSources(DocletUtils.getClassForDoc(toProcess.classDoc))) { - // todo -- why can you have multiple ones? ArgumentDefinition argDef = argumentSource.createArgumentDefinitions().get(0); FieldDoc fieldDoc = getFieldDoc(toProcess.classDoc, argumentSource.field.getName()); Map argBindings = docForArgument(fieldDoc, argumentSource, argDef); if (!argumentSource.isHidden() || getDoclet().showHiddenFeatures()) { final String kind = docKindOfArg(argumentSource); - + // Retrieve default value final Object value = argumentValue(toProcess.clazz, argumentSource); if (value != null) argBindings.put("defaultValue", prettyPrintValueString(value)); - + // Retrieve min and max / hard and soft value thresholds for numeric args + if (value instanceof Number) { + if (argumentSource.field.isAnnotationPresent(Argument.class)) { + argBindings.put("minValue", argumentSource.field.getAnnotation(Argument.class).minValue()); + argBindings.put("maxValue", argumentSource.field.getAnnotation(Argument.class).maxValue()); + if (argumentSource.field.getAnnotation(Argument.class).minRecommendedValue() != Double.NEGATIVE_INFINITY) { + argBindings.put("minRecValue", argumentSource.field.getAnnotation(Argument.class).minRecommendedValue()); + } + if (argumentSource.field.getAnnotation(Argument.class).maxRecommendedValue() != Double.POSITIVE_INFINITY) { + argBindings.put("maxRecValue", argumentSource.field.getAnnotation(Argument.class).maxRecommendedValue()); + } + } + } + // Finalize argument bindings args.get(kind).add(argBindings); args.get("all").add(argBindings); } @@ -742,8 +756,14 @@ public class GenericDocumentationHandler extends DocumentedGATKFeatureHandler { /** * Returns a Pair of (main, synonym) names for argument with fullName s1 and - * shortName s2. 
The main is selected to be the longest of the two, provided - * it doesn't exceed MAX_DISPLAY_NAME, in which case the shorter is taken. + * shortName s2. + * + * Previously we had it so the main name was selected to be the longest of the two, provided + * it didn't exceed MAX_DISPLAY_NAME, in which case the shorter was taken. But we now disable + * the length-based name rearrangement in order to maintain consistency in the GATKDocs table. + * + * This may cause messed up spacing in the CLI-help display but we don't care as much about that + * since more users use the online GATKDocs for looking up arguments. * * @param s1 the short argument name without -, or null if not provided * @param s2 the long argument name without --, or null if not provided @@ -758,13 +778,7 @@ public class GenericDocumentationHandler extends DocumentedGATKFeatureHandler { if (s1 == null) return new Pair(s2, null); if (s2 == null) return new Pair(s1, null); - String l = s1.length() > s2.length() ? s1 : s2; - String s = s1.length() > s2.length() ? 
s2 : s1; - - if (l.length() > MAX_DISPLAY_NAME) - return new Pair(s, l); - else - return new Pair(l, s); + return new Pair(s2, s1); } /** diff --git a/public/java/src/org/broadinstitute/sting/utils/help/HelpConstants.java b/public/java/src/org/broadinstitute/sting/utils/help/HelpConstants.java index 2ed35d848..783e7aa90 100644 --- a/public/java/src/org/broadinstitute/sting/utils/help/HelpConstants.java +++ b/public/java/src/org/broadinstitute/sting/utils/help/HelpConstants.java @@ -50,15 +50,34 @@ public class HelpConstants { public final static String DOCS_CAT_RF = "Read Filters"; public final static String DOCS_CAT_REFUTILS = "Reference Utilities"; public final static String DOCS_CAT_RODCODECS = "ROD Codecs"; - public final static String DOCS_CAT_USRERR = "User Exceptions"; + public final static String DOCS_CAT_USRERR = "User Exceptions (DevZone)"; public final static String DOCS_CAT_VALIDATION = "Validation Utilities"; public final static String DOCS_CAT_ANNOT = "Variant Annotations"; public final static String DOCS_CAT_VARDISC = "Variant Discovery Tools"; public final static String DOCS_CAT_VARMANIP = "Variant Evaluation and Manipulation Tools"; - public final static String DOCS_CAT_TEST = "Testing Tools"; + public final static String DOCS_CAT_TOY = "Toy Walkers (DevZone)"; public final static String DOCS_CAT_HELPUTILS = "Help Utilities"; public static String forumPost(String post) { return GATK_FORUM_URL + post; } + + /** + * Go-to developer name codes for tracking and display purposes. Only current team members should be in this list. + * When someone leaves, their charges should be redistributed. The actual string should be as close to the dev's + * abbreviated name or two/three-letter nickname as possible. The code can be something else if necessary to + * disambiguate from other variables. 
+ */ + public final static String MC = "MC"; // Mauricio Carneiro + public final static String EB = "EB"; // Eric Banks + public final static String RP = "RP"; // Ryan Poplin + public final static String GVDA = "GG"; // Geraldine Van der Auwera + public final static String VRR = "VRR"; // Valentin Ruano-Rubio + public final static String ALM = "ALM"; // Ami Levy-Moonshine + public final static String BH = "BH"; // Bertrand Haas + public final static String JoT = "JT"; // Joel Thibault + public final static String DR = "DR"; // David Roazen + public final static String KS = "KS"; // Khalid Shakir + + } \ No newline at end of file diff --git a/settings/helpTemplates/common.html b/settings/helpTemplates/common.html index f4fb74af1..ff9df5eea 100644 --- a/settings/helpTemplates/common.html +++ b/settings/helpTemplates/common.html @@ -86,7 +86,13 @@ Support Forum

-

GATK version ${version} built at ${timestamp}.

+

GATK version ${version} built at ${timestamp}. + <#-- closing P tag in next macro --> + + + <#macro footerClose> + <#-- ugly little hack to enable adding tool-specific info inline --> +

<#macro pageFooter> diff --git a/settings/helpTemplates/generic.index.template.html b/settings/helpTemplates/generic.index.template.html index a5650d55e..0398b829d 100644 --- a/settings/helpTemplates/generic.index.template.html +++ b/settings/helpTemplates/generic.index.template.html @@ -58,7 +58,7 @@ ${version}
-        <#assign seq = ["engine", "tools", "utilities", "other"]>
+        <#assign seq = ["engine", "tools", "utilities", "other", "dev"]>
         <#list seq as supercat>
             <#list groups?sort_by("name") as group>
@@ -70,4 +70,5 @@
 <@footerInfo />
+<@footerClose />
 <@pageFooter />
diff --git a/settings/helpTemplates/generic.template.html b/settings/helpTemplates/generic.template.html
index eea741669..d4aa7c7f9 100644
--- a/settings/helpTemplates/generic.template.html
+++ b/settings/helpTemplates/generic.template.html
@@ -31,45 +31,70 @@
         <#list myargs as arg>
-            ${arg.name}
-            ${arg.type}
+            ${arg.name}
+            <#if arg.synonyms??>
+                <#if arg.name[2..] != arg.synonyms[1..]>
+                     ${arg.synonyms}
+
+
+
+            ${arg.defaultValue!"NA"}
+            ${arg.summary}
-            <#--
-            <td>${arg.required}
-            -->

 <#macro argumentDetails arg>
-

 ${arg.name}
-        <#if arg.synonyms??> / ${arg.synonyms}
-
-        (
-        <#if arg.attributes??>${arg.attributes}
-        ${arg.type}
-        <#if arg.defaultValue??> with default value ${arg.defaultValue}
-        )
-

-

-        ${arg.summary}. ${arg.fulltext}
-        <#if arg.rodTypes??>${arg.name} binds reference ordered data. This argument supports ROD files of the
-            following types: ${arg.rodTypes}
-
-        <#if arg.options??>
-
- The ${arg.name} argument is an enumerated type (${arg.type}), which can have one of the following values: -

- <#list arg.options as option> -
${option.name}
-
${option.summary}
- -
- -

+
+

${arg.name} + <#if arg.synonyms??> / ${arg.synonyms} +

+

+ ${arg.summary}
+ ${arg.fulltext} +

+ + + <#if arg.rodTypes??> +

${arg.name} binds reference ordered data. This argument supports ROD files of the following types: ${arg.rodTypes}

+ + <#if arg.options??> +

+ The ${arg.name} argument is an enumerated type (${arg.type}), which can have one of the following values: +

+ <#list arg.options as option> +
${option.name}
+
${option.summary}
+ +
+

+ +

<#if arg.required??> + <#if arg.required == "yes"> + R + + + ${arg.type} + <#if arg.defaultValue??> +  ${arg.defaultValue} + + <#if arg.minValue??> +  [ [ ${arg.minValue} + + <#if arg.minRecValue??> +  [ ${arg.minRecValue} + + <#if arg.maxRecValue??> +  ${arg.maxRecValue} ] + + <#if arg.maxValue??> +  ${arg.maxValue} ] ] + +

 <#macro relatedByType name type>
     <#list relatedDocs as relatedDoc>
@@ -103,11 +128,12 @@

${name}

${summary}

+    <#-- using goto dev annotation instead, see above footer
     <#if author??>

Author ${author}

-
+    -->
 <#if group?? >

             Category ${group}
@@ -229,12 +255,12 @@
         <#-- Create the argument summary -->
         <#if arguments.all?size != 0>

${name} specific arguments

-

This table summarizes the command-line arguments that are specific to this tool. For details, see the list further down below the table.

+

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table, or click an argument name to jump directly to its entry in the list.

-
-
+
+
@@ -267,6 +293,11 @@
             <@argumentDetails arg=arg/>
-
+
 <@footerInfo />
+    <#-- Specify go-to developer (for internal use) -->
+    <#if gotoDev??>
+        GTD: ${gotoDev}
+
+<@footerClose />
 <@pageFooter />
\ No newline at end of file
Name Type Argument name(s) Default value Summary