Merge pull request #1239 from broadinstitute/gvda_straggler_doc_fixes_1237

Improve doc block of GatherBqsrReports; annotation doc enhancements (QD, InbreedingCoeff, ExcessHet and AS versions where applicable)
2015-11-22 13:58:20 -05:00 · 2015-11-22 13:58:20 -05:00 · b0730c2b81
parent 0babe6abf7 a7748368f8
commit b0730c2b81
6 changed files with 55 additions and 22 deletions
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/GatherBqsrReports.java
+++ b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/GatherBqsrReports.java
@@ -62,17 +62,38 @@ import java.io.File;
 import java.util.List;

 /**
- * relevant javadoc from the underlying gatk class:
+ * Gather recalibration reports from parallelized base recalibration runs
 *
- * Gathers recalibration reports by adding all observations and errors
+ * This tool is intended to be used to combine recalibration tables from runs of BaseRecalibrator parallelized per interval.
+ * The combination is done simply by adding up all observations and errors.
 *
- * Note: This method DOES NOT recalculate the empirical qualities and quantized qualities. You have to recalculate
+ * <h3>Usage</h3>
+ * <p>Note that this is a command-line utility that bypasses the GATK engine. As a result, the command line you must use to
+ * invoke it is a little different from other GATK tools (see example below), and it does not accept any of the
+ * classic "CommandLineGATK" arguments.</p>
+ *
+ * <h4>Input</h4>
+ * List of scattered BQSR files
+ *
+ * <h4>Output</h4>
+ * Combined recalibration table in GATKReport format.
+ *
+ * <h4>Command</h4>
+ * <pre>
+ *     java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.GatherBqsrReports \
+ *          -I input.list \
+ *          -O output.grp
+ * </pre>
+ *
+ * <h3>Caveats</h3>
+ * <ul>
+ * <li>This method DOES NOT recalculate the empirical qualities and quantized qualities. You have to recalculate
 * them after combining. The reason for not calculating it is because this function is intended for combining a
 * series of recalibration reports, and it only makes sense to calculate the empirical qualities and quantized
- * qualities after all the recalibration reports have been combined. Having the user recalculate when appropriate,
- * makes this method faster
- *
- * Note2: The empirical quality reported, however, is recalculated given its simplicity.
+ * qualities after all the recalibration reports have been combined. This is done to make the tool faster.
+ * </li>
+ * <li>The reported empirical quality is recalculated (because it is so simple to do).</li>
+ * </ul>
 *
 */
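
As a rough illustration of the "adding up all observations and errors" step described in the new GatherBqsrReports doc block, here is a minimal Java sketch. The names `GatherSketch`, `Counts`, `combine` and `phredOfErrorRate` are hypothetical and are not part of GATK, and the smoothed Phred formula is an assumption made for illustration only, not the estimator GATK uses.

```java
import java.util.Arrays;
import java.util.List;

public class GatherSketch {

    /** Hypothetical stand-in for one row of a recalibration table. */
    static final class Counts {
        final long observations;
        final double errors;

        Counts(long observations, double errors) {
            this.observations = observations;
            this.errors = errors;
        }
    }

    /** Combines scattered rows for the same covariate bin by summing their counts. */
    static Counts combine(List<Counts> scattered) {
        long observations = 0L;
        double errors = 0.0;
        for (Counts c : scattered) {
            observations += c.observations;
            errors += c.errors;
        }
        return new Counts(observations, errors);
    }

    /**
     * Illustrative Phred-scaled error rate computed from the combined counts.
     * The +1/+2 smoothing is an assumption of this sketch, not GATK's estimator.
     */
    static double phredOfErrorRate(Counts c) {
        double rate = (c.errors + 1.0) / (c.observations + 2.0);
        return -10.0 * Math.log10(rate);
    }

    public static void main(String[] args) {
        // Two per-interval rows for the same bin, as if read from scattered reports.
        Counts total = combine(Arrays.asList(new Counts(1000L, 10.0), new Counts(3000L, 20.0)));
        System.out.println("observations=" + total.observations + " errors=" + total.errors);
        System.out.println("illustrative quality=" + phredOfErrorRate(total));
    }
}
```

Because the merge is a plain sum, the order in which the scattered reports are listed does not affect the combined counts; any quality derived from them only makes sense once every report has been added.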

--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/AS_InbreedingCoeff.java
+++ b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/AS_InbreedingCoeff.java
@@ -72,7 +72,7 @@ import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
 import java.util.*;

 /**
- * Allele-specific, likelihood-based test for the inbreeding among samples
+ * Allele-specific likelihood-based test for the inbreeding among samples
 *
 * <p>This annotation estimates whether there is evidence of inbreeding in a population. The higher the score, the higher the chance that there is inbreeding.</p>
 *
@@ -81,8 +81,14 @@ import java.util.*;
 *
 * <h3>Caveats</h3>
 * <ul>
- * <li>The Inbreeding Coefficient can only be calculated for cohorts containing at least 10 founder samples.</li>
- * <li>This annotation can take a valid pedigree file to specify founders.</li>
+ * <li>The inbreeding coefficient can only be calculated for cohorts containing at least 10 founder samples.</li>
+ * <li>This annotation can take a valid pedigree file to specify founders. If not specified, all samples will be considered as founders.</li>
+ * </ul>
+ *
+ * <h3>Related annotations</h3>
+ * <ul>
+ *     <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_InbreedingCoeff.php">InbreedingCoeff</a></b> outputs a version of this annotation that includes all alternate alleles in a single calculation.</li>
+ *     <li><b><a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_ExcessHet.php">ExcessHet</a></b> estimates excess heterozygosity in a population of samples.</li>
 * </ul>
 *
 */
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/AS_QualByDepth.java
+++ b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/AS_QualByDepth.java
@@ -71,16 +71,16 @@ import org.broadinstitute.gatk.utils.variant.GATKVCFHeaderLines;
 import java.util.*;

 /**
- * Per-allele call confidence normalized by depth of sample reads supporting the allele
+ * Allele-specific call confidence normalized by depth of sample reads supporting the allele
 *
 * <p>This annotation puts the variant confidence QUAL score into perspective by normalizing for the amount of coverage available. Because each read contributes a little to the QUAL score, variants in regions with deep coverage can have artificially inflated QUAL scores, giving the impression that the call is supported by more evidence than it really is. To compensate for this, we normalize the variant confidence by depth, which gives us a more objective picture of how well supported the call is.</p>
 *
 * <h3>Statistical notes</h3>
 * <p>The QD is the QUAL score normalized by allele depth (AD) for a variant. For a single sample, the HaplotypeCaller calculates the QD by taking QUAL/AD. For multiple samples, HaplotypeCaller and GenotypeGVCFs calculate the QD by taking QUAL/AD of samples with a non hom-ref genotype call. The reason we leave out the samples with a hom-ref call is to not penalize the QUAL for the other samples with the variant call.</p>
- * <p>Here is a single sample example:</p>
+ * <h4>Here is a single-sample example:</h4>
 * <pre>2	37629	.	C	G	1063.77	.	AC=2;AF=1.00;AN=2;DP=31;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.50;QD=34.32;SOR=2.376	GT:AD:DP:GQ:PL:QSS	1/1:0,31:31:93:1092,93,0:0,960</pre>
 <p>QUAL/AD = 1063.77/31 = 34.32 = QD</p>
- * <p>Here is a multi-sample example:</p>
+ * <h4>Here is a multi-sample example:</h4>
 * <pre>10	8046	.	C	T	4107.13	.	AC=1;AF=0.167;AN=6;BaseQRankSum=-3.717;DP=1063;FS=1.616;MLEAC=1;MLEAF=0.167;QD=11.54
 GT:AD:DP:GQ:PL:QSS	0/0:369,4:373:99:0,1007,12207:10548,98	    0/0:331,1:332:99:0,967,11125:9576,27	    0/1:192,164:356:99:4138,0,5291:5501,4505</pre>
 * <p>QUAL/AD = 4107.13/356 = 11.54 = QD</p>
@@ -91,6 +91,7 @@ import java.util.*;
 *
 * <h3>Related annotations</h3>
 * <ul>
+ *     <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_QualByDepth.php">QualByDepth</a></b> outputs a version of this annotation that includes all alternate alleles in a single calculation.</li>
 *     <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_Coverage.php">Coverage</a></b> gives the filtered depth of coverage for each sample and the unfiltered depth across all samples.</li>
 *     <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_DepthPerAlleleBySample.php">DepthPerAlleleBySample</a></b> calculates depth of coverage for each allele per sample (AD).</li>
 * </ul>
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/ExcessHet.java
+++ b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/ExcessHet.java
@@ -79,20 +79,23 @@ import java.util.*;
 /**
 * Phred-scaled p-value for exact test of excess heterozygosity
 *
- * <p>This annotation is a one-sided phred-scaled p-value using an exact test of the Hardy-Weinberg Equilibrium. The null hypothesis is that the number of heterozygotes follows the Hardy-Weinberg Equilibrium. The p-value is the probability of getting the same or more heterozygotes as was observed, given the null hypothesis. The implementation used is adapted from Wigginton JE, Cutler DJ, Abecasis GR. A Note on Exact Tests of Hardy-Weinberg Equilibrium. American Journal of Human Genetics. 2005;76(5):887-893.</p>
+ * This annotation estimates excess heterozygosity in a population of samples. It is related to but distinct from InbreedingCoeff, which estimates evidence for inbreeding in a population. ExcessHet scales more reliably to large cohort sizes.
 *
 * <h3>Statistical notes</h3>
+ * <p>This annotation is a one-sided Phred-scaled p-value using an exact test of the Hardy-Weinberg Equilibrium. The null hypothesis is that the number of heterozygotes follows the Hardy-Weinberg Equilibrium. The p-value is the probability of getting at least as many heterozygotes as were observed, given the null hypothesis.</p>
+ * <p>The implementation used is adapted from Wigginton JE, Cutler DJ, Abecasis GR. A Note on Exact Tests of Hardy-Weinberg Equilibrium. American Journal of Human Genetics. 2005;76(5):887-893.</p>
 * <p>The p-value is calculated exactly by using the Levene-Haldane distribution. This implementation also uses a mid-p correction as described by Graffelman, J. & Moreno, V. (2013). The mid p-value in exact tests for Hardy-Weinberg equilibrium. Statistical Applications in Genetics and Molecular Biology, 12(4), pp. 433-448. </p>
 *
 * <h3>Caveats</h3>
 * <ul>
- * <li>The annotation is not accurate for very small p-values. Beyond 1.0E-16 there is no guarantee that the p-value is accurate, just that it is in fact smaller than 1.0E-16 </li>
- * <li>For multiallelic sites all non reference alleles are treated as a single alternate allele.</li>
+ * <li>The annotation is not accurate for very small p-values. Below 1.0E-16 there is no guarantee that the p-value is accurate, only that it is in fact smaller than 1.0E-16.</li>
+ * <li>For multiallelic sites, all non-reference alleles are treated as a single alternate allele.</li>
 * </ul>
 *
 * <h3>Related annotations</h3>
 * <ul>
- *     <li><b><a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_InbreedingCoeff.php">Inbreeding Coefficient</a></b> </li>
+ *     <li><b><a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_InbreedingCoeff.php">InbreedingCoeff</a></b> estimates whether there is evidence of inbreeding in a population.</li>
+ *     <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_AS_InbreedingCoeff.php">AS_InbreedingCoeff</a></b> outputs an allele-specific version of the InbreedingCoeff annotation.</li>
 * </ul>
 *
 */
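
To make the "Phred-scaled p-value" wording in the ExcessHet doc block concrete, the sketch below converts a p-value to the Phred scale as -10 * log10(p). The class `PhredSketch`, the method `phredScale` and the choice to clamp at 1.0E-16 are assumptions of this example, picked to mirror the caveat about very small p-values; they are not taken from the GATK source.

```java
public class PhredSketch {

    /** Smallest p-value this sketch treats as meaningful (mirrors the documented caveat). */
    static final double MIN_RELIABLE_P = 1.0e-16;

    /**
     * Converts a p-value in [0, 1] to the Phred scale.
     * Values below MIN_RELIABLE_P are clamped, which is a choice made for this sketch.
     */
    static double phredScale(double pValue) {
        if (Double.isNaN(pValue) || pValue < 0.0 || pValue > 1.0) {
            throw new IllegalArgumentException("p-value must be in [0, 1]: " + pValue);
        }
        double clamped = Math.max(pValue, MIN_RELIABLE_P);
        return -10.0 * Math.log10(clamped);
    }

    public static void main(String[] args) {
        System.out.println(phredScale(0.05));    // moderate evidence of excess heterozygosity
        System.out.println(phredScale(1.0e-30)); // clamped to the 1.0E-16 floor
    }
}
```

On this scale larger values mean smaller p-values, so a higher score indicates stronger evidence against the Hardy-Weinberg null hypothesis.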
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/InbreedingCoeff.java
+++ b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/InbreedingCoeff.java
@@ -80,14 +80,15 @@ import java.util.*;
 *
 * <h3>Caveats</h3>
 * <ul>
- * <li>The Inbreeding Coefficient can only be calculated for cohorts containing at least 10 founder samples.</li>
+ * <li>The inbreeding coefficient can only be calculated for cohorts containing at least 10 founder samples.</li>
 * <li>This annotation is used in variant filtering, but may not be appropriate for that purpose if the cohort being analyzed contains many closely related individuals.</li>
 * <li>This annotation can take a valid pedigree file to specify founders.</li>
 * </ul>
 *
 * <h3>Related annotations</h3>
 * <ul>
- *     <li><b><a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_ExcessHet.php">Excess Heterozygosity</a></b> </li>
+ *     <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_AS_InbreedingCoeff.php">AS_InbreedingCoeff</a></b> outputs an allele-specific version of this annotation.</li>
+ *     <li><b><a href="https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_ExcessHet.php">ExcessHet</a></b> estimates excess heterozygosity in a population of samples.</li>
 * </ul>
 *
 */
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/QualByDepth.java
+++ b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/annotator/QualByDepth.java
@@ -73,16 +73,16 @@ import htsjdk.variant.variantcontext.VariantContext;
 import java.util.*;

 /**
- * Variant confidence normalized by depth of variant samples
+ * Variant call confidence normalized by depth of sample reads supporting a variant
 *
 * <p>This annotation puts the variant confidence QUAL score into perspective by normalizing for the amount of coverage available. Because each read contributes a little to the QUAL score, variants in regions with deep coverage can have artificially inflated QUAL scores, giving the impression that the call is supported by more evidence than it really is. To compensate for this, we normalize the variant confidence by depth, which gives us a more objective picture of how well supported the call is.</p>
 *
 * <h3>Statistical notes</h3>
 * <p>The QD is the QUAL score normalized by allele depth (AD) for a variant. For a single sample, the HaplotypeCaller calculates the QD by taking QUAL/AD. For multiple samples, HaplotypeCaller and GenotypeGVCFs calculate the QD by taking QUAL/AD of samples with a non hom-ref genotype call. The reason we leave out the samples with a hom-ref call is to not penalize the QUAL for the other samples with the variant call.</p>
- * <p>Here is a single sample example:</p>
+ * <h4>Here is a single-sample example:</h4>
 * <pre>2	37629	.	C	G	1063.77	.	AC=2;AF=1.00;AN=2;DP=31;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=58.50;QD=34.32;SOR=2.376	GT:AD:DP:GQ:PL:QSS	1/1:0,31:31:93:1092,93,0:0,960</pre>
   <p>QUAL/AD = 1063.77/31 = 34.32 = QD</p>
- * <p>Here is a multi-sample example:</p>
+ * <h4>Here is a multi-sample example:</h4>
 * <pre>10	8046	.	C	T	4107.13	.	AC=1;AF=0.167;AN=6;BaseQRankSum=-3.717;DP=1063;FS=1.616;MLEAC=1;MLEAF=0.167;QD=11.54
   GT:AD:DP:GQ:PL:QSS	0/0:369,4:373:99:0,1007,12207:10548,98	    0/0:331,1:332:99:0,967,11125:9576,27	    0/1:192,164:356:99:4138,0,5291:5501,4505</pre>
 * <p>QUAL/AD = 4107.13/356 = 11.54 = QD</p>
@@ -92,6 +92,7 @@ import java.util.*;
 *
 * <h3>Related annotations</h3>
 * <ul>
+ *     <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_AS_QualByDepth.php">AS_QualByDepth</a></b> outputs an allele-specific version of this annotation.</li>
 *     <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_Coverage.php">Coverage</a></b> gives the filtered depth of coverage for each sample and the unfiltered depth across all samples.</li>
 *     <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_DepthPerAlleleBySample.php">DepthPerAlleleBySample</a></b> calculates depth of coverage for each allele per sample (AD).</li>
 * </ul>
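
The QD arithmetic in the statistical notes (QUAL divided by the allele depth of samples with a non-hom-ref call) can be reproduced with a short Java sketch. The names `QdSketch`, `SampleCall` and `qualByDepth` are hypothetical, genotypes are represented as plain strings for simplicity, and the real annotation may apply further adjustments that the doc block does not describe.

```java
import java.util.Arrays;
import java.util.List;

public class QdSketch {

    /** Hypothetical per-sample call: a genotype string such as "0/1" and its AD values. */
    static final class SampleCall {
        final String genotype;
        final int[] alleleDepths;

        SampleCall(String genotype, int... alleleDepths) {
            this.genotype = genotype;
            this.alleleDepths = alleleDepths;
        }

        boolean isHomRef() {
            return "0/0".equals(genotype);
        }

        int depth() {
            int sum = 0;
            for (int d : alleleDepths) {
                sum += d;
            }
            return sum;
        }
    }

    /** QUAL divided by the summed AD of samples whose genotype is not hom-ref. */
    static double qualByDepth(double qual, List<SampleCall> calls) {
        int depth = 0;
        for (SampleCall call : calls) {
            if (!call.isHomRef()) {
                depth += call.depth();
            }
        }
        if (depth == 0) {
            throw new IllegalArgumentException("no variant-supporting depth available");
        }
        return qual / depth;
    }

    public static void main(String[] args) {
        // Single-sample record from the doc block: QUAL=1063.77, AD=0,31
        double single = qualByDepth(1063.77, Arrays.asList(new SampleCall("1/1", 0, 31)));
        // Multi-sample record: only the 0/1 sample (AD=192,164) contributes depth
        double multi = qualByDepth(4107.13, Arrays.asList(
                new SampleCall("0/0", 369, 4),
                new SampleCall("0/0", 331, 1),
                new SampleCall("0/1", 192, 164)));
        System.out.println(single); // rounds to the QD=34.32 shown in the record
        System.out.println(multi);  // rounds to the QD=11.54 shown in the record
    }
}
```

Leaving the two hom-ref samples out of the denominator is what keeps the multi-sample QD at 4107.13/356 rather than dividing by the depth of all three samples.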