Extend the documentation of GenotypeConcordance to include notes about Monomorphic and Filtered VCF records.
Address Geraldine's comments - information on moltenization and explanation of fields
Fix paren
parent 28a8d74290
commit af275fdf10
@@ -67,8 +67,58 @@ import java.util.*;
  *
  * <h3>Output</h3>
  * Genotype Concordance writes a GATK report to the specified file (via -o) , consisting of multiple tables of counts
- * and proportions. These tables may be optionally moltenized via the -moltenize argument.
+ * and proportions. These tables may be optionally moltenized via the -moltenize argument. That is, the standard table
  *
+ * Sample    NO_CALL_HOM_REF  NO_CALL_HET  NO_CALL_HOM_VAR  (...)
+ * NA12878   0.003            0.001        0.000            (...)
+ * NA12891   0.005            0.000        0.000            (...)
+ *
+ * would instead be displayed as
+ *
+ * NA12878   NO_CALL_HOM_REF  0.003
+ * NA12878   NO_CALL_HET      0.001
+ * NA12878   NO_CALL_HOM_VAR  0.000
+ * NA12891   NO_CALL_HOM_REF  0.005
+ * NA12891   NO_CALL_HET      0.000
+ * NA12891   NO_CALL_HOM_VAR  0.000
+ * (...)
+ *
+ *
+ * These tables are constructed on a per-sample basis and include counts of eval vs. comp genotype states, as well as the
+ * number of times the eval and comp alternate alleles did not match.
+ *
+ * In addition, Genotype Concordance reports site-level allelic concordance. For strictly bi-allelic VCFs,
+ * only the ALLELES_MATCH, EVAL_ONLY, and TRUTH_ONLY fields will be populated; where multi-allelic sites are involved,
+ * counts for EVAL_SUBSET_TRUTH and EVAL_SUPERSET_TRUTH will also be generated.
+ *
+ * For example, given
+ * eval: ref - A alt - C
+ * comp: ref - A alt - C,T
+ * the site is tabulated as EVAL_SUBSET_TRUTH. Were the situation reversed, it would be EVAL_SUPERSET_TRUTH.
+ * However, when the eval record has both C and T as alternate alleles, both must be observed in its genotypes
+ * (that is, there must be at least one of (0/1,1/1) and at least one of (0/2,1/2,2/2) in the genotype field). If
+ * one of the alleles has no observations in the genotype fields of the eval, the site-level concordance is
+ * tabulated as though that allele were not present in the record.
+ *
+ * <h3>Monomorphic Records</h3>
+ * A site which has an alternate allele, but which is monomorphic across samples, is treated as not having been
+ * discovered, and will be recorded in the TRUTH_ONLY column (if a record exists in the comp VCF), or not at all
+ * (if no record exists in the comp VCF).
+ *
+ * That is, the situation
+ * eval: ref - A alt - C genotypes - 0/0 0/0 0/0 ... 0/0
+ * comp: ref - A alt - C ... 0/0 0/0 ...
+ * is equivalent to
+ * eval: ref - A alt - . genotypes - 0/0 0/0 0/0 ... 0/0
+ * comp: ref - A alt - C ... 0/0 0/0 ...
+ *
+ * When a record is present in the comp VCF, the *genotypes* for the monomorphic site will still be used to evaluate
+ * per-sample genotype concordance counts.
+ *
+ * <h3>Filtered Records</h3>
+ * Filtered records are treated as though they were not present in the VCF, unless -ignoreSiteFilters is provided,
+ * in which case all records are used. There is currently no way to assess concordance metrics on filtered sites
+ * exclusively. SelectVariants can be used to extract filtered sites, and VariantFiltration used to un-filter them.
  */
 @DocumentedGATKFeature( groupName = HelpConstants.DOCS_CAT_VARMANIP, extraDocs = {CommandLineGATK.class} )
 public class GenotypeConcordance extends RodWalker<List<Pair<VariantContext,VariantContext>>,ConcordanceMetrics> {
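The -moltenize option described in the Javadoc reshapes each wide per-sample table into one row per (sample, field) cell. The sketch below reproduces that reshaping on plain string arrays, assuming the first column holds the sample name; `MoltenSketch` and `melt` are hypothetical names for illustration, not part of GATK's report classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "moltenizing" a wide per-sample table into
// (sample, field, value) rows. Not GATK's own report implementation.
class MoltenSketch {

    /**
     * @param header first entry is the sample column name, the rest are field names
     * @param rows   each row: sample name followed by one value per field
     * @return one {sample, field, value} triple per (row, field) cell, in row-major order
     */
    static List<String[]> melt(String[] header, List<String[]> rows) {
        List<String[]> molten = new ArrayList<>();
        for (String[] row : rows) {
            if (row.length != header.length) {
                throw new IllegalArgumentException("row width does not match header");
            }
            String sample = row[0];
            // Column 0 holds the sample; every other column becomes its own output row.
            for (int col = 1; col < header.length; col++) {
                molten.add(new String[] {sample, header[col], row[col]});
            }
        }
        return molten;
    }

    public static void main(String[] args) {
        String[] header = {"Sample", "NO_CALL_HOM_REF", "NO_CALL_HET", "NO_CALL_HOM_VAR"};
        List<String[]> rows = List.of(
                new String[] {"NA12878", "0.003", "0.001", "0.000"},
                new String[] {"NA12891", "0.005", "0.000", "0.000"});
        // Prints six tab-separated rows, the first being NA12878, NO_CALL_HOM_REF, 0.003.
        for (String[] r : melt(header, rows)) {
            System.out.println(String.join("\t", r));
        }
    }
}
```

The long form trades width for row count: each cell carries its own sample and field labels, which is convenient for tools that expect one observation per line.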
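The per-sample tables tally how each eval genotype state lines up with the comp genotype state at the same site. A minimal sketch of that tally for a single sample, assuming diploid VCF-style GT strings and assuming that a column such as NO_CALL_HOM_REF names the eval state first and the comp state second (the Javadoc does not say which side comes first). `GenotypeStateTally` is a hypothetical class, and its four states are a simplification of whatever state set the tool actually uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: tally (eval state, comp state) pairs for one sample.
// The four states are a simplification; the real tool may distinguish more.
class GenotypeStateTally {

    enum State { NO_CALL, HOM_REF, HET, HOM_VAR }

    /** Classifies a diploid VCF-style genotype string such as "0/1", "1|1" or "./.". */
    static State classify(String gt) {
        String[] alleles = gt.split("[/|]");
        if (alleles.length != 2) {
            throw new IllegalArgumentException("expected a diploid genotype: " + gt);
        }
        // Any missing allele makes the whole genotype a no-call in this sketch.
        if (alleles[0].equals(".") || alleles[1].equals(".")) {
            return State.NO_CALL;
        }
        if (alleles[0].equals("0") && alleles[1].equals("0")) return State.HOM_REF;
        if (alleles[0].equals(alleles[1])) return State.HOM_VAR;
        return State.HET; // 0/1, and also non-reference hets such as 1/2
    }

    /**
     * Counts state pairs across sites for one sample. Keys look like "EVALSTATE_COMPSTATE";
     * the eval-first ordering is an assumption.
     */
    static Map<String, Integer> tally(String[] evalGts, String[] compGts) {
        if (evalGts.length != compGts.length) {
            throw new IllegalArgumentException("eval and comp must cover the same sites");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < evalGts.length; i++) {
            String key = classify(evalGts[i]) + "_" + classify(compGts[i]);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

Dividing each count by the sample's total yields proportions like those in the example table. This sketch does not track the count of mismatching alternate alleles that the tables also report.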
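The site-level categories in the Javadoc can be read as set relations between the eval and comp (truth) ALT alleles at a site. The sketch below encodes that reading under stated assumptions: an empty set stands for a missing record, and disjoint or partially overlapping ALT sets, which the Javadoc does not classify, fall into a hypothetical OTHER placeholder. `SiteConcordanceSketch` is illustrative only and is not GATK's implementation.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the site-level allelic categories named in the Javadoc.
class SiteConcordanceSketch {

    enum SiteCategory {
        ALLELES_MATCH, EVAL_ONLY, TRUTH_ONLY, EVAL_SUBSET_TRUTH, EVAL_SUPERSET_TRUTH,
        OTHER // hypothetical placeholder: the Javadoc names no category for this case
    }

    /**
     * @param evalAlts  ALT alleles of the eval record at the site (empty = no record)
     * @param truthAlts ALT alleles of the comp/truth record (empty = no record)
     * @return the category, or empty if neither side has a record (site not tabulated)
     */
    static Optional<SiteCategory> classify(Set<String> evalAlts, Set<String> truthAlts) {
        if (evalAlts.isEmpty() && truthAlts.isEmpty()) return Optional.empty();
        if (truthAlts.isEmpty()) return Optional.of(SiteCategory.EVAL_ONLY);
        if (evalAlts.isEmpty()) return Optional.of(SiteCategory.TRUTH_ONLY);
        if (evalAlts.equals(truthAlts)) return Optional.of(SiteCategory.ALLELES_MATCH);
        if (truthAlts.containsAll(evalAlts)) return Optional.of(SiteCategory.EVAL_SUBSET_TRUTH);
        if (evalAlts.containsAll(truthAlts)) return Optional.of(SiteCategory.EVAL_SUPERSET_TRUTH);
        return Optional.of(SiteCategory.OTHER); // disjoint or partially overlapping ALT sets
    }
}
```

With eval ALT {C} and comp ALT {C, T}, as in the Javadoc example, this returns EVAL_SUBSET_TRUTH; swapping the arguments gives EVAL_SUPERSET_TRUTH.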
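The rule that an eval ALT allele only counts when some genotype carries it can be sketched as a single pass over the GT strings, with allele indices referring to REF (0) and the ALTs (1..n). One assumption goes beyond the Javadoc's parenthetical lists: here a 1/2 genotype counts as observing both ALT alleles. An empty result is the monomorphic case, in which the eval record behaves as if absent at the site level. `ObservedAllelesSketch` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: an eval ALT allele only counts if at least one genotype
// carries it. Genotypes are VCF-style strings ("0/1", "1|2", "./.") whose
// indices refer to REF (0) and ALTs (1..n).
class ObservedAllelesSketch {

    /** Returns the ALT alleles that appear in at least one genotype, in ALT order. */
    static Set<String> observedAlts(List<String> altAlleles, List<String> genotypes) {
        boolean[] seen = new boolean[altAlleles.size() + 1]; // index 0 = REF
        for (String gt : genotypes) {
            for (String allele : gt.split("[/|]")) {
                if (allele.equals(".")) continue; // a no-call allele contributes nothing
                int idx = Integer.parseInt(allele);
                if (idx < 0 || idx >= seen.length) {
                    throw new IllegalArgumentException("allele index out of range: " + gt);
                }
                seen[idx] = true;
            }
        }
        Set<String> observed = new LinkedHashSet<>();
        for (int i = 1; i < seen.length; i++) {
            if (seen[i]) observed.add(altAlleles.get(i - 1));
        }
        return observed;
    }

    /** True when no ALT allele is observed, i.e. the record is monomorphic across samples. */
    static boolean behavesAsAbsent(List<String> altAlleles, List<String> genotypes) {
        return observedAlts(altAlleles, genotypes).isEmpty();
    }
}
```

Per the Javadoc, such a monomorphic record's genotypes still feed the per-sample concordance counts when a comp record exists; only the site-level tabulation treats it as missing.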
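The filtered-record handling in the Javadoc amounts to a pre-filter on the input records. The sketch assumes the usual VCF convention that an empty FILTER set or PASS means the record passed; `SiteRecord` is a hypothetical stand-in for a real variant record type, and the `ignoreSiteFilters` parameter mirrors the -ignoreSiteFilters argument only by name.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: filtered records are dropped before concordance is
// computed unless the equivalent of -ignoreSiteFilters is set.
class SiteFilterSketch {

    // Illustrative stand-in for a variant record; not htsjdk's VariantContext.
    static final class SiteRecord {
        final String id;
        final Set<String> filters; // empty, or exactly {"PASS"}, means the record passed

        SiteRecord(String id, Set<String> filters) {
            this.id = id;
            this.filters = filters;
        }

        boolean isFiltered() {
            return !filters.isEmpty() && !filters.equals(Set.of("PASS"));
        }
    }

    /** Keeps the records that concordance would see under the filtering rule. */
    static List<SiteRecord> usableRecords(List<SiteRecord> records, boolean ignoreSiteFilters) {
        if (ignoreSiteFilters) {
            return records; // all records are used
        }
        return records.stream()
                .filter(r -> !r.isFiltered())
                .collect(Collectors.toList());
    }
}
```

Because the choice is all-or-unfiltered, evaluating filtered sites on their own still requires preparing the input separately, as the Javadoc notes with SelectVariants and VariantFiltration.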