## What do the VariantEval modules do? http://gatkforums.broadinstitute.org/gatk/discussion/2361/what-do-the-varianteval-modules-do

VariantEval accepts two types of modules: stratification and evaluation modules.

Stratification modules will stratify (group) the variants based on certain properties.
Evaluation modules will compute certain metrics for the variants

CpG

CpG is a three-state stratification:

The locus is a CpG site ("CpG")
The locus is not a CpG site ("non_CpG")
The locus is either a CpG or not a CpG site ("all")

A CpG site is defined as a site where the reference base at a locus is a C and the adjacent reference base in the 3' direction is a G.

EvalRod

EvalRod is an N-state stratification, where N is the number of eval rods bound to VariantEval.

Sample

Sample is an N-state stratification, where N is the number of samples in the eval files.

Filter

Filter is a three-state stratification:

The locus passes QC filters ("called")
The locus fails QC filters ("filtered")
The locus either passes or fails QC filters ("raw")

FunctionalClass

FunctionalClass is a four-state stratification:

The locus is a synonymous site ("silent")
The locus is a missense site ("missense")
The locus is a nonsense site ("nonsense")
The locus is of any functional class ("any")

CompRod

CompRod is an N-state stratification, where N is the number of comp tracks bound to VariantEval.

Degeneracy

Degeneracy is a six-state stratification:

The underlying base position in the codon is 1-fold degenerate ("1-fold")
The underlying base position in the codon is 2-fold degenerate ("2-fold")
The underlying base position in the codon is 3-fold degenerate ("3-fold")
The underlying base position in the codon is 4-fold degenerate ("4-fold")
The underlying base position in the codon is 6-fold degenerate ("6-fold")
The underlying base position in the codon is degenerate at any level ("all")

See the [http://en.wikipedia.org/wiki/Genetic_code#Degeneracy Wikipedia page on degeneracy] for more information.

JexlExpression

JexlExpression is an N-state stratification, where N is the number of JEXL expressions supplied to VariantEval. See [[Using JEXL expressions]]

Novelty

Novelty is a three-state stratification:

The locus overlaps the knowns comp track (usually the dbSNP track) ("known")
The locus does not overlap the knowns comp track ("novel")
The locus either overlaps or does not overlap the knowns comp track ("all")

CountVariants

CountVariants is an evaluation module that computes the following metrics:

Metric	Definition
nProcessedLoci	Number of processed loci
nCalledLoci	Number of called loci
nRefLoci	Number of reference loci
nVariantLoci	Number of variant loci
variantRate	Variants per loci rate
variantRatePerBp	Number of variants per base
nSNPs	Number of snp loci
nInsertions	Number of insertion
nDeletions	Number of deletions
nComplex	Number of complex loci
nNoCalls	Number of no calls loci
nHets	Number of het loci
nHomRef	Number of hom ref loci
nHomVar	Number of hom var loci
nSingletons	Number of singletons
heterozygosity	heterozygosity per locus rate
heterozygosityPerBp	heterozygosity per base pair
hetHomRatio	heterozygosity to homozygosity ratio
indelRate	indel rate (insertion count + deletion count)
indelRatePerBp	indel rate per base pair
deletionInsertionRatio	deletion to insertion ratio

CompOverlap

CompOverlap is an evaluation module that computes the following metrics:

Metric	Definition
nEvalSNPs	number of eval SNP sites
nCompSNPs	number of comp SNP sites
novelSites	number of eval sites outside of comp sites
nVariantsAtComp	number of eval sites at comp sites (that is, sharing the same locus as a variant in the comp track, regardless of whether the alternate allele is the same)
compRate	percentage of eval sites at comp sites
nConcordant	number of concordant sites (that is, for the sites that share the same locus as a variant in the comp track, those that have the same alternate allele)
concordantRate	the concordance rate

Understanding the output of CompOverlap

A SNP in the detection set is said to be 'concordant' if the position exactly matches an entry in dbSNP and the allele is the same. To understand this and other output of CompOverlap, we shall examine a detailed example. First, consider a fake dbSNP file (headers are suppressed so that one can see the important things):

 $ grep -v '##' dbsnp.vcf
 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
 1       10327   rs112750067     T       C       .       .       ASP;R5;VC=SNP;VP=050000020005000000000100;WGT=1;dbSNPBuildID=132

Now, a detection set file with a single sample, where the variant allele is the same as listed in dbSNP:

 $ grep -v '##' eval_correct_allele.vcf
 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT            001-6
 1       10327   .       T       C       5168.52 PASS    ...     GT:AD:DP:GQ:PL    0/1:357,238:373:99:3959,0,4059

Finally, a detection set file with a single sample, but the alternate allele differs from that in dbSNP:

 $ grep -v '##' eval_incorrect_allele.vcf
 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT            001-6
 1       10327   .       T       A       5168.52 PASS    ...     GT:AD:DP:GQ:PL    0/1:357,238:373:99:3959,0,4059

Running VariantEval with just the CompOverlap module:

 $ java -jar $STING_DIR/dist/GenomeAnalysisTK.jar -T VariantEval \
        -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
        -L 1:10327 \
        -B:dbsnp,VCF dbsnp.vcf \
        -B:eval_correct_allele,VCF eval_correct_allele.vcf \
        -B:eval_incorrect_allele,VCF eval_incorrect_allele.vcf \
        -noEV \
        -EV CompOverlap \
        -o eval.table

We find that the eval.table file contains the following:

 $ grep -v '##' eval.table | column -t 
 CompOverlap  CompRod  EvalRod                JexlExpression  Novelty  nEvalVariants  nCompVariants  novelSites  nVariantsAtComp  compRate      nConcordant  concordantRate
 CompOverlap  dbsnp    eval_correct_allele    none            all      1              1              0           1                100.00000000  1            100.00000000
 CompOverlap  dbsnp    eval_correct_allele    none            known    1              1              0           1                100.00000000  1            100.00000000
 CompOverlap  dbsnp    eval_correct_allele    none            novel    0              0              0           0                0.00000000    0            0.00000000
 CompOverlap  dbsnp    eval_incorrect_allele  none            all      1              1              0           1                100.00000000  0            0.00000000
 CompOverlap  dbsnp    eval_incorrect_allele  none            known    1              1              0           1                100.00000000  0            0.00000000
 CompOverlap  dbsnp    eval_incorrect_allele  none            novel    0              0              0           0                0.00000000    0            0.00000000

As you can see, the detection set variant was listed under nVariantsAtComp (meaning the variant was seen at a position listed in dbSNP), but only the eval_correct_allele dataset is shown to be concordant at that site, because the allele listed in this dataset and dbSNP match.

TiTvVariantEvaluator

TiTvVariantEvaluator is an evaluation module that computes the following metrics:

Metric	Definition
nTi	number of transition loci
nTv	number of transversion loci
tiTvRatio	the transition to transversion ratio
nTiInComp	number of comp transition sites
nTvInComp	number of comp transversion sites
TiTvRatioStandard	the transition to transversion ratio for comp sites