From a8935c99fc0bedbb7851b1223511dd58a1ad048d Mon Sep 17 00:00:00 2001
From: Chris Hartl
+ * DepthOfCoverage processes a set of bam files to determine coverage at different levels of partitioning and
+ * aggregation. Coverage can be analyzed per locus, per interval, per gene, or in total; can be partitioned by
+ * sample, by read group, by technology, by center, or by library; and can be summarized by mean, median, quartiles,
+ * and/or percentage of bases covered to or beyond a threshold.
+ * Additionally, reads and bases can be filtered by mapping or base quality score.
+ *
+ *
+ * One or more bam files (with proper headers) to be analyzed for coverage statistics
+ * (Optional) A REFSEQ Rod to aggregate coverage to the gene level
+ *
+ * Tables pertaining to different coverage summaries. Suffix on the table files declares the contents:
+ * - no suffix: per locus coverage
+ * - _summary: total, mean, median, quartiles, and threshold proportions, aggregated over all bases
+ * - _statistics: coverage histograms (# of loci with X coverage), aggregated over all bases
+ * - _interval_summary: total, mean, median, quartiles, and threshold proportions, aggregated per interval
+ * - _interval_statistics: 2x2 table of # of intervals covered to >= X depth in >= Y samples
+ * - _gene_summary: total, mean, median, quartiles, and threshold proportions, aggregated per gene
+ * - _gene_statistics: 2x2 table of # of genes covered to >= X depth in >= Y samples
+ * - _cumulative_coverage_counts: coverage histograms (# of loci with >= X coverage), aggregated over all bases
+ * - _cumulative_coverage_proportions: proportions of loci with >= X coverage, aggregated over all bases
+ *
+ * ValidationAmplicons consumes a VCF and an Interval list and produces FASTA sequences from which PCR primers or probe
+ * sequences can be designed. In addition, ValidationAmplicons uses BWA to check for specificity of tracts of bases within
+ * the output amplicon, lower-casing non-specific tracts; allows users to provide sites to mask out; and reports
+ * reasons why a site may fail validation (nearby variation, for example).
+ *
+ * Requires a VCF containing alleles to design amplicons towards, a VCF of variants to mask out of the amplicons, and an
+ * interval list defining the size of the amplicons around the sites to be validated
+ *
+ * Output is a FASTA-formatted file with some modifications at probe sites. For instance:
+ * Input
+ * Output
+ * Examples
+ *
+ * java -Xmx2g -jar GenomeAnalysisTK.jar \
+ * -R ref.fasta \
+ * -T DepthOfCoverage \
+ * -o file_name_base \
+ * -I input_bams.list \
+ * [-geneList refSeq.sorted.txt] \
+ * [-pt readgroup] \
+ * [-ct 4 -ct 6 -ct 10] \
+ * [-L my_capture_genes.interval_list]
+ *
*
- * @Author chartl
- * @Date Feb 22, 2010
*/
// todo -- cache the map from sample names to means in the print functions, rather than regenerating each time
// todo -- support for granular histograms for total depth; maybe n*[start,stop], bins*sqrt(n)
diff --git a/public/java/src/org/broadinstitute/sting/gatk/walkers/validation/ValidationAmplicons.java b/public/java/src/org/broadinstitute/sting/gatk/walkers/validation/ValidationAmplicons.java
index 61149e5d9..cd2891874 100755
--- a/public/java/src/org/broadinstitute/sting/gatk/walkers/validation/ValidationAmplicons.java
+++ b/public/java/src/org/broadinstitute/sting/gatk/walkers/validation/ValidationAmplicons.java
@@ -30,21 +30,77 @@ import java.util.LinkedList;
import java.util.List;
/**
- * Created by IntelliJ IDEA.
- * User: chartl
- * Date: 6/13/11
- * Time: 2:12 PM
- * To change this template use File | Settings | File Templates.
+ * Creates FASTA sequences for use in Sequenom or PCR utilities for site amplification and subsequent validation.
+ *
+ * Input
+ * Output
+ *
+ * >20:207414 INSERTION=1,VARIANT_TOO_NEAR_PROBE=1, 20_207414
+ * CCAACGTTAAGAAAGAGACATGCGACTGGGTgcggtggctcatgcctggaaccccagcactttgggaggccaaggtgggc[A/G*]gNNcacttgaggtcaggagtttgagaccagcctggccaacatggtgaaaccccgtctctactgaaaatacaaaagttagC
+ * >20:792122 Valid 20_792122
+ * TTTTTTTTTagatggagtctcgctcttatcgcccaggcNggagtgggtggtgtgatcttggctNactgcaacttctgcct[-/CCC*]cccaggttcaagtgattNtcctgcctcagccacctgagtagctgggattacaggcatccgccaccatgcctggctaatTT
+ * >20:994145 Valid 20_994145
+ * TCCATGGCCTCCCCCTGGCCCACGAAGTCCTCAGCCACCTCCTTCCTGGAGGGCTCAGCCAAAATCAGACTGAGGAAGAAG[AAG/-*]TGGTGGGCACCCACCTTCTGGCCTTCCTCAGCCCCTTATTCCTAGGACCAGTCCCCATCTAGGGGTCCTCACTGCCTCCC
+ * >20:1074230 SITE_IS_FILTERED=1, 20_1074230
+ * ACCTGATTACCATCAATCAGAACTCATTTCTGTTCCTATCTTCCACCCACAATTGTAATGCCTTTTCCATTTTAACCAAG[T/C*]ACTTATTATAtactatggccataacttttgcagtttgaggtatgacagcaaaaTTAGCATACATTTCATTTTCCTTCTTC
+ * >20:1084330 DELETION=1, 20_1084330
+ * CACGTTCGGcttgtgcagagcctcaaggtcatccagaggtgatAGTTTAGGGCCCTCTCAAGTCTTTCCNGTGCGCATGG[GT/AC*]CAGCCCTGGGCACCTGTNNNNNNNNNNNNNTGCTCATGGCCTTCTAGATTCCCAGGAAATGTCAGAGCTTTTCAAAGCCC
+ *
+ * are amplicon sequences resulting from running the tool. The flags (preceding the sequence itself) can be:
+ *
+ * Valid // amplicon is valid
+ * SITE_IS_FILTERED=1 // validation site is not marked 'PASS' or '.' in its filter field ("you are trying to validate a filtered variant")
+ * VARIANT_TOO_NEAR_PROBE=1 // there is a variant too near to the variant to be validated, potentially shifting the mass-spec peak
+ * MULTIPLE_PROBES=1, // multiple variants to be validated found inside the same amplicon
+ * DELETION=6,INSERTION=5, // 6 deletions and 5 insertions found inside the amplicon region (from the "mask" VCF), will be potentially difficult to validate
+ * DELETION=1, // deletion found inside the amplicon region, could shift mass-spec peak
+ * START_TOO_CLOSE, // variant is too close to the start of the amplicon region to give Sequenom a good chance to find a suitable primer
+ * END_TOO_CLOSE, // variant is too close to the end of the amplicon region to give Sequenom a good chance to find a suitable primer
+ * NO_VARIANTS_FOUND, // no variants found within the amplicon region
+ * INDEL_OVERLAPS_VALIDATION_SITE, // an insertion or deletion interferes directly with the site to be validated (i.e. an insertion directly preceding or following the site, or a deletion that spans the site itself)
+ *
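Reviewer note: the header layout documented above (locus, comma-separated flags or `Valid`, then a name) can be read downstream with a few lines of Java. The sketch below is illustrative only and is not part of this patch or of GATK. The class name `AmpliconHeader`, its fields, and the choice to record a bare flag such as `START_TOO_CLOSE` with a count of 1 are all assumptions made for the example. It also assumes the three header fields are whitespace-separated and that the flag field contains no internal spaces, as in the sample output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of GATK): parses one amplicon FASTA header line. */
final class AmpliconHeader {
    final String locus;               // e.g. "20:207414"
    final Map<String, Integer> flags; // e.g. {INSERTION=1}; empty when the amplicon is "Valid"
    final String name;                // e.g. "20_207414"

    private AmpliconHeader(String locus, Map<String, Integer> flags, String name) {
        this.locus = locus;
        this.flags = flags;
        this.name = name;
    }

    /** True when the flag field was the literal "Valid". */
    boolean isValid() {
        return flags.isEmpty();
    }

    static AmpliconHeader parse(String line) {
        if (line == null || !line.startsWith(">")) {
            throw new IllegalArgumentException("Not a FASTA header: " + line);
        }
        // Assumed layout: ">locus flags name", separated by whitespace.
        String[] fields = line.substring(1).trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields, found " + fields.length);
        }
        Map<String, Integer> flags = new LinkedHashMap<>();
        if (!fields[1].equals("Valid")) {
            // String.split drops trailing empty tokens, so "DELETION=1," yields one token.
            for (String token : fields[1].split(",")) {
                if (token.isEmpty()) {
                    continue;
                }
                int eq = token.indexOf('=');
                if (eq < 0) {
                    flags.put(token, 1); // assumption: a bare flag is recorded with count 1
                } else {
                    flags.put(token.substring(0, eq), Integer.parseInt(token.substring(eq + 1)));
                }
            }
        }
        return new AmpliconHeader(fields[0], flags, fields[2]);
    }

    public static void main(String[] args) {
        AmpliconHeader h = parse(">20:1084330 DELETION=1, 20_1084330");
        System.out.println(h.locus + " valid=" + h.isValid() + " flags=" + h.flags);
    }
}
```

A consumer could use `isValid()` to keep only amplicons without failure reasons, or inspect `flags` to decide which reasons are acceptable for a given assay.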