60 Commits (b63d64bbbc73ab9ef2f8b60c0f63b35f9882c813)

Author SHA1 Message Date
depristo b63d64bbbc Beautiful labels, better choice of dimension ranges. Supports fast loading of just first N records for testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3964 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 23:17:32 +00:00
depristo d3bebe0f2c Reasonable comment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3963 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 22:03:55 +00:00
depristo bb5dfd7e5e Slightly nicer plotting; not yet complete
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3961 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:01:31 +00:00
depristo 70f492a6e8 Prints out trivial debugging info
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3957 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 13:24:21 +00:00
kiran 1a36cb9296 Can now set the maximum number of variants to see in a cluster plot (useful when you don't need to see a billion points to get an idea of what's going on). The limit applies to known and novel variants separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3937 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:45:24 +00:00
kiran bd27287fe7 An R module that takes in a Variant Recalibration cluster file (file with '@!CLUSTER' lines in it), a tabularized VCF, and optionally a set of loci that should be examined more carefully, and emits a tremendous number of plots. For every annotation used in clustering, the distributions and pair-wise comparison (with ellipses denoting the 2-sigma cluster boundaries) are shown. Each cluster is shaded with a color proportional to its mixture coefficient.
To use this module, you'll first have to take your VCF and create an R-readable table out of it with the following command:

python /path/to/Sting/trunk/python/vcf2table.py -f CHROM,POS,ID,AC,AF,AN,DB,DP,HRun,MQ,MQ0,MyHaplotypeScore,QD,SB my.vcf > my.vcf.table

Then, simply invoke this module with the command:

Rscript /path/to/Sting/trunk/R/VariantRecalibratorReport/VariantRecalibratorReport.R /path/to/output/prefix /path/to/my/my.clusters /path/to/my.vcf.table [/path/to/my.suspicious.loci]

This will create a number of plots all with the prefix "/path/to/output/prefix".  For instance, if you used QD, SB, HRun, and MyHaplotypeScore annotations during clustering, you should see output like this:

    /path/to/output/prefix.anndist.HRun.pdf
    /path/to/output/prefix.anndist.MyHaplotypeScore.pdf
    /path/to/output/prefix.anndist.QD.pdf
    /path/to/output/prefix.anndist.SB.pdf
    /path/to/output/prefix.cluster.HRun_vs_MyHaplotypeScore.pdf
    /path/to/output/prefix.cluster.HRun_vs_QD.pdf
    /path/to/output/prefix.cluster.HRun_vs_SB.pdf
    /path/to/output/prefix.cluster.MyHaplotypeScore_vs_HRun.pdf
    /path/to/output/prefix.cluster.MyHaplotypeScore_vs_QD.pdf
    /path/to/output/prefix.cluster.MyHaplotypeScore_vs_SB.pdf
    /path/to/output/prefix.cluster.QD_vs_HRun.pdf
    /path/to/output/prefix.cluster.QD_vs_MyHaplotypeScore.pdf
    /path/to/output/prefix.cluster.QD_vs_SB.pdf
    /path/to/output/prefix.cluster.SB_vs_HRun.pdf
    /path/to/output/prefix.cluster.SB_vs_MyHaplotypeScore.pdf
    /path/to/output/prefix.cluster.SB_vs_QD.pdf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3936 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:35:14 +00:00
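
The output naming described in commit bd27287fe7 can be sketched as follows. This is an illustrative Python sketch, not the R module itself: the function name `expected_plot_paths` is hypothetical, and the pattern (one `anndist` plot per annotation, one `cluster` plot per ordered pair of annotations) is inferred only from the example listing in that commit message.

```python
from itertools import permutations


def expected_plot_paths(prefix, annotations):
    """List the PDF paths the report is expected to write (hypothetical helper).

    Assumes the naming seen in the commit message: one 'anndist' plot per
    annotation and one 'cluster' plot per ordered pair of annotations.
    """
    names = sorted(annotations)
    # One distribution plot per annotation.
    dist = [f"{prefix}.anndist.{a}.pdf" for a in names]
    # One pairwise plot per ordered pair, so A_vs_B and B_vs_A both appear.
    pairs = [f"{prefix}.cluster.{a}_vs_{b}.pdf" for a, b in permutations(names, 2)]
    return dist + pairs


if __name__ == "__main__":
    annotations = ["QD", "SB", "HRun", "MyHaplotypeScore"]
    for path in expected_plot_paths("/path/to/output/prefix", annotations):
        print(path)
```

With n annotations this gives n distribution plots plus n*(n-1) pairwise plots, which is consistent with the 4 + 12 files in the listing for four annotations.
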
kiran b990a22bac A very nice way of automatically plotting the results of a VariantEval run. All of the hard work is actually in the common R repository, gsacommons.R, including methods for creating a Venn diagram. It also provides a mechanism for the output of a VariantEval run to be loaded into a single list object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3828 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 12:38:26 +00:00
depristo 6ffcaa0afe Can run R scripts on the command line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3750 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 00:13:18 +00:00
depristo 66931d433c useful routines for R
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3685 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:38:49 +00:00
corin bcab0eba01 This replaces tearsheet.r, neatens up graphics, and allows the script to be used in R's interactive environment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3625 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 01:02:58 +00:00
corin ae88630d52 This script produces tearsheet and data processing report figures and tables when given Squid and Firehose produced data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3594 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 21:36:29 +00:00
corin a2c266bda3 This script accepts file paths to analysis metrics tables and produces tearsheet data and data processing report graphs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3585 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 13:02:25 +00:00
corin 266a47d83d This file automatically generates data and graphics for tearsheets and data processing reports
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3551 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 18:39:23 +00:00
rpoplin 290771a8c2 Automatic cutting of recalibrated variant calls using ApplyVariantCuts. VariantRecalibrator produces the tranches plot alongside the optimization curve. Specify the levels using -tranche 1.0 -tranche 5.0 etc
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3472 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 15:03:00 +00:00
rpoplin 33a9549896 Variant Optimizer accepts a dbSNP rod argument to use in determining known/novel status as opposed to using the rsID in the vcf record. VO generates plots of annotation values used in clustering broken out by knowns and novels. Useful for showing which annotations are approximately Gaussian.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3332 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-09 16:48:07 +00:00
chartl dc802aa26f Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
rpoplin 06a212e612 Adding VariantConcordanceROCCurveWalker to create ROC curves comparing concordance between optimized call sets and validation truth sets in VCF format in order to evaluate performance of variant optimizer independently of achieving a particular novel ti/tv ratio. Added option to ignore only the specified filters in the input call sets via --ignore_filter <String>. Added option to provide a prior estimate of error for known snps via --known_prior <qual>. The het and hom calls are clustered independently. Infrastructure in place to use titv of known snps to inform p(true) of novel snps. Tweaked protection against overfitting based on suggestions from several people. Minor edits to AnalyzeAnnotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3071 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 19:43:10 +00:00
rpoplin c78fc23ec5 Minor updates to output of variant optimizer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3031 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:46:47 +00:00
rpoplin 58a31bab6a Variant optimizer now outputs VCF files via ApplyVariantClustersWalker. Documentation to be added to the wiki. It is ready to be used by other people but only with great caution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3028 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 20:41:42 +00:00
rpoplin 933823c8bc Removed the StingException when mkdir fails for Sendu in AnalyzeCovariates. Incremental updates to VariantOptimizer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3013 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 19:45:02 +00:00
chartl ee68e38e02 Eliminate the shell items, as FH will be calling this with /broad/tools/apps/R-2.72/bin/Rscript
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2968 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 20:15:21 +00:00
chartl aa7191353a PlotDepthOfCoverage now produces a set of useful QC plots. Currently a first draft, and it is unclear how the visualization will scale with increasing sample size and/or depth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2962 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 16:42:35 +00:00
chartl 81ffb8243d Waypoint commit of plotting R script for Depth Of Coverage/Coverage Statistics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2958 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 21:42:51 +00:00
kshakir 36129e01e4 Using bitmap() instead of png() since the former doesn't rely on X11.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2873 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 05:31:51 +00:00
kshakir 3738b76320 Added a playground concordance analyzer for summarizing VariantEval across a group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2867 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:28:52 +00:00
chartl f02e94ab6f Eliminate the rescale factor -- heatmap automatically normalizes the data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2845 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 16:34:33 +00:00
chartl 37fa1bf0cc Added heatmap function
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2843 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 15:12:54 +00:00
chartl 951b7a2433 First of what will be an increasingly useful set of tools, compiled into one command-line runnable library -- the goal is to have one plotting library that's callable because of limitations on the number of files you can package with a GenePattern module.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2841 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 16:51:47 +00:00
rpoplin 233a652161 Making the dotted quartile lines more clear.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2772 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 22:23:09 +00:00
rpoplin 64fc76e4bf Added an option to AnalyzeCovariates to set the max value of the histograms to make them easier to directly compare.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2753 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 23:13:57 +00:00
rpoplin 16da5011c0 Added a new option for indicating the mean number of variants on the AnalyzeAnnotations plots. This way one can say, for example, filtering at this point will keep 75 percent of all the variants.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2744 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 21:58:31 +00:00
rpoplin c6cc844e55 Added -name argument to AnalyzeAnnotations that allows one to specify the name of the annotation to be used on the plots. Instead of seeing AB and DP, one can add -name AB,AlleleBalance -name DP,Depth
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2742 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:48:53 +00:00
rpoplin 4f29a1d4f6 AnalyzeAnnotations now plots true positive rate instead of percentage of variants found in the truth set. Committing GCContentCovariate to help people experiment with correcting the pilot3/Kristian base calling error mode in slx.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2740 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:01:56 +00:00
rpoplin 79c4cc1db7 AnalyzeAnnotations now breaks out titv by calls in hapmap and also plots true positive rates. Any ROD passed in whose name starts with 'truth' is considered to be the truth set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2726 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:41:23 +00:00
rpoplin b8ae083d1b AnalyzeAnnotations creates a plot of dbsnp rate as a function of the annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2711 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 21:08:33 +00:00
rpoplin fc4285f9fd AnalyzeAnnotations seems to be popular so I've rewritten the guts to be easier to extend and maintain.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2707 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 19:30:31 +00:00
rpoplin 4bcdab580c --output_dir has been changed to --output_prefix to give the user more control over the names of the resulting mass of files in AnalyzeAnnotations. The fontsize of the axes is increased. Cumulative filtering plots are removed since the binned filtering plots are much more useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2700 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 04:50:54 +00:00
rpoplin 24d4082925 AnalyzeAnnotations can now process only variants that are found in samples that match the -sampleName argument. The x-axis of the plots no longer uses annoying scientific notation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2684 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 20:52:11 +00:00
rpoplin 2b51cf18f0 AnalyzeAnnotations now outputs plots with log x-axis in addition to standard x-axis so things like DP and MQ0 are easier to see. AnalyzeAnnotations now skips over all annotations that aren't floating point values. Recalibrator now warns users if PL tags are missing and it is therefore reverting to illumina.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2681 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 19:39:18 +00:00
rpoplin a11503819a AnalyzeAnnotations now breaks out its TiTv plots into novel SNPs, dbSNP sites, and combined.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2659 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 19:00:23 +00:00
rpoplin d9df72e1b5 AnalyzeAnnotations now bins variants per each annotation and outputs plots of TiTv ratio as a function of the annotation's value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2654 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 21:15:11 +00:00
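
The idea in commit d9df72e1b5 (bin variants by an annotation's value, then report the TiTv ratio per bin) can be sketched in Python as below. This is a rough illustration under stated assumptions, not the AnalyzeAnnotations code: the function name `titv_by_bin`, the `(ref, alt, value)` input tuples, and the use of fixed-width bins are all hypothetical choices, since the commit message does not say how bins are formed.

```python
import math

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_by_bin(variants, bin_width):
    """Return {bin_start: Ti/Tv ratio} for SNPs binned by annotation value.

    `variants` is an iterable of (ref, alt, annotation_value) tuples.
    Fixed-width binning is an assumption made for this sketch.
    """
    counts = {}  # bin index -> [transitions, transversions]
    for ref, alt, value in variants:
        index = math.floor(value / bin_width)
        ti_tv = counts.setdefault(index, [0, 0])
        if (ref.upper(), alt.upper()) in TRANSITIONS:
            ti_tv[0] += 1
        else:
            ti_tv[1] += 1
    # A bin with no transversions has an undefined ratio; report infinity.
    return {
        index * bin_width: (ti / tv if tv else float("inf"))
        for index, (ti, tv) in sorted(counts.items())
    }


if __name__ == "__main__":
    snps = [("A", "G", 1.0), ("C", "T", 2.0), ("A", "C", 3.0),
            ("A", "T", 12.0), ("G", "A", 15.0)]
    print(titv_by_bin(snps, 10))
```
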
rpoplin ba19afd529 Draft version of AnalyzeAnnotations which creates plots of cumulative TiTv ratio versus filter value per each annotation in the input VCF rod. Minor cleanup of recalibration walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2623 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 20:47:10 +00:00
rpoplin 7f97041875 Update to AnalyzeCovariates to make the histogram of PairedReadOrder look a little nicer
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2575 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 20:26:31 +00:00
rpoplin cea544871d Fixed an issue with recalibrating original quality scores above Q40. There is a new option -maxQ which sets the maximum quality score possible for when a RecalDatum tries to compute its quality score from the mismatch rate. The same option was added to AnalyzeCovariates to help with plotting q scores above Q40. Added an integration test which makes use of this new -maxQ option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2534 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 13:50:30 +00:00
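
The -maxQ cap described in commit cea544871d can be illustrated with the standard Phred relation Q = -10 * log10(error rate). The following is a minimal Python sketch assuming a plain empirical mismatch rate; the function name `empirical_quality` and the default cap of 40 are hypothetical, and the actual RecalDatum computation may differ (for example, by smoothing the counts).

```python
import math


def empirical_quality(mismatches, observations, max_q=40):
    """Phred-scaled quality from a mismatch rate, capped at max_q.

    Hypothetical helper: the default of 40 is an assumption for this sketch.
    """
    if observations <= 0:
        raise ValueError("observations must be positive")
    if mismatches == 0:
        # A zero mismatch rate would give an infinite score, so use the cap.
        return float(max_q)
    rate = mismatches / observations
    quality = -10.0 * math.log10(rate)
    return min(quality, float(max_q))


if __name__ == "__main__":
    print(empirical_quality(1, 1000))                # uncapped
    print(empirical_quality(1, 1000000))             # capped at the default
    print(empirical_quality(1, 1000000, max_q=50))   # capped at a higher limit
```

Raising the cap only matters for covariate bins whose observed mismatch rate is below the rate implied by the old limit; other bins are unaffected.
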
rpoplin 562db45fa5 Sites that were marked NO_DINUC no longer get dinuc-corrected but are still recalibrated using the other available covariates. Solid cycle is now the same as Illumina cycle pending an analysis that looks at the effect of PrimerRoundCovariate. Solid color space methods cleaned up to reduce number of calls to read.getAttribute(). Polished NHashMap sort method in preparation for move to core/utils. Added additional plots in AnalyzeCovariates to look at reported quality as a function of the covariate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2451 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 20:19:37 +00:00
aaron 1ae333a1c1 R script for graphing depth of coverage by sample name, and generating a loess curve for each sample's data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2317 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 21:58:01 +00:00
rpoplin 088363ce42 Added entropy calculation to histogram of quality scores
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2316 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 21:57:35 +00:00
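
Commit 088363ce42 does not say which entropy is computed for the quality score histogram. As an illustration only, the sketch below assumes Shannon entropy in bits over the normalized histogram counts; the function name `histogram_entropy` is hypothetical.

```python
import math


def histogram_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as a list of counts.

    Assumes Shannon entropy with log base 2; the original script may use a
    different definition or base.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:  # empty bins contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # Counts of bases observed at four hypothetical quality scores.
    print(histogram_entropy([10, 10, 10, 10]))
    print(histogram_entropy([40, 0, 0, 0]))
```

A histogram concentrated in one quality score has zero entropy, while one spread evenly over k scores has entropy log2(k).
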
rpoplin 12ec154f01 Make the AnalyzeCovariate plots look a little nicer when there are a small number of data points
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2298 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 21:22:40 +00:00
rpoplin 855face681 Histogram of covariate values now goes from 0 to max value which makes it look nicer in most cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2259 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 14:44:03 +00:00
rpoplin 985daec76e Fixed problem with integer overflow in R scripts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2258 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 14:24:49 +00:00