depristo
b57a0a0310
improvements to the report code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4280 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 00:45:13 +00:00
kiran
dfdd0b69a9
Removed unused dependency (it was causing a problem by looking for an X11 connection that didn't necessarily exist).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4244 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 19:56:00 +00:00
depristo
594fb4a547
More plots in report
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4225 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 02:56:51 +00:00
kiran
19e22cfa87
Fixed a bug where the script looked for the wrong column name. Also, all results are now returned in a single plot.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4216 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-06 14:19:57 +00:00
depristo
0c54bf4195
Better reporting and now with a special mode for listing exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4183 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:19:51 +00:00
corin
cdad243645
updated version of the DPR. Now produces part of the tearsheet as well as good depth of coverage figures
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4182 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:38:58 +00:00
depristo
fc5caa98a5
Improved reporting now with metrics by day/week/etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4180 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 02:43:13 +00:00
depristo
8683087756
Suppl. tools for working with and displaying GATK run reports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4176 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:32:22 +00:00
kiran
e14a347e2e
Now prints cluster report to a single PDF, rather than a dozen different PDFs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4164 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 18:58:39 +00:00
kiran
fd19c63aaf
A data structure that allows data to be collected over the course of a walker's computation and then written to a PrintStream in a form that is human-readable, AWK-able, and R-friendly (provided that you load it using the GATKReport loader module).
...
This object is designed to be both the structure that holds data during the execution of the walker and the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this:
##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle errorrate.61PA8.7 qualavg.61PA8.7
0 0.007451835696110506 25.474613284804366
1 0.002362777171937477 29.844949954504095
2 9.087604507451836E-4 32.87590975254731
3 5.452562704471102E-4 34.498999090081895
4 9.087604507451836E-4 35.14831665150137
5 5.452562704471102E-4 36.07223435225619
6 5.452562704471102E-4 36.1217248908297
7 5.452562704471102E-4 36.1910480349345
8 5.452562704471102E-4 36.00345705967977
...
A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.
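Because each table carries its own header line, a file holding several tables can be split back apart by scanning for that header. The following is a minimal Python sketch of such a reader, not the GATKReport loader module itself. The function names `parse_reports` and `_convert` are hypothetical, and the sketch assumes whitespace-separated columns and a " : " separator between table name and description, as in the example table above.

```python
HEADER_PREFIX = "##:GATKReport.v0.1"


def _convert(field):
    """Return the field as a float when it parses as one, else as a string."""
    try:
        return float(field)
    except ValueError:
        return field


def parse_reports(lines):
    """Split an iterable of text lines into tables keyed by table name.

    Assumed layout (taken from the example table, not from a format spec):
    a header line, then one line of column names, then the data rows.
    """
    tables = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(HEADER_PREFIX):
            # A new header closes the previous table and opens the next one.
            rest = line[len(HEADER_PREFIX):].strip()
            name, _, description = rest.partition(" : ")
            current = {"description": description.strip(),
                       "columns": None,
                       "rows": []}
            tables[name.strip()] = current
        elif current is not None:
            fields = line.split()
            if current["columns"] is None:
                current["columns"] = fields
            else:
                current["rows"].append([_convert(f) for f in fields])
    return tables
```

Converting each field with `float` where possible keeps numeric columns numeric instead of leaving them as strings.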
The display property of individual columns can be turned off. This is useful when a column is used to store intermediate results, necessary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file.
Finally, the GATKReportTable allows for some simple element-wise and column-wise mathematical operations. For instance, two whole columns can be divided, with the results of the operation stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop.
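The combination of hidden columns and whole-column division can be illustrated with a small Python sketch. This is not the Java GATKReportTable API: the class `ReportTable` and its methods `add_column`, `set`, `divide_columns` and `write` are hypothetical names chosen for illustration, and the output layout simply copies the example table above.

```python
import io


class ReportTable:
    """Illustrative stand-in for a report table (hypothetical, not the GATK class)."""

    def __init__(self, name, description, key_name):
        self.name = name
        self.description = description
        self.key_name = key_name
        self.columns = {}    # column name -> {row key: value}
        self.display = {}    # column name -> whether write() emits it
        self.row_keys = []   # row keys in insertion order

    def add_column(self, column, display=True):
        self.columns[column] = {}
        self.display[column] = display

    def set(self, row_key, column, value):
        if row_key not in self.row_keys:
            self.row_keys.append(row_key)
        self.columns[column][row_key] = value

    def divide_columns(self, target, numerator, denominator):
        """Store numerator / denominator, row by row, in the target column."""
        if target not in self.columns:
            self.add_column(target)
        for key in self.row_keys:
            num = self.columns[numerator].get(key, 0)
            den = self.columns[denominator].get(key, 0)
            self.columns[target][key] = num / den if den else float("nan")

    def write(self, out):
        """Emit the header, the visible column names, then one line per row."""
        shown = [c for c in self.columns if self.display[c]]
        out.write(f"##:GATKReport.v0.1 {self.name} : {self.description}\n")
        out.write(" ".join([self.key_name] + shown) + "\n")
        for key in self.row_keys:
            cells = [str(key)]
            cells += [str(self.columns[c].get(key, "NA")) for c in shown]
            out.write(" ".join(cells) + "\n")


if __name__ == "__main__":
    table = ReportTable("ErrorRatePerCycle", "demo table", "cycle")
    table.add_column("errors", display=False)  # intermediate, hidden
    table.add_column("bases", display=False)   # intermediate, hidden
    table.set(0, "errors", 1)
    table.set(0, "bases", 4)
    table.divide_columns("errorrate", "errors", "bases")
    buffer = io.StringIO()
    table.write(buffer)
    print(buffer.getvalue(), end="")
```

In this sketch the hidden `errors` and `bases` columns exist only to feed the division, so only the derived `errorrate` column reaches the output.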
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
corin
8931a63588
updated a whole bunch of column names to work like I want them to and added more informative figures for DOC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4131 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 18:19:09 +00:00
kiran
fba71e3c15
Placeholder commit. Implements a loader for a new multi-part GATK reporting format. See what it looks like at /home/radon01/kiran/scr1/projects/NewVariantEvalOutput/results/v1/tableexample.txt . Still need to address the issue where numeric columns are being interpreted as a vector of strings, not numbers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4115 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:48:44 +00:00
corin
8054b6b295
Changing a name of a column for variantevals output for easier reading by R--let me know if this needs to be updated elsewhere; it's just a space to an underscore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4062 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:18:16 +00:00
depristo
ede87a03c2
Nicer plotting routine for tranches. Add a third arg to suppress the legend.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4049 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 19:20:58 +00:00
depristo
e0abb73fd7
plot now assumes 1 / 1000 is the min error rate, not 1/100
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4010 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 14:48:22 +00:00
kiran
6037443e55
Handle interactive and non-interactive modes more elegantly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4009 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 02:38:53 +00:00
kiran
a7409df1a6
Be more robust to missing or empty files in VariantEval output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4008 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 02:22:50 +00:00
depristo
67063deb16
Removed coloring by mixture weight. Each cluster gets a distinct color, and the legend indicates which cluster has which id and its weight
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4001 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 14:28:24 +00:00
depristo
672bee295c
now plots tranches separately from optimizer
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4000 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:02:52 +00:00
depristo
41fee2d75e
Publication tranches report is now the default output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3967 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 13:58:59 +00:00
depristo
f4ffef4479
Default max variants is now 5000
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3966 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 13:58:32 +00:00
depristo
b63d64bbbc
Beautiful labels, better choice of dimension ranges. Supports fast loading of just first N records for testing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3964 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 23:17:32 +00:00
depristo
d3bebe0f2c
Reasonable comment
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3963 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 22:03:55 +00:00
depristo
bb5dfd7e5e
Slightly nicer plotting; not yet complete
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3961 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:01:31 +00:00
depristo
70f492a6e8
Prints out trivial debugging info
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3957 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 13:24:21 +00:00
kiran
1a36cb9296
Can now set the maximum number of variants to see in a cluster plot (useful when you don't need to see a billion points to get an idea of what's going on). The limit applies to known and novel variants separately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3937 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:45:24 +00:00
kiran
bd27287fe7
An R module that takes in a Variant Recalibration cluster file (file with '@!CLUSTER' lines in it), a tabularized VCF, and optionally a set of loci that should be examined more carefully, and emits a tremendous number of plots. For every annotation used in clustering, the distributions and pair-wise comparison (with ellipses denoting the 2-sigma cluster boundaries) are shown. Each cluster is shaded with a color proportional to its mixture coefficient.
...
To use this module, you'll first have to take your VCF and create an R-readable table out of it with the following command:
python /path/to/Sting/trunk/python/vcf2table.py -f CHROM,POS,ID,AC,AF,AN,DB,DP,HRun,MQ,MQ0,MyHaplotypeScore,QD,SB my.vcf > my.vcf.table
Then, simply invoke this module with the command:
Rscript /path/to/Sting/trunk/R/VariantRecalibratorReport/VariantRecalibratorReport.R /path/to/output/prefix /path/to/my/my.clusters /path/to/my.vcf.table [/path/to/my.suspicious.loci]
This will create a number of plots all with the prefix "/path/to/output/prefix". For instance, if you used QD, SB, HRun, and MyHaplotypeScore annotations during clustering, you should see output like this:
/path/to/output/prefix.anndist.HRun.pdf
/path/to/output/prefix.anndist.MyHaplotypeScore.pdf
/path/to/output/prefix.anndist.QD.pdf
/path/to/output/prefix.anndist.SB.pdf
/path/to/output/prefix.cluster.HRun_vs_MyHaplotypeScore.pdf
/path/to/output/prefix.cluster.HRun_vs_QD.pdf
/path/to/output/prefix.cluster.HRun_vs_SB.pdf
/path/to/output/prefix.cluster.MyHaplotypeScore_vs_HRun.pdf
/path/to/output/prefix.cluster.MyHaplotypeScore_vs_QD.pdf
/path/to/output/prefix.cluster.MyHaplotypeScore_vs_SB.pdf
/path/to/output/prefix.cluster.QD_vs_HRun.pdf
/path/to/output/prefix.cluster.QD_vs_MyHaplotypeScore.pdf
/path/to/output/prefix.cluster.QD_vs_SB.pdf
/path/to/output/prefix.cluster.SB_vs_HRun.pdf
/path/to/output/prefix.cluster.SB_vs_MyHaplotypeScore.pdf
/path/to/output/prefix.cluster.SB_vs_QD.pdf
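The listing above suggests a naming pattern: one "anndist" plot per annotation and one "cluster" plot per ordered pair of distinct annotations. The Python sketch below reproduces that list so the output of a run can be checked against it. The pattern is inferred from the example listing only, not from the R source, and the function name `expected_plots` is hypothetical.

```python
from itertools import permutations


def expected_plots(prefix, annotations):
    """List the plot files implied by the example listing for these annotations."""
    names = sorted(annotations)
    # One distribution plot per annotation.
    files = [f"{prefix}.anndist.{a}.pdf" for a in names]
    # One pair-wise plot per ordered pair (A_vs_B and B_vs_A both appear).
    files += [f"{prefix}.cluster.{a}_vs_{b}.pdf"
              for a, b in permutations(names, 2)]
    return files


if __name__ == "__main__":
    for path in expected_plots("/path/to/output/prefix",
                               ["QD", "SB", "HRun", "MyHaplotypeScore"]):
        print(path)
```

For n annotations this yields n + n(n-1) files, which is 16 for the four annotations in the example.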
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3936 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:35:14 +00:00
kiran
b990a22bac
A very nice way of automatically plotting the results of a VariantEval run. All of the hard work is actually in the common R repository, gsacommons.R, including methods for creating a Venn diagram. It also provides a mechanism for the output of a VariantEval run to be loaded into a single list object.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3828 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 12:38:26 +00:00
depristo
6ffcaa0afe
Can run R scripts on the command line
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3750 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 00:13:18 +00:00
depristo
66931d433c
useful routines for R
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3685 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:38:49 +00:00
corin
bcab0eba01
This replaces tearsheet.r, neatens up graphics, and allows the script to be used in R's interactive environment
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3625 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 01:02:58 +00:00
corin
ae88630d52
This script produces tearsheet and data processing report figures and tables when given Squid and Firehose produced data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3594 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 21:36:29 +00:00
corin
a2c266bda3
This script accepts file paths to analysis metrics tables and produces tearsheet data and data processing report graphs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3585 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 13:02:25 +00:00
corin
266a47d83d
This file automatically generates data and graphics for tearsheets and data processing reports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3551 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 18:39:23 +00:00
rpoplin
290771a8c2
Automatic cutting of recalibrated variant calls using ApplyVariantCuts. VariantRecalibrator produces the tranches plot alongside the optimization curve. Specify the levels using -tranche 1.0 -tranche 5.0 etc
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3472 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 15:03:00 +00:00
rpoplin
33a9549896
Variant Optimizer accepts a dbSNP rod argument to use in determining known/novel status as opposed to using the rsID in the vcf record. VO generates plots of annotation values used in clustering broken out by knowns and novels. Useful for showing which annotations are approximately Gaussian.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3332 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-09 16:48:07 +00:00
chartl
dc802aa26f
Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
rpoplin
06a212e612
Adding VariantConcordanceROCCurveWalker to create ROC curves comparing concordance between optimized call sets and validation truth sets in VCF format in order to evaluate performance of variant optimizer independently of achieving a particular novel ti/tv ratio. Added option to ignore only the specified filters in the input call sets via --ignore_filter <String>. Added option to provide a prior estimate of error for known snps via --known_prior <qual>. The het and hom calls are clustered independently. Infrastructure in place to use titv of known snps to inform p(true) of novel snps. Tweaked protection against overfitting based on suggestions from several people. Minor edits to AnalyzeAnnotations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3071 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 19:43:10 +00:00
rpoplin
c78fc23ec5
Minor updates to output of variant optimizer.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3031 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:46:47 +00:00
rpoplin
58a31bab6a
Variant optimizer now outputs VCF files via ApplyVariantClustersWalker. Documentation to be added to the wiki. It is ready to be used by other people but only with great caution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3028 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 20:41:42 +00:00
rpoplin
933823c8bc
Removed the StingException when mkdir fails for Sendu in AnalyzeCovariates. Incremental updates to VariantOptimizer.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3013 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 19:45:02 +00:00
chartl
ee68e38e02
Eliminate the shell items, as FH will be calling this with /broad/tools/apps/R-2.72/bin/Rscript
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2968 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 20:15:21 +00:00
chartl
aa7191353a
PlotDepthOfCoverage now produces a set of useful QC plots. This is currently a first draft, and it is unclear how the visualization will scale with increasing sample size and/or depth.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2962 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 16:42:35 +00:00
chartl
81ffb8243d
Waypoint commit of plotting R script for Depth Of Coverage/Coverage Statistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2958 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 21:42:51 +00:00
kshakir
36129e01e4
Using bitmap() instead of png() since the former doesn't rely on X11.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2873 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 05:31:51 +00:00
kshakir
3738b76320
Added a playground concordance analyzer for summarizing VariantEval across a group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2867 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:28:52 +00:00
chartl
f02e94ab6f
Eliminate the rescale factor -- heatmap automatically normalizes the data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2845 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 16:34:33 +00:00
chartl
37fa1bf0cc
Added heatmap function
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2843 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 15:12:54 +00:00
chartl
951b7a2433
First of what will be an increasingly useful set of tools, compiled into one command-line runnable library -- the goal is to have one plotting library that's callable because of limitations on the number of files you can package with a GenePattern module.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2841 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 16:51:47 +00:00
rpoplin
233a652161
Making the dotted quartile lines more clear.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2772 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 22:23:09 +00:00