gatk-3.8/R
kiran fd19c63aaf A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module).
This object is designed to be both the structure that holds data during the execution of the walker and the object that formats and emits the data so that it can be easily loaded into R.  In the end, you get a table that looks like this:

##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle  errorrate.61PA8.7         qualavg.61PA8.7
0      0.007451835696110506      25.474613284804366
1      0.002362777171937477      29.844949954504095
2      9.087604507451836E-4      32.87590975254731
3      5.452562704471102E-4      34.498999090081895
4      9.087604507451836E-4      35.14831665150137
5      5.452562704471102E-4      36.07223435225619
6      5.452562704471102E-4      36.1217248908297
7      5.452562704471102E-4      36.1910480349345
8      5.452562704471102E-4      36.00345705967977
...

A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession.  Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone.  This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.
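
To illustrate why a per-table header makes such files easy to split, here is a minimal Python sketch of a reader. It is not the GATKReport loader module: the function name `parse_gatk_report`, the whitespace-splitting rule, and the second table `Toy` (with made-up values) are all assumptions for illustration, based only on the sample shown above.

```python
import io

HEADER_PREFIX = "##:GATKReport.v0.1"

def parse_gatk_report(stream):
    """Split a report stream into tables keyed by table name.

    Hypothetical helper: assumes whitespace-delimited columns and a header
    line of the form '##:GATKReport.v0.1 <name> : <description>'.
    """
    tables = {}
    current = None
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(HEADER_PREFIX):
            # A new header closes the previous table and opens the next one.
            name, _, description = line[len(HEADER_PREFIX):].partition(":")
            current = {"description": description.strip(), "columns": None, "rows": []}
            tables[name.strip()] = current
        elif current is not None:
            fields = line.split()
            if current["columns"] is None:
                current["columns"] = fields  # first line after the header names the columns
            else:
                current["rows"].append(dict(zip(current["columns"], fields)))
    return tables

sample = io.StringIO(
    "##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads\n"
    "cycle  errorrate.61PA8.7         qualavg.61PA8.7\n"
    "0      0.007451835696110506      25.474613284804366\n"
    "1      0.002362777171937477      29.844949954504095\n"
    "\n"
    # Made-up second table, only here to show that tables can follow each other.
    "##:GATKReport.v0.1 Toy : A made-up second table\n"
    "key  value\n"
    "a    1\n"
)
report = parse_gatk_report(sample)
```

Because every table carries its own header, the reader needs no global state beyond "the table currently being filled", which is what lets output from different walkers be concatenated safely.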

The display property of individual columns can be turned off.  This is useful when a column is used to store intermediate results necessary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file.
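
The idea of a per-column display flag can be sketched as follows. This is a toy Python stand-in, not the GATK class: `ToyReportTable`, its method names, the two-space column separator, and the numbers are all assumptions made for illustration.

```python
import io

class ToyReportTable:
    """Hypothetical stand-in for a report table; not the GATK implementation."""

    def __init__(self, name, description):
        self.name = name
        self.description = description
        self.columns = {}   # column name -> {"display": bool, "values": {row key: value}}
        self.row_keys = []  # remembers the order in which rows were first seen

    def add_column(self, column, display=True):
        self.columns[column] = {"display": display, "values": {}}

    def set(self, row_key, column, value):
        if row_key not in self.row_keys:
            self.row_keys.append(row_key)
        self.columns[column]["values"][row_key] = value

    def write(self, out):
        # Only columns whose display flag is on reach the output.
        visible = [c for c, spec in self.columns.items() if spec["display"]]
        out.write(f"##:GATKReport.v0.1 {self.name} : {self.description}\n")
        out.write("  ".join(visible) + "\n")
        for key in self.row_keys:
            out.write("  ".join(str(self.columns[c]["values"][key]) for c in visible) + "\n")

table = ToyReportTable("ErrorRatePerCycle", "toy example")
table.add_column("cycle")
table.add_column("mismatches", display=False)  # intermediate count, hidden
table.add_column("bases", display=False)       # intermediate count, hidden
table.add_column("errorrate")

# Made-up numbers for a single row.
table.set(0, "cycle", 0)
table.set(0, "mismatches", 2)
table.set(0, "bases", 8)
table.set(0, "errorrate", 2 / 8)

buf = io.StringIO()
table.write(buf)
output = buf.getvalue()
```

The hidden columns stay available in memory for later calculations; they are simply skipped when the table is written.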

Finally, the GATKReportTable allows for some simple element-wise and column-wise mathematical operations.  For instance, one whole column can be divided by another, with the results stored in a third column.  This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop.
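
A column-wise division of this kind might look like the following Python sketch. The function `divide_columns`, the dict-of-lists table layout, the choice to emit NaN on division by zero, and the numbers are assumptions for illustration; none of them is taken from the GATK code.

```python
def divide_columns(table, numerator, denominator, target):
    """Store numerator / denominator, element by element, in a new column.

    `table` is a plain dict mapping column name -> list of numbers, a
    hypothetical layout chosen only for this example.
    """
    num = table[numerator]
    den = table[denominator]
    if len(num) != len(den):
        raise ValueError("columns must have the same number of rows")
    # Assumption: a zero denominator yields NaN rather than raising.
    table[target] = [n / d if d != 0 else float("nan") for n, d in zip(num, den)]
    return table

table = {"mismatches": [2, 1, 0], "bases": [8, 4, 0]}  # made-up numbers
divide_columns(table, "mismatches", "bases", "errorrate")
```

The caller names the two source columns and the destination, and the loop lives in one place, which is the same convenience R offers with `a / b` on whole vectors.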



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
..
VariantRecalibratorReport Removed coloring by mixture weight. Each cluster gets a distinct color, and the legend indicates which cluster has which id and its weight 2010-08-10 14:28:24 +00:00
VariantReport Handle interactive and non-interactive modes more elegantly. 2010-08-11 02:38:53 +00:00
analyzeConcordance Using bitmap() instead of png() since the former doesn't rely on X11. 2010-02-23 05:31:51 +00:00
Data.Processing.Report.r updated a whole bunch of column names to work like i want them to and added more informative figures for DOC 2010-08-26 18:19:09 +00:00
GATKReport.R A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module). 2010-08-29 05:39:24 +00:00
PlotDepthOfCoverage.R Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics 2010-03-29 13:32:00 +00:00
generateBySamplePlot.R R script for graphing depth of coverage by sample name, and generating a loess curve for each sample's data. 2009-12-10 21:58:01 +00:00
gsacommons.R Be more robust to missing or empty files in VariantEval output. 2010-08-11 02:22:50 +00:00
plot_Annotations_BinnedTruthMetrics.R Can run R scripts on the command line 2010-07-09 00:13:18 +00:00
plot_ClusterReport.R Can run R scripts on the command line 2010-07-09 00:13:18 +00:00
plot_OptimizationCurve.R now plots tranches separately from optimizer 2010-08-10 12:02:52 +00:00
plot_Tranches.R Nicer plotting routine for tranches. Add a third arg to suppress the legend. 2010-08-17 19:20:58 +00:00
plot_residualError_OtherCovariate.R Can run R scripts on the command line 2010-07-09 00:13:18 +00:00
plot_residualError_QualityScoreCovariate.R Can run R scripts on the command line 2010-07-09 00:13:18 +00:00
plot_variantROCCurve.R Can run R scripts on the command line 2010-07-09 00:13:18 +00:00
plotting_library.R Can run R scripts on the command line 2010-07-09 00:13:18 +00:00
tearsheet.r This script produces tearsheet and data processing report figures and tables when given Squid and Firehose produced data 2010-06-18 21:36:29 +00:00
titvFPEst.R now plots tranches separately from optimizer 2010-08-10 12:02:52 +00:00
whole_exome_bait_selection.R R script for selecting a variety of baits (using %GC content and normalized coverage) for Nanostring assessment from those used in the Agilent whole exome hybrid selection design. 2009-09-22 18:10:14 +00:00