kiran
3b76034d50
Namespace changes to avoid conflicts with other packages.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4950 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:33:26 +00:00
kiran
ab143c82af
Selects only the project requested via the Oracle command, rather than selecting everything and then subsetting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4949 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:32:32 +00:00
depristo
5583fa179b
Now uses PNGs and a very high downsampling value to more clearly display the information
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4928 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 13:57:51 +00:00
depristo
dba30c4118
Subsampling of points, for the case where we have enormous numbers of points
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4927 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 13:48:11 +00:00
depristo
54adbd2581
Added instantaneous rate plotting routine to performance log plotter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4926 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-03 13:33:35 +00:00
depristo
5539c2d9f3
--performanceLog (-PF) X.dat argument now enabled. Writes out a table (R-friendly) of the performance of the GATK over time, exactly as a more detailed version of the INFO progress meter. R script for useful plotting of the performance of the GATK over time. Will be helpful for upcoming scalability testing and debugging of memory leaks and other incremental performance problems
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4921 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 23:34:21 +00:00
depristo
586d3f05d9
Better cumhist function
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4920 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 23:32:20 +00:00
depristo
b024b85798
misc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4899 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 20:16:41 +00:00
fromer
cc909602c7
Vectorize() pDirectlyPhaseHetPairAtDistanceUsingDepth; deal with minor precision issues
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4890 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-21 17:18:26 +00:00
fromer
a829284d84
Return analyzed window sizes as well
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4884 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-20 17:00:17 +00:00
fromer
6310a524d9
Do not abort integration (over het-het distances) on errors, but warn about them
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4855 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 17:20:25 +00:00
fromer
b1f0df0047
Handle case where read lengths are longer than fragment size
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4852 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 02:19:55 +00:00
fromer
a5e1854b3a
Forgot to pass correct parameters to calcPhasingProbsForWindowDistances()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4851 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 01:58:06 +00:00
fromer
2b0dc8625c
Updated RBP theoretical model as per Mark's insights regarding the correct understanding of insert sizes being calculated post-hoc from the distance between read lengths. The correct way to think of it is: 1) There's a fragment of length F. 2) Each of its two ends is read for L bases. 3) The insert size = i = F - 2 * L, after the fragment's assumed identity is determined by mapping the read mates to a reference sequence. Therefore, the external user-defined distribution is on the FRAGMENT SIZES
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4850 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 01:45:20 +00:00
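The relationship stated in the commit above (i = F - 2 * L) can be illustrated with a minimal sketch. The function name and the numbers below are hypothetical, chosen only for illustration; they are not taken from the RBP model code.

```python
def insert_size(fragment_length: int, read_length: int) -> int:
    """Insert size i = F - 2 * L, as described in the commit message.

    A negative result means the two reads overlap, i.e. together they
    cover more bases than the fragment contains.
    """
    return fragment_length - 2 * read_length


# Hypothetical fragment and read lengths, for illustration only.
print(insert_size(300, 76))  # 300 - 152 = 148
print(insert_size(100, 76))  # 100 - 152 = -52 (overlapping reads)
```

Because the user-defined distribution is on fragment sizes, a sketch like this would be applied to each sampled F to derive the corresponding insert size.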
fromer
aea481ae01
Trivial bug fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4848 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 18:29:15 +00:00
depristo
46cd227613
Stability improvements to GATK run report system. R code is now robust. XML parser uses the C backend in python, 10x faster. Added shell script that runs the daily reports, and linked the /humgen/ runme.csh to this script. Script now emails the daily PDFs to gsamembers
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4845 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 14:56:12 +00:00
fromer
c167c6f9eb
Calculate the phasing probabilities for particular intra-het distances
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4838 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:44:59 +00:00
fromer
4dbdf7a13d
Added ability to sample from intra-het distance distribution
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4836 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:09:03 +00:00
fromer
4403b9d276
Added probability bound on phasing paths, which slightly speeds up calculations. It seems that a real speed-up can only be achieved by considering fewer paths by doing some form of caching of sub-problems (e.g., dynamic programming or matrix multiplication, as Mark suggested)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4832 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 23:53:56 +00:00
depristo
a6397ed8c3
Default R script now plots sensitivity/specificity curve
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4825 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 16:55:11 +00:00
fromer
2bf4fc94f0
Try to use more sampling to get a "correct" estimate of multivariate integral
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4815 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 02:58:46 +00:00
fromer
c64bf80b57
Added theoretical model of read-backed phasing (RBP)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4814 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 01:46:09 +00:00
delangel
2ac938fe4e
1)
...
Minor fixes to avoid crashes vs CG indel files:
- Add count for complex events, not just insertions and deletions
- Correctly handle cases of large indels falling out of bounds of histogram array: added a count of indels out of bounds and avoid exceptions.
2) Cosmetic fix for R script assessing UG calling performance: draw red y=x line on top of Simulated vs Estimated AC to get a better view of under/over-estimation of AC.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4758 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 21:08:25 +00:00
kiran
9cca14acc5
Changed VCF subsetting procedure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4742 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-28 00:46:29 +00:00
kiran
ecd496cf51
Modifications to reflect changes to gsalib. Smarter about figuring out the names of the filtered parts of the callset.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4739 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-27 23:26:03 +00:00
kiran
247f33a553
Prefixed all the functions with gsa. in order to distinguish the methods from other possible methods of the same name in the namespace.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4738 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-27 23:24:42 +00:00
depristo
8768e1a240
Useful profiling tool that reads in a single rod and evaluates the time it takes to read the file by byte, by line, into pieces, just the sites of the vcf, and finally the full vcf. Emits a useful table for plotting with the associated R script that can be run like Rscript R/analyzeRodProfile.R table.txt table.pdf titleString
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4728 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 14:59:16 +00:00
kiran
d2fc30d188
Added a debugging statement to plot.venn
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4718 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 01:19:31 +00:00
kiran
d492eb94ad
Actually subsets the resulting table now, like it was supposed to all along.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4696 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:18:23 +00:00
kiran
50dbbdb8ab
Retrieves per-sample or per-lane metrics from the SQUID database and populates a dataframe with the results.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4693 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 22:46:07 +00:00
depristo
44d0cb6cde
New version of cutting routines for VQSR. Old code removed. Working unit tests. Best practice with testng integration test (everyone look at it). Walker test now allows you to not specify no. input files, if it can infer input counts from MD5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4664 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 16:19:56 +00:00
depristo
4f4eec12dd
Minor improvement
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4659 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 19:30:54 +00:00
depristo
760f06cf8c
now prints a nice report, can be invoked from command line
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4641 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-09 19:44:10 +00:00
depristo
3c08a1c061
Basic script for assessing simulation sensitivity and specificity
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4638 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-08 21:02:10 +00:00
kiran
1d68b28bbd
Takes a list of BAMs, looks up the read group information in the sequencing platform's SQUID database, and computes the tearsheet stats. Also takes the VariantEval output (R format) and outputs the variant stats and some plots for the tearsheet. This script requires the gsalib library to be in the R library path (add the line '.libPaths('/path/to/Sting/R/')' to your ~/.Rprofile).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4584 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 19:06:22 +00:00
depristo
0508dd0c31
Better reporting -- figured out how to drop unused levels in subset
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4438 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:31:51 +00:00
kiran
24cf6f9e36
Fix to handle situation where there are no filtered variants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4424 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 18:34:01 +00:00
kiran
62f5383859
* Added an R package, "gsalib", providing a place to store common, useful, documented R methods. To use this module, you must follow three steps:
...
1) Build the module with the following command:
$ ant gsalib
2) Add the module path to your ~/.Rprofile file:
.libPaths("/path/to/Sting/trunk/R/")
3) At the top of each R script that will use the library, include the line:
library(gsalib)
You can now use the package like any other R package. To get high-level documentation, supply the following command to R:
help(gsalib)
The methods contained herein are:
getargs : A method to easily provide arguments to interactive and non-interactive scripts.
Prints out a help message specifying how the script should be run if no arguments
or "-h" is provided. Very helpful when you're writing an R-script piecemeal in
interactive mode, then want to make it a command-line program.
plot.venn : Plots a two-way or three-way proportional Venn diagram.
read.eval : Reads VariantEval output that's formatted in R style.
read.gatkreport : Reads GATKReport output.
gsa.message : Emits a message with the prefix "[gsalib]" to stdout.
gsa.warn : Emits a warning message with the prefix "[gsalib] Warning:" to stdout.
gsa.error : Emits an error message with the prefix "[gsalib] Error:" to stdout, calls traceback()
and halts execution.
Documentation on each of these methods can be obtained by typing "help(method_name)" at the R prompt.
* Retired GATKReport.R, as that functionality has now been moved to gsalib.
* Retired gsacommons, as that functionality has been split between gsalib and VariantReport.R.
* Modified VariantReport.R to make use of gsalib. The script now uses the getargs() method to provide the user with some information as to the proper way to run the script. Documentation on how to prepare output is given at http://www.broadinstitute.org/gsa/wiki/index.php/VariantEval .
* Added 'gsalib' target to build.xml file. Running "ant gsalib" will compile this module and place the R-ready package in R/gsalib .
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4416 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 00:27:59 +00:00
kiran
40b2f62a83
Changed precision on Ti/Tv in venn diagrams
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4413 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 05:27:13 +00:00
kiran
d0e44b7a8e
Lower precision on Ti/Tv in variant summary matrix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4412 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 05:18:48 +00:00
kiran
6deb755164
Ti/Tv plots are restricted to a Ti/Tv range of 0.0-4.0. Added column to variant summary specifying the total variant counts (known+novel). Allele spectrum plots now show neutral expectation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4411 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 05:15:34 +00:00
kiran
1d7e48c4b0
Venn diagrams are now oriented properly when a < b. Added a slide with callset summary table. All plots now show the present-in-a, filtered-in-b metrics. Added title page with project name, author, and timestamp.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4407 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 22:17:21 +00:00
kiran
fe29c8b09c
Placeholder commit: improvements to VariantReport (now shows stats for variants that are called in one set and filtered in another). Better command-line argument support.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4404 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 18:46:53 +00:00
corin
9cf079e1bb
Ready for integration with queue script
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4346 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 19:46:01 +00:00
corin
d6bd1debeb
This is an updated version of the automated data processing report. Each page in the report is a stand-alone function; the pages are linked together by a function that pulls all appropriate data (assuming a standard naming convention) and generates the PDF. This script still needs to respond appropriately when it doesn't find the data it needs, and still needs database access and a way of getting some information from sequencing for the tearsheet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4335 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 18:08:16 +00:00
depristo
b57a0a0310
improvements to the report code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4280 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 00:45:13 +00:00
kiran
dfdd0b69a9
Removed unused dependency (it was causing a problem by looking for an X11 connection that didn't necessarily exist).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4244 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 19:56:00 +00:00
depristo
594fb4a547
More plots in report
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4225 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 02:56:51 +00:00
kiran
19e22cfa87
Fixed a bug where the script looked for the wrong column name. Also, all results are now returned in a single plot.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4216 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-06 14:19:57 +00:00
depristo
0c54bf4195
Better reporting and now with a special mode for listing exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4183 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:19:51 +00:00
corin
cdad243645
updated version of the DPR. Now produces part of the tearsheet as well as good depth of coverage figures
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4182 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:38:58 +00:00
depristo
fc5caa98a5
Improved reporting now with metrics by day/week/etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4180 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 02:43:13 +00:00
depristo
8683087756
Suppl. tools for working with and displaying GATK run reports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4176 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:32:22 +00:00
kiran
e14a347e2e
Now prints cluster report to a single PDF, rather than a dozen different PDFs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4164 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 18:58:39 +00:00
kiran
fd19c63aaf
A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module).
...
This object is designed to be both the structure that holds data during the execution of the walker, as well as the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this:
##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle errorrate.61PA8.7 qualavg.61PA8.7
0 0.007451835696110506 25.474613284804366
1 0.002362777171937477 29.844949954504095
2 9.087604507451836E-4 32.87590975254731
3 5.452562704471102E-4 34.498999090081895
4 9.087604507451836E-4 35.14831665150137
5 5.452562704471102E-4 36.07223435225619
6 5.452562704471102E-4 36.1217248908297
7 5.452562704471102E-4 36.1910480349345
8 5.452562704471102E-4 36.00345705967977
...
A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.
The display property of individual columns can be turned off. This is useful when a column is used to store intermediate results, necessary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file.
Finally, the GATKReportTable allows for some simple, mathematical, element-wise and column-wise operations. For instance, two whole columns can be divided, the results of the operation being stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
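The table layout described in the commit above can be read back with a short sketch. This is not the GATKReport loader from the repository; it is a minimal Python illustration that assumes only what the example shows: each table starts with a `##:GATKReport.v0.1 <name> : <description>` header, followed by one line of whitespace-separated column names and then whitespace-separated rows. The function names `parse_gatk_report` and `divide_columns` are hypothetical.

```python
from typing import Dict, List

HEADER_PREFIX = "##:GATKReport.v0.1"


def parse_gatk_report(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse report text into {table_name: {column_name: [values as strings]}}."""
    tables: Dict[str, Dict[str, List[str]]] = {}
    current = None   # table currently being filled
    columns = None   # column names of the current table
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(HEADER_PREFIX):
            # Header form assumed: "##:GATKReport.v0.1 <name> : <description>"
            rest = line[len(HEADER_PREFIX):].strip()
            name = rest.split(":", 1)[0].strip()
            tables[name] = {}
            current = tables[name]
            columns = None
        elif current is None:
            continue  # ignore anything before the first table header
        elif columns is None:
            columns = line.split()
            for col in columns:
                current[col] = []
        else:
            fields = line.split()
            if len(fields) != len(columns):
                continue  # skip rows that do not match the column count
            for col, value in zip(columns, fields):
                current[col].append(value)
    return tables


def divide_columns(table: Dict[str, List[str]], num: str, den: str) -> List[float]:
    """Element-wise division of two whole columns, without an explicit caller loop."""
    return [float(n) / float(d) for n, d in zip(table[num], table[den])]


# First three rows of the example table from the commit message.
SAMPLE = """\
##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle errorrate.61PA8.7 qualavg.61PA8.7
0 0.007451835696110506 25.474613284804366
1 0.002362777171937477 29.844949954504095
2 9.087604507451836E-4 32.87590975254731
"""

report = parse_gatk_report(SAMPLE)
table = report["ErrorRatePerCycle"]
print(table["cycle"])
print(divide_columns(table, "errorrate.61PA8.7", "qualavg.61PA8.7"))
```

Values are kept as strings and converted only when a numeric operation is requested, so the caller decides which columns are numeric. Since every table carries its own header, concatenating the output of several reports and passing the result to the same function would yield one dictionary entry per table.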
corin
8931a63588
updated a whole bunch of column names to work like i want them to and added more informative figures for DOC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4131 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 18:19:09 +00:00
kiran
fba71e3c15
Placeholder commit. Implements a loader for a new multi-part GATK reporting format. See what it looks like at /home/radon01/kiran/scr1/projects/NewVariantEvalOutput/results/v1/tableexample.txt . Still need to address the issue where numeric columns are being interpreted as a vector of strings, not numbers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4115 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:48:44 +00:00
corin
8054b6b295
Changing a name of a column for variantevals output for easier reading by R--let me know if this needs to be updated elsewhere; it's just a space to an underscore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4062 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:18:16 +00:00
depristo
ede87a03c2
Nicer plotting routine for tranches. Add a third arg to suppress the legend.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4049 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 19:20:58 +00:00
depristo
e0abb73fd7
plot now assumes 1 / 1000 is the min error rate, not 1/100
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4010 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 14:48:22 +00:00
kiran
6037443e55
Handle interactive and non-interactive modes more elegantly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4009 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 02:38:53 +00:00
kiran
a7409df1a6
Be more robust to missing or empty files in VariantEval output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4008 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 02:22:50 +00:00
depristo
67063deb16
Removed coloring by mixture weight. Each cluster gets a distinct color, and the legend indicates which cluster has which id and its weight
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4001 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 14:28:24 +00:00
depristo
672bee295c
now plots tranches separately from optimizer
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4000 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:02:52 +00:00
depristo
41fee2d75e
Publication tranches report is now the default output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3967 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 13:58:59 +00:00
depristo
f4ffef4479
Default max variants is now 5000
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3966 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 13:58:32 +00:00
depristo
b63d64bbbc
Beautiful labels, better choice of dimension ranges. Supports fast loading of just first N records for testing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3964 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 23:17:32 +00:00
depristo
d3bebe0f2c
Reasonable comment
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3963 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 22:03:55 +00:00
depristo
bb5dfd7e5e
Slightly nicer plotting; not yet complete
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3961 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:01:31 +00:00
depristo
70f492a6e8
Prints out trivial debugging info
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3957 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 13:24:21 +00:00
kiran
1a36cb9296
Can now set the maximum number of variants to see in a cluster plot (useful when you don't need to see a billion points to get an idea of what's going on). Limit applies to known and novel variants separately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3937 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:45:24 +00:00
kiran
bd27287fe7
An R module that takes in a Variant Recalibration cluster file (file with '@!CLUSTER' lines in it), a tabularized VCF, and optionally a set of loci that should be examined more carefully, and emits a tremendous number of plots. For every annotation used in clustering, the distributions and pair-wise comparison (with ellipses denoting the 2-sigma cluster boundaries) are shown. Each cluster is shaded with a color proportional to its mixture coefficient.
...
To use this module, you'll first have to take your VCF and create an R-readable table out of it with the following command:
python /path/to/Sting/trunk/python/vcf2table.py -f CHROM,POS,ID,AC,AF,AN,DB,DP,HRun,MQ,MQ0,MyHaplotypeScore,QD,SB my.vcf > my.vcf.table
Then, simply invoke this module with the command:
Rscript /path/to/Sting/trunk/R/VariantRecalibratorReport/VariantRecalibratorReport.R /path/to/output/prefix /path/to/my/my.clusters /path/to/my.vcf.table [/path/to/my.suspicious.loci]
This will create a number of plots all with the prefix "/path/to/output/prefix". For instance, if you used QD, SB, HRun, and MyHaplotypeScore annotations during clustering, you should see output like this:
/path/to/output/prefix.anndist.HRun.pdf
/path/to/output/prefix.anndist.MyHaplotypeScore.pdf
/path/to/output/prefix.anndist.QD.pdf
/path/to/output/prefix.anndist.SB.pdf
/path/to/output/prefix.cluster.HRun_vs_MyHaplotypeScore.pdf
/path/to/output/prefix.cluster.HRun_vs_QD.pdf
/path/to/output/prefix.cluster.HRun_vs_SB.pdf
/path/to/output/prefix.cluster.MyHaplotypeScore_vs_HRun.pdf
/path/to/output/prefix.cluster.MyHaplotypeScore_vs_QD.pdf
/path/to/output/prefix.cluster.MyHaplotypeScore_vs_SB.pdf
/path/to/output/prefix.cluster.QD_vs_HRun.pdf
/path/to/output/prefix.cluster.QD_vs_MyHaplotypeScore.pdf
/path/to/output/prefix.cluster.QD_vs_SB.pdf
/path/to/output/prefix.cluster.SB_vs_HRun.pdf
/path/to/output/prefix.cluster.SB_vs_MyHaplotypeScore.pdf
/path/to/output/prefix.cluster.SB_vs_QD.pdf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3936 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:35:14 +00:00
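The naming pattern in the output listing above can be sketched as follows. This is inferred from the example listing only, not from VariantRecalibratorReport.R: one `anndist` plot per annotation and one `cluster` plot per ordered pair of distinct annotations. The function name `expected_outputs` is hypothetical.

```python
from itertools import permutations
from typing import List


def expected_outputs(prefix: str, annotations: List[str]) -> List[str]:
    """List the plot files the example output suggests, in the listing's order."""
    names = sorted(annotations)
    # One distribution plot per annotation.
    files = [f"{prefix}.anndist.{a}.pdf" for a in names]
    # One pair-wise plot per ordered pair (A_vs_B and B_vs_A both appear).
    files += [f"{prefix}.cluster.{a}_vs_{b}.pdf" for a, b in permutations(names, 2)]
    return files


outputs = expected_outputs(
    "/path/to/output/prefix", ["QD", "SB", "HRun", "MyHaplotypeScore"]
)
for path in outputs:
    print(path)
```

With n annotations this gives n + n * (n - 1) files, which is 16 for the four annotations in the example. That growth is worth keeping in mind before clustering on many annotations.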
kiran
b990a22bac
A very nice way of automatically plotting the results of a VariantEval run. All of the hard work is actually in the common R repository, gsacommons.R, including methods for creating a Venn diagram. It also provides a mechanism for the output of a VariantEval run to be loaded into a single list object.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3828 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 12:38:26 +00:00
depristo
6ffcaa0afe
Can run R scripts on the command line
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3750 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 00:13:18 +00:00
depristo
66931d433c
useful routines for R
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3685 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:38:49 +00:00
corin
bcab0eba01
This replaces tearsheet.r, neatens up graphics, and allows the script to be used in R's interactive environment
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3625 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 01:02:58 +00:00
corin
ae88630d52
This script produces tearsheet and data processing report figures and tables when given Squid and Firehose produced data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3594 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 21:36:29 +00:00
corin
a2c266bda3
This script accepts file paths to analysis metrics tables and produces tearsheet data and data processing report graphs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3585 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 13:02:25 +00:00
corin
266a47d83d
This file automatically generates data and graphics for tearsheets and data processing reports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3551 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 18:39:23 +00:00
rpoplin
290771a8c2
Automatic cutting of recalibrated variant calls using ApplyVariantCuts. VariantRecalibrator produces the tranches plot alongside the optimization curve. Specify the levels using -tranche 1.0 -tranche 5.0 etc
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3472 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 15:03:00 +00:00
rpoplin
33a9549896
Variant Optimizer accepts a dbSNP rod argument to use in determining known/novel status as opposed to using the rsID in the vcf record. VO generates plots of annotation values used in clustering broken out by knowns and novels. Useful for showing which annotations are approximately Gaussian.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3332 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-09 16:48:07 +00:00
chartl
dc802aa26f
Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
rpoplin
06a212e612
Adding VariantConcordanceROCCurveWalker to create ROC curves comparing concordance between optimized call sets and validation truth sets in VCF format in order to evaluate performance of variant optimizer independently of achieving a particular novel ti/tv ratio. Added option to ignore only the specified filters in the input call sets via --ignore_filter <String>. Added option to provide a prior estimate of error for known snps via --known_prior <qual>. The het and hom calls are clustered independently. Infrastructure in place to use titv of known snps to inform p(true) of novel snps. Tweaked protection against overfitting based on suggestions from several people. Minor edits to AnalyzeAnnotations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3071 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 19:43:10 +00:00
rpoplin
c78fc23ec5
Minor updates to output of variant optimizer.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3031 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:46:47 +00:00
rpoplin
58a31bab6a
Variant optimizer now outputs VCF files via ApplyVariantClustersWalker. Documentation to be added to the wiki. It is ready to be used by other people but only with great caution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3028 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 20:41:42 +00:00
rpoplin
933823c8bc
Removed the StingException when mkdir fails for Sendu in AnalyzeCovariates. Incremental updates to VariantOptimizer.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3013 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 19:45:02 +00:00
chartl
ee68e38e02
Eliminate the shell items, as FH will be calling this with /broad/tools/apps/R-2.72/bin/Rscript
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2968 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 20:15:21 +00:00
chartl
aa7191353a
PlotDepthOfCoverage now produces a set of useful QC plots. Currently a first-draft, and it is unclear how the visualization will scale with increasing sample size and/or depth.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2962 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 16:42:35 +00:00
chartl
81ffb8243d
Waypoint commit of plotting R script for Depth Of Coverage/Coverage Statistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2958 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 21:42:51 +00:00
kshakir
36129e01e4
Using bitmap() instead of png() since the former doesn't rely on X11.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2873 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 05:31:51 +00:00
kshakir
3738b76320
Added a playground concordance analyzer for summarizing VariantEval across a group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2867 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:28:52 +00:00
chartl
f02e94ab6f
Eliminate the rescale factor -- heatmap automatically normalizes the data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2845 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 16:34:33 +00:00
chartl
37fa1bf0cc
Added heatmap function
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2843 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 15:12:54 +00:00
chartl
951b7a2433
First of what will be an increasingly useful set of tools, compiled into one command-line runnable library -- the goal is to have one plotting library that's callable because of limitations on the number of files you can package with a GenePattern module.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2841 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 16:51:47 +00:00
rpoplin
233a652161
Making the dotted quartile lines more clear.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2772 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 22:23:09 +00:00
rpoplin
64fc76e4bf
Added an option to AnalyzeCovariates to set the max value of the histograms to make them easier to directly compare.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2753 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 23:13:57 +00:00
rpoplin
16da5011c0
Added a new option for indicating the mean number of variants on the AnalyzeAnnotations plots. This way one can say, for example, filtering at this point will keep 75 percent of all the variants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2744 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 21:58:31 +00:00
rpoplin
c6cc844e55
Added -name argument to AnalyzeAnnotations that allows one to specify the name of the annotation to be used on the plots. Instead of seeing AB and DP, one can add -name AB,AlleleBalance -name DP,Depth
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2742 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:48:53 +00:00
rpoplin
4f29a1d4f6
AnalyzeAnnotations now plots true positive rate instead of percentage of variants found in the truth set. Committing GCContentCovariate to help people experiment with correcting the pilot3/Kristian base calling error mode in slx.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2740 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:01:56 +00:00
rpoplin
79c4cc1db7
AnalyzeAnnotations now breaks out titv by calls in hapmap and also plots true positive rates. Any ROD passed in whose name starts with 'truth' is considered to be the truth set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2726 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:41:23 +00:00