fromer
6310a524d9
Do not abort integration (over het-het distances) on errors, but warn about them
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4855 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 17:20:25 +00:00
fromer
b1f0df0047
Handle case where read lengths are longer than fragment size
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4852 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 02:19:55 +00:00
fromer
a5e1854b3a
Forgot to pass correct parameters to calcPhasingProbsForWindowDistances()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4851 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 01:58:06 +00:00
fromer
2b0dc8625c
Updated the RBP theoretical model following Mark's insight that insert sizes are calculated post hoc from the distance between the mapped reads. The correct way to think of it is: 1) There is a fragment of length F. 2) Each of its two ends is read for L bases. 3) The insert size is i = F - 2 * L, where the fragment's assumed identity is determined by mapping the read mates to a reference sequence. Therefore, the external user-defined distribution is on the FRAGMENT SIZES.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4850 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 01:45:20 +00:00
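A minimal Python sketch of the fragment model in the commit above, assuming a discrete, user-supplied fragment-size distribution. The function names are hypothetical and not taken from the RBP code. Each fragment size F maps to the insert size i = F - 2L, so a negative i means the two mates overlap.

```python
import random


def insert_size(fragment_size: int, read_length: int) -> int:
    """Insert size i = F - 2 * L; negative when the two mates overlap."""
    return fragment_size - 2 * read_length


def sample_insert_sizes(fragment_sizes, weights, read_length, n, seed=0):
    """Draw n fragment sizes F from a user-defined discrete distribution
    (hypothetical input format) and convert each one to an insert size."""
    rng = random.Random(seed)
    fragments = rng.choices(fragment_sizes, weights=weights, k=n)
    return [insert_size(f, read_length) for f in fragments]


print(insert_size(400, 100))
print(insert_size(150, 100))  # negative: the mates overlap by 50 bases
```

The distribution is placed on F rather than on i, matching the commit; i is always derived.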
fromer
aea481ae01
Trivial bug fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4848 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 18:29:15 +00:00
depristo
46cd227613
Stability improvements to the GATK run report system. The R code is now robust. The XML parser uses Python's C backend and is 10x faster. Added a shell script that runs the daily reports and linked the /humgen/ runme.csh to it. The script now emails the daily PDFs to the group (gsamembers).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4845 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 14:56:12 +00:00
fromer
c167c6f9eb
Calculate the phasing probabilities for particular intra-het distances
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4838 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:44:59 +00:00
fromer
4dbdf7a13d
Added ability to sample from intra-het distance distribution
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4836 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:09:03 +00:00
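Sampling from an empirical intra-het distance distribution can be sketched with inverse-CDF sampling over the observed distances. This is an illustrative Python sketch, not the walker's implementation; `build_sampler` and the example distances are hypothetical.

```python
import bisect
import itertools
import random
from collections import Counter


def build_sampler(observed_distances, seed=0):
    """Return a zero-argument function that draws distances with the
    empirical frequencies of observed_distances (inverse-CDF sampling)."""
    counts = Counter(observed_distances)
    values = sorted(counts)
    cumulative = list(itertools.accumulate(counts[v] for v in values))
    total = cumulative[-1]
    rng = random.Random(seed)

    def sample():
        # u is uniform on [0, total); the first bin whose cumulative count exceeds u wins.
        u = rng.random() * total
        return values[bisect.bisect_right(cumulative, u)]

    return sample


sample = build_sampler([120, 120, 350, 800, 800, 800])
print([sample() for _ in range(5)])
```

Each draw costs one binary search, so sampling stays cheap even for a long tail of distinct distances.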
fromer
4403b9d276
Added probability bound on phasing paths, which slightly speeds up calculations. It seems that a real speed-up can only be achieved by considering fewer paths by doing some form of caching of sub-problems (e.g., dynamic programming or matrix multiplication, as Mark suggested)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4832 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 23:53:56 +00:00
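The dynamic-programming / matrix-multiplication idea in the commit above can be illustrated on a toy chain model. This is an assumption: the real phasing model's states and path weights are not described here, and the matrices below are made up. Enumerating paths costs S^(n+1) terms for n transitions over S states, while pushing a row vector through the transition matrices gives the same total in O(n * S^2).

```python
import itertools


def vec_mat(row, matrix):
    """Multiply a row vector by a square matrix."""
    size = len(matrix)
    return [sum(row[k] * matrix[k][j] for k in range(size)) for j in range(size)]


def total_by_enumeration(start, transitions):
    """Sum the weight of every state path explicitly (exponential cost)."""
    n_states = len(start)
    total = 0.0
    for path in itertools.product(range(n_states), repeat=len(transitions) + 1):
        weight = start[path[0]]
        for step, matrix in enumerate(transitions):
            weight *= matrix[path[step]][path[step + 1]]
        total += weight
    return total


def total_by_matrix_product(start, transitions):
    """Same sum, reusing partial sums over path prefixes (linear in chain length)."""
    row = list(start)
    for matrix in transitions:
        row = vec_mat(row, matrix)
    return sum(row)


start = [0.6, 0.4]
transitions = [
    [[0.9, 0.3], [0.2, 0.8]],
    [[0.7, 0.3], [0.5, 0.5]],
    [[0.95, 0.05], [0.1, 0.9]],
]
print(total_by_enumeration(start, transitions), total_by_matrix_product(start, transitions))
```

The two totals agree up to floating-point rounding; the matrix form caches each prefix sum instead of recomputing it per path.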
depristo
a6397ed8c3
Default R script now plots sensitivity/specificity curve
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4825 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 16:55:11 +00:00
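The points on a sensitivity/specificity curve can be computed as sketched below, assuming each candidate site carries a score and a truth label (for example, from a simulation). The input format and function name are hypothetical; the R script itself is not reproduced here.

```python
def sens_spec_curve(scores, is_true, thresholds):
    """For each threshold t, call every site with score >= t and return
    (t, sensitivity, specificity) measured against the truth labels."""
    positives = sum(1 for y in is_true if y)
    negatives = len(is_true) - positives
    curve = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, is_true) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, is_true) if not y and s < t)
        curve.append((t, tp / positives, tn / negatives))
    return curve


scores = [0.9, 0.8, 0.4, 0.3]
truth = [True, False, True, False]
for point in sens_spec_curve(scores, truth, [0.0, 0.5, 1.0]):
    print(point)
```

Lowering the threshold trades specificity for sensitivity; plotting the tuples traces the curve.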
fromer
2bf4fc94f0
Try to use more sampling to get a "correct" estimate of multivariate integral
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4815 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 02:58:46 +00:00
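Why more sampling helps: a plain Monte Carlo estimate of a multivariate integral has a standard error that shrinks as 1/sqrt(n). The sketch below integrates over the unit hypercube with uniform sampling; the commit does not say which integral or sampler RBP uses, so the integrand here is only an example.

```python
import math
import random


def mc_integrate(f, dim, n, seed=0):
    """Estimate the integral of f over [0, 1]^dim by averaging f at n
    uniform random points; also return the estimate's standard error."""
    rng = random.Random(seed)
    values = [f([rng.random() for _ in range(dim)]) for _ in range(n)]
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


est, se = mc_integrate(lambda x: sum(x), dim=3, n=20000)
print(f"{est:.4f} +/- {se:.4f}")  # the exact value of this integral is 1.5
```

Ten times more samples cut the standard error by roughly sqrt(10), which is why a "correct" estimate can require many draws.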
fromer
c64bf80b57
Added theoretical model of read-backed phasing (RBP)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4814 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 01:46:09 +00:00
delangel
2ac938fe4e
1) Minor fixes to avoid crashes on CG indel files:
- Add a count for complex events, not just insertions and deletions.
- Correctly handle large indels that fall outside the bounds of the histogram array: added a count of out-of-bounds indels and avoid exceptions.
2) Cosmetic fix for the R script assessing UG calling performance: draw the red y=x line on top of the Simulated vs. Estimated AC plot to better show under/over-estimation of AC.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4758 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 21:08:25 +00:00
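The out-of-bounds fix in the commit above amounts to a bounds check before indexing, plus a per-type event count that includes complex events. A hypothetical Python sketch (class and field names are illustrative, not the original code):

```python
from collections import Counter


class IndelHistogram:
    """Counts indel events by type and by length. Lengths that do not fit
    in the fixed-size array are tallied in out_of_bounds instead of raising."""

    def __init__(self, max_length):
        self.length_counts = [0] * (max_length + 1)  # index = indel length
        self.out_of_bounds = 0
        self.event_types = Counter()  # "insertion", "deletion", "complex", ...

    def add(self, event_type, length):
        self.event_types[event_type] += 1
        if 0 <= length < len(self.length_counts):
            self.length_counts[length] += 1
        else:
            self.out_of_bounds += 1


hist = IndelHistogram(max_length=100)
hist.add("insertion", 3)
hist.add("complex", 12)
hist.add("deletion", 5000)  # a large indel: counted, not an exception
print(hist.event_types, hist.out_of_bounds)
```

Every event is still accounted for: the binned counts plus out_of_bounds equal the number of calls to add.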
kiran
9cca14acc5
Changed VCF subsetting procedure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4742 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-28 00:46:29 +00:00
kiran
ecd496cf51
Modifications to reflect changes to gsalib. Smarter about figuring out the names of the filtered parts of the callset.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4739 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-27 23:26:03 +00:00
kiran
247f33a553
Prefixed all the functions with gsa. in order to distinguish the methods from other possible methods of the same name in the namespace.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4738 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-27 23:24:42 +00:00
depristo
8768e1a240
Useful profiling tool that reads in a single rod and evaluates the time it takes to read the file by byte, by line, split into pieces, just the sites of the VCF, and finally the full VCF. Emits a table for plotting with the associated R script, which can be run as: Rscript R/analyzeRodProfile.R table.txt table.pdf titleString
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4728 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 14:59:16 +00:00
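The first three profiling stages described above (by byte, by line, into pieces) can be sketched in Python as follows. This is an illustration under assumptions, not the GATK tool: the VCF sites-only and full-record parsing stages are omitted, and the function names and table layout are hypothetical.

```python
import sys
import time


def profile_reads(path):
    """Time increasingly expensive ways of reading one file and return
    rows of (mode, elapsed_seconds, units_processed)."""
    rows = []

    start = time.perf_counter()
    n_bytes = 0
    with open(path, "rb") as fh:
        while fh.read(1):  # one byte per call
            n_bytes += 1
    rows.append(("byte", time.perf_counter() - start, n_bytes))

    start = time.perf_counter()
    with open(path) as fh:
        n_lines = sum(1 for _ in fh)  # one line per iteration
    rows.append(("line", time.perf_counter() - start, n_lines))

    start = time.perf_counter()
    n_fields = 0
    with open(path) as fh:
        for line in fh:  # split each line into tab-separated pieces
            n_fields += len(line.rstrip("\n").split("\t"))
    rows.append(("pieces", time.perf_counter() - start, n_fields))
    return rows


def write_table(rows, out=sys.stdout):
    """Emit a whitespace-delimited table that R's read.table can load."""
    out.write("mode elapsed units\n")
    for mode, elapsed, units in rows:
        out.write(f"{mode} {elapsed:.6f} {units}\n")
```

Usage: `write_table(profile_reads("calls.vcf"))`, then plot the table in R.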
kiran
d2fc30d188
Added a debugging statement to plot.venn
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4718 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 01:19:31 +00:00
kiran
d492eb94ad
Actually subsets the resulting table now, like it was supposed to all along.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4696 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:18:23 +00:00
kiran
50dbbdb8ab
Retrieves per-sample or per-lane metrics from the SQUID database and populates a dataframe with the results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4693 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 22:46:07 +00:00
depristo
44d0cb6cde
New version of the cutting routines for VQSR. Old code removed. Working unit tests. Best-practice TestNG integration test (everyone look at it). Walker test no longer requires you to specify the number of input files if it can infer the input count from the MD5s.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4664 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 16:19:56 +00:00
depristo
4f4eec12dd
Minor improvement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4659 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 19:30:54 +00:00
depristo
760f06cf8c
now prints a nice report, can be invoked from command line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4641 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-09 19:44:10 +00:00
depristo
3c08a1c061
Basic script for assessing simulation sensitivity and specificity
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4638 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-08 21:02:10 +00:00
kiran
1d68b28bbd
Takes a list of BAMs, looks up the read group information in the sequencing platform's SQUID database, and computes the tearsheet stats. Also takes the VariantEval output (R format) and outputs the variant stats and some plots for the tearsheet. This script requires the gsalib library to be in the R library path (add the line '.libPaths('/path/to/Sting/R/')' to your ~/.Rprofile).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4584 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 19:06:22 +00:00
depristo
0508dd0c31
Better reporting -- figured out how to drop unused levels in subset
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4438 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:31:51 +00:00
kiran
24cf6f9e36
Fix to handle situation where there are no filtered variants.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4424 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 18:34:01 +00:00
kiran
62f5383859
* Added an R package, "gsalib", providing a place to store common, useful, documented R methods. To use this module, you must follow three steps:
1) Build the module with the following command:
$ ant gsalib
2) Add the module path to your ~/.Rprofile file:
.libPaths("/path/to/Sting/trunk/R/")
3) At the top of each R script that will use the library, include the line:
library(gsalib)
You can now use the package like any other R package. To get high-level documentation, supply the following command to R:
help(gsalib)
The methods contained herein are:
getargs : A method to easily provide arguments to interactive and non-interactive scripts.
Prints out a help message specifying how the script should be run if no arguments
or "-h" is provided. Very helpful when you're writing an R-script piecemeal in
interactive mode, then want to make it a command-line program.
plot.venn : Plots a two-way or three-way proportional Venn diagram.
read.eval : Reads VariantEval output that's formatted in R style.
read.gatkreport : Reads GATKReport output.
gsa.message : Emits a message with the prefix "[gsalib]" to stdout.
gsa.warn : Emits a warning message with the prefix "[gsalib] Warning:" to stdout.
gsa.error : Emits an error message with the prefix "[gsalib] Error:" to stdout, calls traceback()
and halts execution.
Documentation on each of these methods can be obtained by typing "help(method_name)" at the R prompt.
* Retired GATKReport.R, as that functionality has now been moved to gsalib.
* Retired gsacommons, as that functionality has been split between gsalib and VariantReport.R.
* Modified VariantReport.R to make use of gsalib. The script now uses the getargs() method to provide the user with some information as to the proper way to run the script. Documentation on how to prepare output is given at http://www.broadinstitute.org/gsa/wiki/index.php/VariantEval .
* Added 'gsalib' target to build.xml file. Running "ant gsalib" will compile this module and place the R-ready package in R/gsalib .
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4416 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 00:27:59 +00:00
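For readers outside R, the conventions listed above (help text when no arguments or "-h" are given, and prefixed message/warning/error output) can be mirrored in Python as sketched below. These are hypothetical analogs; they do not reproduce gsalib's R signatures.

```python
import sys
import traceback


def gsa_message(msg):
    print(f"[gsalib] {msg}")


def gsa_warn(msg):
    print(f"[gsalib] Warning: {msg}")


def gsa_error(msg):
    print(f"[gsalib] Error: {msg}")
    traceback.print_stack()  # rough analog of calling traceback() in R
    sys.exit(1)  # halt execution


def getargs(argv, usage):
    """Return argv, or print usage and exit when there are no
    arguments or '-h' is given."""
    if not argv or "-h" in argv:
        print(usage)
        sys.exit(0)
    return argv
```

A script would start with `args = getargs(sys.argv[1:], "usage: script.py <input> <output>")`, so running it bare prints its own usage.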
kiran
40b2f62a83
Changed precision on Ti/Tv in venn diagrams
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4413 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 05:27:13 +00:00
kiran
d0e44b7a8e
Lower precision on Ti/Tv in variant summary matrix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4412 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 05:18:48 +00:00
kiran
6deb755164
Ti/Tv plots are restricted to a Ti/Tv range of 0.0-4.0. Added column to variant summary specifying the total variant counts (known+novel). Allele spectrum plots now show neutral expectation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4411 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 05:15:34 +00:00
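The Ti/Tv ratio and the neutral-expectation allele spectrum mentioned in these commits can be sketched as follows. Transitions are A<->G and C<->T substitutions. Under the standard neutral model the expected number of sites with allele count k in n chromosomes is proportional to 1/k; treating the alternate allele count as the derived allele count is an assumption not stated in the source, and the function names are hypothetical.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref, alt):
    """A<->G and C<->T substitutions are transitions; all others are transversions."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ti_tv(snps):
    """Ti/Tv ratio for a list of (ref, alt) single-base substitutions."""
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


def neutral_spectrum(n_chromosomes, total_sites):
    """Expected number of sites at allele count k = 1..n-1 under the
    standard neutral model (proportional to 1/k), scaled to total_sites."""
    weights = [1.0 / k for k in range(1, n_chromosomes)]
    scale = total_sites / sum(weights)
    return [w * scale for w in weights]


print(ti_tv([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]))
print(neutral_spectrum(4, 110))
```

Overlaying the scaled 1/k curve on the observed spectrum shows where a callset has an excess or deficit of rare variants.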
kiran
1d7e48c4b0
Venn diagrams are now oriented properly when a < b. Added a slide with callset summary table. All plots now show the present-in-a, filtered-in-b metrics. Added title page with project name, author, and timestamp.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4407 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 22:17:21 +00:00
kiran
fe29c8b09c
Placeholder commit: improvements to VariantReport (now shows stats for variants that are called in one set and filtered in another). Better command-line argument support.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4404 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 18:46:53 +00:00
corin
9cf079e1bb
Ready for integration with queue script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4346 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 19:46:01 +00:00
corin
d6bd1debeb
This is an updated version of the automated data processing report. Each page in the report is a standalone function; these are linked together by a function that pulls all appropriate data (assuming a standard naming convention) and generates the PDF. The script still needs to respond appropriately when it doesn't find the data it needs, and it still needs database access and a way of getting some information from sequencing for the tearsheet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4335 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 18:08:16 +00:00
depristo
b57a0a0310
improvements to the report code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4280 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 00:45:13 +00:00
kiran
dfdd0b69a9
Removed unused dependency (it was causing a problem by looking for an X11 connection that didn't necessarily exist).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4244 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 19:56:00 +00:00
depristo
594fb4a547
More plots in report
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4225 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 02:56:51 +00:00
kiran
19e22cfa87
Fixed a bug where the script looked for the wrong column name. Also, all results are now returned in a single plot.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4216 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-06 14:19:57 +00:00
depristo
0c54bf4195
Better reporting and now with a special mode for listing exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4183 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:19:51 +00:00
corin
cdad243645
updated version of the DPR. Now produces part of the tearsheet as well as good depth of coverage figures
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4182 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:38:58 +00:00
depristo
fc5caa98a5
Improved reporting now with metrics by day/week/etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4180 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 02:43:13 +00:00
depristo
8683087756
Suppl. tools for working with and displaying GATK run reports
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4176 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:32:22 +00:00
kiran
e14a347e2e
Now prints cluster report to a single PDF, rather than a dozen different PDFs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4164 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 18:58:39 +00:00
kiran
fd19c63aaf
A data structure that allows data to be collected over the course of a walker's computation and then written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (provided you load it with the GATKReport loader module).
This object is designed to be both the structure that holds data during the execution of the walker and the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this:
##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle errorrate.61PA8.7 qualavg.61PA8.7
0 0.007451835696110506 25.474613284804366
1 0.002362777171937477 29.844949954504095
2 9.087604507451836E-4 32.87590975254731
3 5.452562704471102E-4 34.498999090081895
4 9.087604507451836E-4 35.14831665150137
5 5.452562704471102E-4 36.07223435225619
6 5.452562704471102E-4 36.1217248908297
7 5.452562704471102E-4 36.1910480349345
8 5.452562704471102E-4 36.00345705967977
...
A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.
The display property of individual columns can be turned off. This is useful when a column stores intermediate results necessary for computing some later value, but the contents of the intermediate column itself are not required in the final output file.
Finally, the GATKReportTable allows for some simple element-wise and column-wise mathematical operations. For instance, two whole columns can be divided, with the result stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied, or divided without requiring the developer to explicitly write a loop.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
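The behaviour described above (keyed rows, hideable intermediate columns, whole-column division, and a self-describing table header) can be mirrored in a small Python sketch. `ReportTable` and its methods are hypothetical names, not the Java GATKReportTable API; only the header line format is taken from the example output.

```python
class ReportTable:
    """Hypothetical, minimal analog of a report table: rows keyed by a
    primary key, named columns that can be hidden, and column division."""

    def __init__(self, name, description, key_name):
        self.name = name
        self.description = description
        self.key_name = key_name
        self.columns = {}  # column name -> {row key: value}
        self.display = {}  # column name -> whether it is written out
        self.keys = []  # row keys in insertion order

    def add_column(self, column, display=True):
        self.columns[column] = {}
        self.display[column] = display

    def set(self, key, column, value):
        if key not in self.keys:
            self.keys.append(key)
        self.columns[column][key] = value

    def divide_columns(self, target, numerator, denominator):
        """Element-wise target = numerator / denominator over all rows."""
        if target not in self.columns:
            self.add_column(target)
        for key in self.keys:
            self.set(key, target, self.columns[numerator][key] / self.columns[denominator][key])

    def to_text(self):
        shown = [c for c in self.columns if self.display[c]]
        lines = [f"##:GATKReport.v0.1 {self.name} : {self.description}",
                 " ".join([self.key_name] + shown)]
        for key in self.keys:
            lines.append(" ".join(str(v) for v in [key] + [self.columns[c][key] for c in shown]))
        return "\n".join(lines) + "\n"


table = ReportTable("ErrorRatePerCycle", "The error rate per sequenced position in the reads", "cycle")
table.add_column("errors", display=False)  # intermediate column, hidden in output
table.add_column("bases", display=False)
for cycle, errors in [(0, 3), (1, 1)]:
    table.set(cycle, "errors", errors)
    table.set(cycle, "bases", 400)
table.divide_columns("errorrate", "errors", "bases")
print(table.to_text(), end="")
```

The hidden count columns drive the computation but only the derived error rate is emitted, as in the commit's description.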
corin
8931a63588
updated a whole bunch of column names to work like I want them to and added more informative figures for DOC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4131 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 18:19:09 +00:00
kiran
fba71e3c15
Placeholder commit. Implements a loader for a new multi-part GATK reporting format. See what it looks like at /home/radon01/kiran/scr1/projects/NewVariantEvalOutput/results/v1/tableexample.txt . Still need to address the issue where numeric columns are being interpreted as a vector of strings, not numbers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4115 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:48:44 +00:00
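The open issue noted above (numeric columns read back as strings) can be handled by converting a column only when every entry parses as a number. A Python sketch of a loader for the multi-table format; the function names are hypothetical and this is not the R loader.

```python
def _convert_column(values):
    """Return the column as floats if every entry parses as a number,
    otherwise leave it as a list of strings."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return values


def load_report(text):
    """Parse '##:GATKReport.v0.1 <name> : <description>' tables into a
    dict of table name -> {column name: list of values}."""
    tables = {}
    current = None
    header = None
    for line in text.splitlines():
        if line.startswith("##:GATKReport"):
            current = line.split(None, 1)[1].split(" : ", 1)[0].strip()
            tables[current] = {}
            header = None
        elif current is None or not line.strip():
            continue
        elif header is None:
            header = line.split()
            for column in header:
                tables[current][column] = []
        else:
            for column, value in zip(header, line.split()):
                tables[current][column].append(value)
    for columns in tables.values():
        for column in columns:
            columns[column] = _convert_column(columns[column])
    return tables


report = """##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle errorrate.61PA8.7 qualavg.61PA8.7
0 0.007451835696110506 25.474613284804366
1 0.002362777171937477 29.844949954504095
"""
table = load_report(report)["ErrorRatePerCycle"]
print(table["errorrate.61PA8.7"])
```

Because each table carries its own header line, several tables can share one file and still load independently.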
corin
8054b6b295
Changing a name of a column for variantevals output for easier reading by R--let me know if this needs to be updated elsewhere; it's just a space to an underscore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4062 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:18:16 +00:00
depristo
ede87a03c2
Nicer plotting routine for tranches. Add a third arg to suppress the legend.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4049 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 19:20:58 +00:00
depristo
e0abb73fd7
plot now assumes 1 / 1000 is the min error rate, not 1/100
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4010 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 14:48:22 +00:00
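The effect of the minimum error rate is easiest to see on a Phred-style scale, Q = -10 * log10(error). Assuming the plot uses such a log scale (the commit does not say), flooring at 1/1000 caps the value at Q30, where the old 1/100 floor capped it at Q20. A hypothetical sketch:

```python
import math


def phred_scaled(error_rate, min_error_rate=1.0 / 1000):
    """Q = -10 * log10(error_rate), with the error rate floored at
    min_error_rate so that zero observed errors still give a finite value."""
    return -10.0 * math.log10(max(error_rate, min_error_rate))


for rate in (0.05, 0.01, 0.0):
    print(rate, phred_scaled(rate), phred_scaled(rate, min_error_rate=1.0 / 100))
```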