36 Commits (ffbd4d85f2e0112b32df0bbba00330b00a0806cf)

Author SHA1 Message Date
Mark DePristo 4a84ff4fce Fix a nasty bug in reading GATK reports with a single line
-- Old version would break during reading with (as usual) a cryptic error message
-- Fixed by avoiding collapsing into a single vector type from a matrix when you subset to a single row.  I believe this code confirms that R is truly the worst programming language ever
2012-09-10 20:14:13 -04:00
Mark DePristo 4b8d9c3915 Actually load the library necessary to compactPDF
-- Old version was buggy in that if you didn't load "tools" package in your script it wouldn't compact the resulting PDF!  Fixed
2012-08-28 08:06:47 -04:00
Mark DePristo bf6c0aaa57 Fix for missing formatter in R 2.15
-- VariantCallQC now works on newest ESP call set
2012-08-17 11:49:02 -04:00
Mark DePristo 4cbd11faf5 Fixed spelling error in BQSR.R 2012-08-13 10:01:33 -04:00
Mark DePristo 243af0adb1 Expanded the BQSR reporting script
-- Includes header page
-- Table of arguments (Arguments)
-- Summary of counts (RecalData0)
-- Summary of counts by qual (RecalData1)
-- Fixed bug in output that resulted in covariates list always being null (updated md5s accordingly)
-- BQSR.R loads all relevant libraries now, including gplots, grid, and gsalib, to run correctly
2012-08-12 13:45:14 -04:00
Ryan Poplin 2be29ebd22 Merged bug fix from Stable into Unstable 2012-08-01 14:35:30 -04:00
Ryan Poplin 4093909a56 Updating VQSR docs. Removing references to old best practices pages. 2012-08-01 14:30:24 -04:00
Mark DePristo 762a3d9b50 Move BQSR.R to utils/recalibration in R 2012-07-31 08:11:04 -04:00
Khalid Shakir 46ca49b63d Removed 'Walker' suffix from packages/GATKEngine.xml that were breaking the packaged release.
Archived AnalyzeCovariates scripts and removed references in build packages / GATK extensions.
2012-07-23 16:32:31 -04:00
Ryan Poplin 41df9bd2a2 Moving BQSR plotting script to public so that it can be used with the substitution-model-only version. 2012-07-23 11:46:07 -04:00
Mauricio Carneiro d446d34227 GATK Error messages now point to the new website instead of GetSatisfaction. 2012-07-20 17:27:11 -04:00
Mark DePristo a822087f11 Bugfix for R 2.15 not including reshape 2012-06-14 16:42:29 -04:00
Mark DePristo dab25afc88 Add warning message about ratios in variantQCreport, give ratio for MAF > 10% 2012-04-25 12:22:32 -04:00
Mark DePristo 87be63c7e4 Improve variantCallQC.R
-- Refactor plotting utilities into master utility in gsalib.  Everyone can use it now
-- Better plots for standard variantCallQC
2012-04-13 17:00:37 -04:00
Mark DePristo e4d49357ce Further cleanup of R 2012-03-22 21:24:37 -04:00
Mark DePristo 6c2290fb6e Performance optimization for gsa.read.gatkreport.R
-- instead of using y = rbind(x, y), which is O(n^2) in a loop when processing lines into a data structure in R, preallocate a matrix and explicitly assign each row to x.  This results in a radical performance improvement when reading large tables into R.  It's possible with this optimization to read in a 70MB table for variantQCReport.R with 200K lines for 800 samples.
2012-03-22 21:24:36 -04:00
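
Note on 6c2290fb6e: the commit itself is R code, which is not reproduced here. As a rough Python analogue of the same idea (an assumption for illustration, not the gsalib implementation), the sketch below contrasts growing a table by re-concatenating on every line with allocating all row slots once and assigning by index. The function names are hypothetical.

```python
def read_rows_quadratic(lines):
    """Grow the table by concatenation; every iteration copies all rows so far."""
    table = []
    for line in lines:
        table = table + [line.split()]  # builds a brand-new list each time: O(n^2) overall
    return table


def read_rows_preallocated(lines):
    """Allocate every row slot up front, then assign each row in place: O(n) overall."""
    table = [None] * len(lines)
    for i, line in enumerate(lines):
        table[i] = line.split()
    return table


# Both strategies produce the same table; only the amount of copying differs.
lines = [f"sample{i} {i} {i * 0.5}" for i in range(1000)]
assert read_rows_quadratic(lines) == read_rows_preallocated(lines)
```

The preallocated version requires knowing the number of rows before the loop starts, which is the case when all lines of a table have already been read.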
Roger Zurawicki 7887a06703 GATKReport v1.0
GATKReport format changes:

 - All non-data header lines are preceded with a single pound ( #:)
 - Every report now has a report header containing the version number and number of tables
 - Every table has two lines of table header: The first explains the size of the table and the data types of each column, the second contains the table name and description.
 - This new format will allow reports in the future to be gatherable.
 - Changed the header format to include an end-of-line string ":;"

Added features:

 - Simplified GATK Reports:

	The constructor for a simplified GATK Report. Simplified GATK reports are designed for reports that do not need the advanced functionality of a full GATK Report.

	A simple GATK Report consists of:
		- A single table
		- No primary key ( it is hidden )
	    Optional:
		- Only untyped columns. As long as the data is an Object, it will be accepted.
		- Default column values being empty strings.
	Limitations:
		- A simple GATK report cannot contain multiple tables.
		- It cannot contain typed columns, which prevents arithmetic gathering.

       - Added a constructor to generate simplified GATK reports.
       - Added a method to easily add data to simple GATK reports.

 - Upgraded the input parser to take advantage of the new file format (v1).
 - Added the GATKReportGatherer; more usability is coming in the next version of GATK Report. Currently, it can only add rows from one table to another. Added private methods in GATKReport to combine Tables and Reports. It is very conservative and will only gather if the table columns, as well as everything else, match. At the column level, it uses the (redundant) row ids to add new rows. It will throw an exception if it is overwriting data.
 - Made some GATKReport methods public, and added more setters and getters.
 - Added method that compares formats of two GATKReports, and added an equals method to verify all data inside.
 - The gsalib for R now supports reading GATKReport v1 files in addition to legacy formats (v0.*)
 - Added a GATKReportDataType enum to give a column a certain data type. This must be specified when making a gatherable report. This enum contains several methods including a reverse lookup map.
 - Added a data type field in GATKColumn; when a type is not specified, the unknown type is used. Unknown types should not be gathered.

Test changes:

 - Updated Unit Tests for GATK Report v1. Added a test for the gatherer. Left one test disabled while we transition from v0 to v1.
 - Updated the MD5 hashes in integration tests throughout the GATK.

Other changes:

 - Added the gatherer functions to CoverageByRG
 - Also added the scatterCount parameter in the Interval Coverage script
 - Dropped support for reading in legacy GATKReport formats ( v0.*)
 - Updated VariantEvalWalker to work with GATK Report v1, added a format String to all applicable DataPoints.
 - Rewrote the read file method for GATK report files.
 - Optimized the equals methods within GATKReport. The protected functions should only be called by the GATKReport methods.

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-03-12 23:09:19 -04:00
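
Note on 7887a06703: the gathering rules described in this commit (only gather when the columns match, add rows by row id, throw if data would be overwritten) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the GATKReportGatherer code: the dict layout and the name `gather_tables` are hypothetical.

```python
def gather_tables(target, source):
    """Add rows from source into target, refusing on any mismatch or overwrite.

    Each table is a hypothetical dict: {"columns": [...], "rows": {row_id: [...]}}.
    """
    if target["columns"] != source["columns"]:
        raise ValueError("column layouts differ; refusing to gather")
    # Check for clashes before changing anything, so a failure leaves target untouched.
    clashes = target["rows"].keys() & source["rows"].keys()
    if clashes:
        raise ValueError(f"rows {sorted(clashes)} already present; refusing to overwrite")
    for row_id, row in source["rows"].items():
        target["rows"][row_id] = list(row)
    return target


first = {"columns": ["rg", "coverage"], "rows": {1: ["A", 10]}}
second = {"columns": ["rg", "coverage"], "rows": {2: ["B", 20]}}
gathered = gather_tables(first, second)
```

Checking for clashing row ids before copying is a design choice of this sketch; the commit message only states that an exception is thrown when data would be overwritten.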
Mark DePristo 0a7137616c Now converts gatkreports to properly typed R data types in gsa.read.gatkreport
-- use the general function type.convert from read.table to automagically convert the string data to booleans, factors, and numeric types as appropriate.  Vastly better than the previous behavior which only worked for numerics, in some cases.
2012-03-02 09:11:59 -05:00
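
Note on 0a7137616c: the commit relies on R's type.convert, which is not reproduced here. The general idea, converting a column of strings to the most specific type that fits every value, can be sketched in Python as below. The function name and the exact set of recognized boolean spellings are assumptions for illustration.

```python
def convert_column(values):
    """Convert a column of strings to the most specific type that fits all of them."""
    bool_names = {"TRUE": True, "FALSE": False, "true": True, "false": False}
    if all(v in bool_names for v in values):
        return [bool_names[v] for v in values]
    # Try integers first, then floats; a single non-matching value rejects the type.
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except ValueError:
            continue
    # Nothing numeric or boolean fits: keep strings (the commit describes factors here).
    return list(values)


counts = convert_column(["12", "7", "30"])
ratios = convert_column(["0.5", "2", "1.25"])
flags = convert_column(["TRUE", "FALSE"])
names = convert_column(["NA12878", "42"])
```

The conversion is decided per column rather than per cell, so a column that mixes numbers and text stays entirely textual.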
Khalid Shakir aae61767c6 queueJobReport now compresses PDF when running R 2.13+.
Updated PostCallingQC.scala's VE and R to include missense to silent ratio and plot.
2012-01-10 17:32:30 -05:00
Mark DePristo 5383c50654 Protect ourselves when iteration is present but there's only a single iteration in queueJobReport.R 2011-12-19 10:08:38 -05:00
Mark DePristo 50c4436f90 scales=free shows variance within analysis better 2011-12-07 14:09:32 -05:00
Mark DePristo e17a1923fb Plots runtimes by analysis name and exechosts
Useful to understand the performance of analysis jobs by hosts,
and to debug problematic nodes
2011-12-07 09:24:47 -05:00
Khalid Shakir 4d0e34109f Compacting pdfs when running under R 2.13+. 2011-10-27 14:51:56 -04:00
Khalid Shakir b80d407dc7 No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Khalid Shakir fac9932938 Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
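
Note on fac9932938: the last item replaces Thread.sleep() with Object.wait() in Java so that a shutdown request is noticed immediately. A rough Python analogue (an assumption for illustration, not the QGraph code) uses threading.Event.wait with a timeout in place of time.sleep; the function name is hypothetical.

```python
import threading
import time


def poll_until_stopped(stop_event, interval_seconds):
    """Loop until stop_event is set, waking immediately when it is."""
    iterations = 0
    while not stop_event.is_set():
        iterations += 1
        # time.sleep(interval_seconds) would block for the full interval;
        # wait() returns as soon as the event is set or the timeout expires.
        stop_event.wait(timeout=interval_seconds)
    return iterations


stop = threading.Event()
worker = threading.Thread(target=poll_until_stopped, args=(stop, 60.0))
worker.start()
started = time.monotonic()
stop.set()      # request shutdown
worker.join()   # returns promptly instead of after the 60-second interval
elapsed = time.monotonic() - started
```

With a plain sleep, the join above could take up to the full polling interval before the worker noticed the shutdown request.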
Mark DePristo ff3dccd062 Fixing errors in queueJobReport runtime unit 2011-10-07 12:04:53 -07:00
Khalid Shakir 6d6149b9a2 Updated gsalib gsa.read.gatkreport to return all reports, even those beginning with '.'.
In PreQC using geom_blank() so MEDIAN_INSERT_SIZE plot doesn't crash on facet_grid(scales='free') when data doesn't contain points for 'RF' or 'TANDEM'.
2011-10-05 18:30:40 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Mark DePristo bed78b47e0 Marginally better formatting, with hours the default time 2011-09-18 20:18:18 -04:00
Ryan Poplin 07d365ce39 Fixing units in queue job report Gantt plots 2011-09-12 09:01:34 -04:00
Mark DePristo 9559115ad5 Bugfix for singleton runs. Now with histograms where possible 2011-09-06 16:54:01 -04:00
Mark DePristo c6d8df8639 queueJobReport is a public feature of Queue 2011-08-29 17:20:54 -04:00
Khalid Shakir 5dcac7b064 GATKReport v0.2:
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
2011-08-03 00:24:47 -04:00
Kiran V Garimella ca35defdcd Moved gsalib sources from private/ to public/ 2011-07-27 12:29:43 -04:00
Ryan Poplin 5faf40b79d Moving AnalyzeAnnotations into the archive because it has outlived its usefulness. 2011-07-02 10:39:53 -04:00
Ryan Poplin f4ae6edb92 Moving some of the released R scripts into public from private 2011-06-30 14:55:25 -04:00