42 Commits (cdfd07f9eb4e2ca18b2b6b10d00797cd3a156ebd)

Author SHA1 Message Date
Ryan Poplin d4ac183580 Something changed with the ggtitle syntax in the latest version of ggplot2. 2013-08-14 14:40:03 -04:00
Valentin Ruano-Rubio 1f8282633b Removed plots generation from the BaseRecalibration software
Improved AnalyzeCovariates (AC) integration test.
Renamed AC test files ending with .grp to .table

Implementation:

* Removed RECAL_PDF/CSV_FILE from RecalibrationArgumentCollection (RAC). Updated rest of the code accordingly.
* Fixed BQSRIntegrationTest to work with new changes
2013-06-19 14:47:56 -04:00
Valentin Ruano-Rubio 08f92bb6f9 Added AnalyzeCovariates tool to generate BQSR assessment quality plots.
Implementation details:

* Added tool class *.AnalyzeCovariates
* Added convenient addAll method to Utils to be able to add elements of an array.
* Added parameter comparison methods to RecalibrationArgumentCollection class in order to verify that multiple input recalibration reports are compatible and comparable.
* Modified the BQSR.R script to handle up to 3 different recalibration tables (-BQSR, -before and -after) and removed some irrelevant arguments (or argument values) from the output.
* Added an integration test class.
2013-06-19 14:38:02 -04:00
Mauricio Carneiro c8b1c47764 Updating gsalib for R-3.0 compatibility
* add package namespace that exports all the visible objects
* list gsalib dependencies in the package requirements

[fixes #49987933]
2013-05-18 12:43:38 -04:00
Mark DePristo 2b86ab02be Improve queue script jobreport visualization script
-- the Queue jobreport PDF script now provides a high-level summary of the de-scattered runtimes of each analysis, so that it's easy to see where your script is spending its time across scatters.
2013-05-07 12:11:46 -04:00
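A de-scattered summary of the kind described above can be sketched as a sum of runtimes over all scatter pieces of the same analysis. This is a minimal Python illustration, not the R code of the jobreport script; the record fields "analysis" and "runtime_hours" are hypothetical names chosen for the example.

```python
from collections import defaultdict


def descattered_runtimes(jobs):
    """Sum runtime over all scatter pieces that share an analysis name."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job["analysis"]] += job["runtime_hours"]
    # Largest total first, so the most expensive analysis is listed at the top.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical job records: two scatter pieces of one analysis, one of another.
jobs = [
    {"analysis": "UnifiedGenotyper", "runtime_hours": 1.5},
    {"analysis": "UnifiedGenotyper", "runtime_hours": 2.0},
    {"analysis": "VariantEval", "runtime_hours": 0.5},
]
summary = descattered_runtimes(jobs)
```

Sorting by total runtime puts the analysis that dominates the pipeline first, which is the question the summary is meant to answer.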
Geraldine Van der Auwera f972963918 Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs)
GATK-73 updated docs for bqsr args
GATK-9 differentiate CountRODs from CountRODsByRef
GATK-76 generate GATKDoc for CatVariants
GATK-4 made resource arg required
GATK-10 added -o, some docs to CountMales; some docs to CountLoci
GATK-11 fixed by MC's -o change; straightened out the docs.
GATK-77 fixed references to wiki
GATK-76 Added Ami's doc block
GATK-14 Added note that these annotations can only be used with VariantAnnotator
GATK-15 specified required=false for two arguments
GATK-23 Added documentation block
GATK-33 Added documentation
GATK-34 Added documentation
GATK-32 Corrected arg name and docstring in DiffObjects
GATK-32 Added note to DO doc about reference (required but unused)
GATK-29 Added doc block to CountIntervals
GATK-31 Added @Output PrintStream to enable -o
GATK-35 Touched up docs
GATK-36 Touched up docs, specified verbosity is optional
GATK-60 Corrected GContent annot module location in gatkdocs
GATK-68 touched up docs and arg docstrings
GATK-16 Added note of caution about calling RODRequiringAnnotations as a group
GATK-61 Added run requirements (num samples, min genotype quality)
Tweaked template and generic doc block formatting (h2 to h3 titles)
GATK-62 Added a caveat to HR annot
Made experimental annotation hidden
GATK-75 Added setup info regarding BWA
GATK-22 Clarified some argument requirements
GATK-48 Clarified -G doc comments
GATK-67 Added arg requirement
GATK-58 Added annotation and usage docs
GSATDG-96 Corrected doc
Updated MD5 for DiffObjectsIntegrationTests (only change is link in table title)
2013-03-12 10:57:14 -04:00
Mark DePristo 4a84ff4fce Fix a nasty bug in reading GATK reports with a single line
-- Old version would break during reading with (as usual) a cryptic error message
-- Fixed by preventing the matrix from collapsing into a single vector type when you subset to a single row.  I believe this code confirms that R is truly the worst programming language ever
2012-09-10 20:14:13 -04:00
Mark DePristo 4b8d9c3915 Actually load the library necessary to compactPDF
-- Old version was buggy in that if you didn't load "tools" package in your script it wouldn't compact the resulting PDF!  Fixed
2012-08-28 08:06:47 -04:00
Mark DePristo bf6c0aaa57 Fix for missing formatter in R 2.15
-- VariantCallQC now works on newest ESP call set
2012-08-17 11:49:02 -04:00
Mark DePristo 4cbd11faf5 Fixed spelling error in BQSR.R 2012-08-13 10:01:33 -04:00
Mark DePristo 243af0adb1 Expanded the BQSR reporting script
-- Includes header page
-- Table of arguments (Arguments)
-- Summary of counts (RecalData0)
-- Summary of counts by qual (RecalData1)
-- Fixed bug in output that resulted in covariates list always being null (updated md5s accordingly)
-- BQSR.R now loads all relevant libraries, including gplots, grid, and gsalib, so that it runs correctly
2012-08-12 13:45:14 -04:00
Ryan Poplin 2be29ebd22 Merged bug fix from Stable into Unstable 2012-08-01 14:35:30 -04:00
Ryan Poplin 4093909a56 Updating VQSR docs. Removing references to old best practices pages. 2012-08-01 14:30:24 -04:00
Mark DePristo 762a3d9b50 Move BQSR.R to utils/recalibration in R 2012-07-31 08:11:04 -04:00
Khalid Shakir 46ca49b63d Removed 'Walker' suffixes from packages/GATKEngine.xml that were breaking the packaged release.
Archived AnalyzeCovariates scripts and removed references in build packages / GATK extensions.
2012-07-23 16:32:31 -04:00
Ryan Poplin 41df9bd2a2 Moving BQSR plotting script to public so that it can be used with the substitution-model-only version. 2012-07-23 11:46:07 -04:00
Mauricio Carneiro d446d34227 GATK Error messages now point to the new website instead of GetSatisfaction. 2012-07-20 17:27:11 -04:00
Mark DePristo a822087f11 Bugfix for R 2.15 not including reshape 2012-06-14 16:42:29 -04:00
Mark DePristo dab25afc88 Add warning message about ratios in variantQCreport, give ratio for MAF > 10% 2012-04-25 12:22:32 -04:00
Mark DePristo 87be63c7e4 Improve variantCallQC.R
-- Refactor plotting utilities into master utility in gsalib.  Everyone can use it now
-- Better plots for standard variantCallQC
2012-04-13 17:00:37 -04:00
Mark DePristo e4d49357ce Further cleanup of R 2012-03-22 21:24:37 -04:00
Mark DePristo 6c2290fb6e Performance optimization for gsa.read.gatkreport.R
-- instead of using y = rbind(x, y), which is O(n^2) in a loop when processing lines into a data structure in R, preallocate a matrix and explicitly assign each row to x.  This results in a radical performance improvement when reading large tables into R.  It's possible with this optimization to read in a 70MB table for variantQCReport.R with 200K lines for 800 samples.
2012-03-22 21:24:36 -04:00
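The difference between the two loop shapes described above can be shown outside R. This is a minimal Python sketch, assuming a hypothetical tab-separated line format; the function names are illustrative and not part of gsalib. The first function rebuilds the whole table on every line, which is the analogue of y = rbind(x, y) in a loop; the second allocates all rows up front and assigns each one in place.

```python
def parse_by_rebinding(lines):
    """Quadratic pattern: every iteration copies all rows gathered so far."""
    table = []
    for line in lines:
        table = table + [line.split("\t")]  # builds a brand-new list each time
    return table


def parse_preallocated(lines):
    """Linear pattern: allocate once, then assign each row by index."""
    table = [None] * len(lines)  # one allocation for the whole table
    for i, line in enumerate(lines):
        table[i] = line.split("\t")
    return table


# Both functions produce the same table; only the amount of copying differs.
lines = ["sample%d\t%d" % (i, i * 2) for i in range(1000)]
same_result = parse_by_rebinding(lines) == parse_preallocated(lines)
```

The results are identical; the preallocated version simply avoids copying the accumulated rows on each iteration, which is where the quadratic cost comes from.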
Roger Zurawicki 7887a06703 GATKReport v1.0
GATKReport format changes:

 - All non-data header lines are preceded by a single pound sign (#:)
 - Every report now has a report header containing the version number and number of tables
 - Every table has two lines of table header: The first explains the size of the table and the data types of each column, the second contains the table name and description.
 - This new format will allow reports in the future to be gatherable.
 - Changed the header format to include an end-of-line string ":;"
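
The header rules listed above can be sketched as a small parser. This is a minimal Python illustration, not the GATK parser: the list only states that header lines start with "#:", that headers end with ":;", and which information each header carries. The colon-separated field order and the three sample lines below are assumptions made for the example.

```python
HEADER_PREFIX = "#:"
HEADER_EOL = ":;"


def header_fields(line):
    """Strip the '#:' prefix and an optional ':;' terminator, then split on ':'."""
    if not line.startswith(HEADER_PREFIX):
        raise ValueError("not a header line: %r" % line)
    body = line[len(HEADER_PREFIX):]
    if body.endswith(HEADER_EOL):
        body = body[:-len(HEADER_EOL)]
    return body.split(":")


def parse_headers(lines):
    """Read one report header followed by the two header lines of one table."""
    report = header_fields(lines[0])  # assumed layout: version tag, table count
    shape = header_fields(lines[1])   # assumed layout: tag, columns, rows, types...
    naming = header_fields(lines[2])  # assumed layout: tag, name, description
    return {
        "version": report[0],
        "n_tables": int(report[1]),
        "n_cols": int(shape[1]),
        "n_rows": int(shape[2]),
        "types": shape[3:],
        "name": naming[1],
        "description": ":".join(naming[2:]),
    }


# Hypothetical sample headers; the real field layout may differ.
sample = [
    "#:GATKReport.v1.0:1:;",
    "#:GATKTable:2:3:%s:%d:;",
    "#:GATKTable:Counts:Example description:;",
]
parsed = parse_headers(sample)
```

Because the table header declares its size and column types before any data appears, a reader can allocate and type the table before reading its rows.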

Added features:

 - Simplified GATK Reports:

	The constructor for a simplified GATK Report. Simplified GATK reports are designed for reports that do not need the advanced functionality of a full GATK Report.

	A simple GATK Report consists of:
		- A single table
		- No primary key ( it is hidden )
	    Optional:
		- Only untyped columns. As long as the data is an Object, it will be accepted.
		- Default column values being empty strings.
	Limitations:
		- A simple GATK report cannot contain multiple tables.
		- It cannot contain typed columns, which prevents arithmetic gathering.

       - Added a constructor to generate simplified GATK reports.
       - Added a method to easily add data to simple GATK reports.

 - Upgraded the input parser to take advantage of the new file format (v1).
 - Added the GATKReportGatherer; more usability is coming in the next version of GATK Report. Currently, it can only add rows from one table to another. Added private methods in GATKReport to combine Tables and Reports. The gatherer is very conservative and will only gather if the table columns, as well as everything else, match. At the column level, it uses the (redundant) row ids to add new rows. It will throw an exception if it would overwrite data.
 - Made some GATKReport methods public, and added more setters and getters.
 - Added method that compares formats of two GATKReports, and added an equals method to verify all data inside.
 - The gsalib for R now supports reading GATKReport v1 files in addition to legacy formats (v0.*)
 - Added a GATKReportDataType enum to give a column a specific data type. This must be specified when making a gatherable report. This enum contains several methods including a reverse lookup map.
 - Added a data type field in GATKColumn. When a type is not specified, the unknown type is used. Unknown types should not be gathered.
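
The conservative gathering behaviour described in the GATKReportGatherer item above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Java implementation: the Table class, its fields, and the gather function are hypothetical names. It only mirrors the stated rules: rows are added from one table to another, gathering happens only if the tables match, rows are keyed by row id, and overwriting data is an error.

```python
class Table:
    """Hypothetical stand-in for a report table keyed by row id."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = list(columns)
        self.rows = {}  # row id -> list of cell values

    def add_row(self, row_id, values):
        if len(values) != len(self.columns):
            raise ValueError("wrong number of cells for row %r" % (row_id,))
        self.rows[row_id] = list(values)


def gather(target, source):
    """Copy rows of source into target, refusing any mismatch or overwrite."""
    if target.name != source.name or target.columns != source.columns:
        raise ValueError("tables are not compatible")
    clashes = set(target.rows) & set(source.rows)
    if clashes:
        raise ValueError("would overwrite rows: %s" % sorted(clashes))
    for row_id, values in source.rows.items():
        target.add_row(row_id, values)
    return target


# Two scatter pieces of the same table, covering different row ids.
first = Table("Coverage", ["interval", "depth"])
first.add_row("chr1", ["chr1", 30])
second = Table("Coverage", ["interval", "depth"])
second.add_row("chr2", ["chr2", 28])
combined = gather(first, second)
```

All checks run before any row is copied, so a failed gather leaves the target table unchanged.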

Test changes:

 - Updated Unit Tests for GATK Report v1. Added a test for the gatherer. Left one test disabled while we transition from v0 to v1.
 - Updated the MD5 hashes in integration tests throughout the GATK.

Other changes:

 - Added the gatherer functions to CoverageByRG
 - Also added the scatterCount parameter in the Interval Coverage script
 - Dropped support for reading in legacy GATKReport formats ( v0.*)
 - Updated VariantEvalWalker to work with GATK Report v1, added a format String to all applicable DataPoints.
 - Rewrote the read file method for GATK report files.
 - Optimized the equals methods within GATKReport. The protected functions should only be called by the GATKReport methods.

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-03-12 23:09:19 -04:00
Mark DePristo 0a7137616c Now converts gatkreports to properly typed R data types in gsa.read.gatkreport
-- use the general function type.convert from read.table to automagically convert the string data to booleans, factors, and numeric types as appropriate.  Vastly better than the previous behavior which only worked for numerics, in some cases.
2012-03-02 09:11:59 -05:00
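The kind of column-wise type inference described above can be illustrated with a short sketch. This is a Python analogue written for this log, not R's type.convert and not the gsalib code: the function name, the accepted boolean spellings, and the factor representation (sorted levels plus integer codes) are assumptions, and missing-value handling is left out.

```python
def convert_column(values):
    """Return (kind, converted) for a list of strings, trying narrow types first."""
    if all(v in ("TRUE", "FALSE", "true", "false") for v in values):
        return "boolean", [v.upper() == "TRUE" for v in values]
    for kind, caster in (("integer", int), ("numeric", float)):
        try:
            return kind, [caster(v) for v in values]
        except ValueError:
            continue  # at least one cell does not fit this type; try the next
    # Fall back to a categorical column: distinct levels plus a code per cell.
    levels = sorted(set(values))
    return "factor", {"levels": levels, "codes": [levels.index(v) for v in values]}


counts = convert_column(["10", "25", "3"])
ratios = convert_column(["0.5", "2", "1e-3"])
flags = convert_column(["TRUE", "false"])
names = convert_column(["snp", "indel", "snp"])
```

Trying the narrowest type first means a column becomes numeric only when every cell parses as a number, so a single stray label turns the whole column into a factor instead of silently producing bad values.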
Khalid Shakir aae61767c6 queueJobReport now compresses PDF when running R 2.13+.
Updated PostCallingQC.scala's VE and R to include missense to silent ratio and plot.
2012-01-10 17:32:30 -05:00
Mark DePristo 5383c50654 Protect ourselves when iteration is present but there's only a single iteration in queueJobReport.R 2011-12-19 10:08:38 -05:00
Mark DePristo 50c4436f90 scales=free shows variance within analysis better 2011-12-07 14:09:32 -05:00
Mark DePristo e17a1923fb Plots runtimes by analysis name and exechosts
Useful to understand the performance of analysis jobs by hosts,
and to debug problematic nodes
2011-12-07 09:24:47 -05:00
Khalid Shakir 4d0e34109f Compacting pdfs when running under R 2.13+. 2011-10-27 14:51:56 -04:00
Khalid Shakir b80d407dc7 No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Khalid Shakir fac9932938 Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to Java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
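The responsiveness gain from waiting on an object instead of sleeping can be illustrated with a Python analogue. This sketch swaps in threading.Event for Java's Object.wait(); it is not the QGraph code, and the function name and timings are made up for the example. A plain sleep would always run for the full interval, whereas a wait returns as soon as shutdown is signalled.

```python
import threading
import time


def wait_for_shutdown(stop_event, poll_seconds):
    """Block for up to poll_seconds, returning early if shutdown is requested."""
    return stop_event.wait(timeout=poll_seconds)


stop = threading.Event()
# Request shutdown shortly after the wait begins, from another thread.
threading.Timer(0.05, stop.set).start()

started = time.monotonic()
interrupted = wait_for_shutdown(stop, poll_seconds=5.0)
elapsed = time.monotonic() - started
```

Here the wait ends when the event is set rather than after the full five-second interval, which is the behaviour a polling loop needs in order to react promptly to shutdown.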
Mark DePristo ff3dccd062 Fixing errors in queueJobReport runtime unit 2011-10-07 12:04:53 -07:00
Khalid Shakir 6d6149b9a2 Updated gsalib gsa.read.gatkreport to return all reports, even those beginning with '.'.
In PreQC using geom_blank() so MEDIAN_INSERT_SIZE plot doesn't crash on facet_grid(scales='free') when data doesn't contain points for 'RF' or 'TANDEM'.
2011-10-05 18:30:40 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Mark DePristo bed78b47e0 Marginally better formatting, with hours the default time 2011-09-18 20:18:18 -04:00
Ryan Poplin 07d365ce39 Fixing units in queue job report Gantt plots 2011-09-12 09:01:34 -04:00
Mark DePristo 9559115ad5 Bugfix for singleton runs. Now with histograms where possible 2011-09-06 16:54:01 -04:00
Mark DePristo c6d8df8639 queueJobReport is a public feature of Queue 2011-08-29 17:20:54 -04:00
Khalid Shakir 5dcac7b064 GATKReport v0.2:
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
2011-08-03 00:24:47 -04:00
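Why fixed-width columns allow embedded spaces can be shown with a short sketch. This is a minimal Python illustration, not the GATKReport parser: it assumes that header names contain no spaces and that each cell starts at the same offset as its header, and the function names and sample data are hypothetical.

```python
import re


def column_starts(header):
    """Offsets where each header name begins (names assumed to contain no spaces)."""
    return [match.start() for match in re.finditer(r"\S+", header)]


def split_fixed_width(line, starts):
    """Slice a row at the header offsets and trim the padding of each cell."""
    ends = starts[1:] + [None]
    return [line[start:end].strip() for start, end in zip(starts, ends)]


# Build a padded header and row so the offsets line up by construction.
header = "sample".ljust(11) + "status".ljust(8) + "count"
row = "NA 12878".ljust(11) + "ok".ljust(8) + "42"

starts = column_starts(header)
cells = split_fixed_width(row, starts)
naive = row.split()  # whitespace splitting breaks the first cell in two
```

Splitting on whitespace yields four fields for this row, while slicing at the column offsets recovers the three intended cells, including the one with an embedded space.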
Kiran V Garimella ca35defdcd Moved gsalib sources from private/ to public/ 2011-07-27 12:29:43 -04:00
Ryan Poplin 5faf40b79d Moving AnalyzeAnnotations into the archive because it has outlived its usefulness. 2011-07-02 10:39:53 -04:00
Ryan Poplin f4ae6edb92 Moving some of the released R scripts into public from private 2011-06-30 14:55:25 -04:00