* fixed queue script plot file names
* updated the ReadGroupCovariate to use the platform unit instead of sample + lane
* fixed plotting of marginalized reported qualities
* updated BQSR queue script for faster turnaround
* implemented plot generation for scatter/gathered runs
* adjusted output file names to be cooperative with the queue script
* added the recalibration report file to the argument table in the report
* added ReadCovariates unit test -- guarantees that all the covariates are being generated for every base in the read
* added RecalibrationReport unit test -- guarantees the integrity of the delta tables
* fixed the famous "off by one" error in the context covariate
* reduced maximum quality score to Q50 (following Eric/Ryan's suggestion)
* removed context downsampling in BQSR R script
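The Q50 ceiling listed above can be illustrated with a minimal sketch. The class and method names are hypothetical, and the +1/+2 smoothing of the error rate is an assumption made for illustration; only the cap value of 50 comes from the notes above.

```java
// Hypothetical helper, not the BQSR implementation.
final class QualityCap {
    // Maximum quality score, per the changelog entry above.
    static final int MAX_QUAL = 50;

    // Converts (observations, errors) to a Phred-scaled quality, capped at MAX_QUAL.
    static int empiricalQuality(long observations, long errors) {
        // Assumed smoothing so that zero errors does not yield an infinite quality.
        double errorRate = (errors + 1.0) / (observations + 2.0);
        long phred = Math.round(-10.0 * Math.log10(errorRate));
        return (int) Math.min(phred, MAX_QUAL);
    }
}
```

With this sketch, a bin with ten million observations and no errors would be reported at the cap rather than at its uncapped value.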
This test brings together the old and the new BQSR: it builds a recalibration table with each of the two frameworks, performs the recalibration calculation with both for 10,000+ bases, and asserts that the calculations match in every case.
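The shape of such an equivalence test can be sketched as follows. Both quality functions here are hypothetical stand-ins for the two frameworks (they compute the same smoothed Phred value in two different ways); the real test compares the actual old and new BQSR code paths.

```java
import java.util.Random;

// Hypothetical differential-test sketch, not the actual unit test.
final class RecalEquivalenceSketch {
    // Stand-in for the "old" framework: direct formula.
    static double oldQuality(long observations, long errors) {
        return -10.0 * Math.log10((errors + 1.0) / (observations + 2.0));
    }

    // Stand-in for the "new" framework: same quantity, computed as a log difference.
    static double newQuality(long observations, long errors) {
        return 10.0 * (Math.log10(observations + 2.0) - Math.log10(errors + 1.0));
    }

    // Feeds the same random inputs to both stand-ins and counts disagreements.
    static int countMismatches(int nBases, long seed, double tolerance) {
        Random rng = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < nBases; i++) {
            long observations = 1 + rng.nextInt(1_000_000);
            long errors = rng.nextInt((int) observations + 1);
            double expected = oldQuality(observations, errors);
            double actual = newQuality(observations, errors);
            if (Math.abs(expected - actual) > tolerance) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

A fixed seed keeps any failure reproducible, and the tolerance absorbs floating-point differences between the two formulations.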
* Refactored CycleCovariate to be a fragment covariate instead of a per read covariate
* Refactored the CycleCovariateUnitTest to test the pairing information
* Updated BQSR Integration tests accordingly
* Made the quantization levels parameter no longer hidden
* Added a hidden option to keep intermediate plotting files for debugging (by default they are deleted automatically)
* Added a hidden option to skip automatic plot generation (important for scatter/gathering)
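To illustrate why the CycleCovariate change above needs pairing information, here is a minimal sketch of a fragment-aware cycle calculation. The convention used (count from the far end for negative-strand reads, negate the cycle for the second read of a pair) is an assumption for illustration, and the class and method names are hypothetical.

```java
// Hypothetical sketch of a pairing-aware cycle calculation.
final class CycleSketch {
    // Returns one cycle value per base offset in the read.
    static int[] cycles(int readLength, boolean negativeStrand, boolean secondOfPair) {
        int[] result = new int[readLength];
        for (int offset = 0; offset < readLength; offset++) {
            // Assumed: negative-strand reads are stored reversed, so count from the end.
            int cycle = negativeStrand ? readLength - offset : offset + 1;
            // Assumed: the second read of a pair gets negative cycles to keep it distinct.
            result[offset] = secondOfPair ? -cycle : cycle;
        }
        return result;
    }
}
```

Under these assumptions, `cycles(4, false, false)` gives `[1, 2, 3, 4]` and `cycles(4, true, true)` gives `[-4, -3, -2, -1]`.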
The most important reason for this change is that we no longer need to read the entire recal file into memory up front in ApplyRecalibration. For 1000G calling this was prohibitive in terms of memory requirements. Now we go through the rod system and pull in just the records we need at a given position.
As an added bonus, once BCF2 is live we can drastically cut down the sizes of these recal files (which can grow large for whole genome calling).
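The memory argument above can be sketched as a streaming lookup over position-sorted records. This is not the rod system itself: `PositionalRecordStream` and `RecalRecord` are hypothetical names, and the sketch assumes the records arrive sorted by position and that positions are queried in increasing order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: holds one pending record instead of the whole file.
final class PositionalRecordStream {
    static final class RecalRecord {
        final int position;
        final double score;

        RecalRecord(int position, double score) {
            this.position = position;
            this.score = score;
        }
    }

    private final Iterator<RecalRecord> source;
    private RecalRecord pending; // next unread record, or null when exhausted

    PositionalRecordStream(Iterator<RecalRecord> sortedSource) {
        this.source = sortedSource;
        advance();
    }

    // Returns only the records at the requested position.
    List<RecalRecord> recordsAt(int position) {
        // Discard records that lie before the requested position.
        while (pending != null && pending.position < position) {
            advance();
        }
        List<RecalRecord> hits = new ArrayList<>();
        while (pending != null && pending.position == position) {
            hits.add(pending);
            advance();
        }
        return hits;
    }

    private void advance() {
        pending = source.hasNext() ? source.next() : null;
    }
}
```

Memory use in this sketch is bounded by the number of records at a single position, independent of the size of the recal file.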
* removed low quality bases from the recalibration report
* refactored the Datum (Recal and Accuracy) class structure
* created a new plotting CSV table for optimized performance with the R script
* added a datum object that carries the accuracy information (AccuracyDatum) for plotting
* added mean reported quality score to all covariates
* added QualityScore as a covariate for plotting purposes
* added unit test to the key manager to operate with one required covariate and multiple optional covariates
* integrated the plotting into BQSR (automatically generates the PDF with the recalibration tearsheet)