* fixed context covariate famous "off by one" error
* reduced maximum quality score to Q50 (following Eric/Ryan's suggestion)
* remove context downsampling in BQSR R script
This test brings together the old and the new BQSR, building a recalibration table using the two separate frameworks and performing the recalibration calculation using the two different frameworks for 10,000+ bases and asserting that the calculations match in every case.
* Refactored CycleCovariate to be a fragment covariate instead of a per read covariate
* Refactored the CycleCovariateUnitTest to test the pairing information
* Updated BQSR Integration tests accordingly
* Made quantization levels parameter not hidden anymore
* Added hidden option to keep intermediate plotting files for debug purposes (they're automatically deleted)
* Added hidden option not to generate the plots automatically (important for scatter/gathering)
The most important reason for this change is that we no longer need to read the entire recal file into memory up front in ApplyRecalibration. For 1000G calling this was prohibitive in terms of memory requirements. Now we go through the rod system and pull in just the records we need at a given position.
As an added bonus, once BCF2 is live we can drastically cut down the sizes of these recal files (which can grow large for whole genome calling).
* removed low quality bases from the recalibration report.
* refactored the Datum (Recal and Accuracy) class structure
* created a new plotting csv table for optimized performance with the R script
* added a datum object that carries the accuracy information (AccuracyDatum) for plotting
* added mean reported quality score to all covariates
* added QualityScore as a covariate for plotting purposes
* added unit test to the key manager to operate with one required covariate and multiple optional covariates
* integrated the plotting into BQSR (automatically generates the pdf with the recalibration tearsheet)
- By porting from jython to java now accessible to Queue via automatic extension generation.
- Better handling for problematic sample names by using PicardAggregationUtils.
GATKReportTable looks up keys using arrays instead of dot-separated strings, which is useful when a sample has a period in the name.
CombineVariants has option to suppress the header with the command line, which is now invoked during VCF gathering.
Added SelectHeaders walker for filtering headers for dbGAP submission.
Generated command line for read filters now correctly prefixes the argument name as --read_filter instead of -read_filter.
Latest WholeGenomePipeline.
Other minor cleanup to utility methods.