Commit Graph

17 Commits (120deaa010e9ee14b77e28f6a1035cbb674765c1)

Author SHA1 Message Date
Mauricio Carneiro a19c27297f continuing the BQSR triage...
* fixed the loading of the new reduced size reports
   * reduced BQSR scala script memory to 2Gb
   * removed dcov parameter from BQSR scala script
   * fixed estimatedQReported calculation from -log10(pe) to -10*log10(pe).
   * updated md5's with the proper PHRED scaled EstimatedQReported
2012-04-05 14:34:15 -04:00
Mauricio Carneiro 7c3b3650bb BQSR bug triage
* fixed bug where some keys were using the same recal datum objects
    * fixed quantization qual calculations when combining multiple reports
    * fixed rounding error with empirical quality reported when combining reports
    * fixed combine routine in the gatk reports due to the primary keys being out of order
    * added auto-recalibration option to BQSR scala script
    * reduced the size of the recalibration report by ~15%
    * updated md5's
2012-04-05 09:32:18 -04:00
Mark DePristo 1ccea866d8 VariantEval now includes -keepAC0 argument to include sites with alt alleles but AC 0 in analyses
-- Updated EvalModules to work with new paramter
-- adding test file for keepAC0 to public/testdata and integration tests
2012-04-04 15:37:12 -04:00
Mauricio Carneiro 1b75663178 BQSR Gatherer implementation and integration tests
* restructured the hash tables into one class (RecalibrationReport) that has all the functionality for the different tables and key managers
   * optmized empirical qual calculation when merging recalibration reports
   * centralized the quality score quantization functionalities
   * unified the creating/loading of all the key manager/hash table structures.
   * added unit tests for the gatherer (disabled because gatk report needs to be sorted for automated testing)
   * added integration tests for BQSR and on-the-fly recalibration
2012-03-27 13:50:22 -05:00
Mauricio Carneiro 9f74969e3a BQSR with GATKReport implementation
* restructured BQSR to report recalibrated tables.
   * implemented empirical quality calculation to the BQSR stage (instead of on-the-fly recalibration)
   * linked quality score quantization to the BQSR stage, outputting a quantization histogram
   * included the arguments used in BQSR to the GATK Report
   * included all three tables (RG, QUAL and COVARIATES) to the GATK Report with empirical qualities

On-the-fly recalibration with GATK Report

   * loads all tables from the GATKReport using existing infrastructure (with minor updates)
   * implemented initialiazation of the covariates using BQSR's argument list
   * reduced memory usage significantly by loading only the empirical quality and estimated quality reported for each bit set key
   * applied quality quantization to the base recalibration
   * excluded low quality bases from on-the-fly recalibration for mismatches, insertions or deletions
2012-03-23 15:42:32 -04:00
Mauricio Carneiro e4cbeddf2d adding on-the-fly recalibration test data 2012-03-16 13:18:16 -04:00
Mark DePristo a63d1f58b6 analyzeRunReports cleanup for new minimal GATKRunReport structure
-- No more command lines or working directories
-- Added failing and successful gatkrunreports to public/testdata for testing
2012-03-12 09:46:26 -04:00
Mark DePristo e0c189909f Added support for breakpoint alleles
-- See https://getsatisfaction.com/gsa/topics/support_vcf_4_1_structural_variation_breakend_alleles?utm_content=topic_link&utm_medium=email&utm_source=new_topic
-- Added integrationtest to ensure that we can parse and write out breakpoint example
2012-02-23 12:14:48 -05:00
Mauricio Carneiro ed91461c49 Data Processing Pipeline Test
* Added standard pipeline test for the DPP
* Added a full BWA pipeline test for the DPP
* Included the extra files for the reference needed by BWA (to be used by DPP and PPP tests)
2011-12-12 00:24:51 -05:00
Mauricio Carneiro cca8a18608 PPP pipeline test
* added a pipeline test to the Pacbio Processing Pipeline.
* updated exampleBAM with more complete RG information so we can use it in a wider variety of pipeline tests
* added exampleDBSNP.vcf file with only chromosome 1 in the range of the exampleFASTA.fasta reference for pipeline tests
2011-12-11 17:32:21 -05:00
Mark DePristo 714cac21c9 Testdata for IntervalStratification 2011-11-10 11:08:34 -05:00
Mark DePristo a3c5a31686 Oops, forgot the PED test file 2011-10-05 21:09:08 -07:00
Mark DePristo c6ba944719 Adding bgzip vcf file for unit tests 2011-09-21 15:39:45 -04:00
Mark DePristo bc902e8421 GZIP VCF for testing 2011-08-19 09:05:39 -04:00
Mark DePristo 449bf1b539 Testdata for diffObjects.
PipelineTest updated to point to MD5DB.java
2011-07-18 10:47:03 -04:00
Mark A. DePristo 38740b0ff5 First working version of the DiffNode readers for VCF and BAM files. Unit tests confirm the readers are approximately working. Skeleton of a working DiffObjects walker that will be able to provide detailed information about how exactly two files of the same type differ, so long as the files are supported by the DiffNode structure. 2011-07-04 16:11:42 -04:00
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00