Commit Graph

9362 Commits (a1561a97c44b0bdbbdacafc556bd5db33bd44e52)

Author SHA1 Message Date
Mauricio Carneiro a1561a97c4 Changing the name of the integration test (too long to type) and disabling tests during the triage... 2012-04-19 20:36:51 -04:00
Guillermo del Angel d2488dfb81 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-19 19:40:03 -04:00
Guillermo del Angel c44c7b9a97 Restored optimization in Pair HMM only to compute HMM matrices starting in index where haplotypes start to diverge - saves about 15-20% of runtime which is what we lost by disabling banding in latest version, so runtime should be now about the same as what it was before refactoring. Output is bit-true to previous commit 2012-04-19 19:39:43 -04:00
Mauricio Carneiro 0f8c77391d BQSR bug triage #3
* fixed context covariate famous "off by one" error
   * reduced maximum quality score to Q50 (following Eric/Ryan's suggestion)
   * remove context downsampling in BQSR R script
2012-04-19 17:31:04 -04:00
Khalid Shakir df5dd841af AC strat now checks if evals will be merged before throwing an error on multiple eval files.
Minor tweaks to WGP script based on new recal VCF format.
2012-04-19 16:08:55 -04:00
Guillermo del Angel 3fa9089085 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-19 14:37:34 -04:00
Menachem Fromer 6d5b05c123 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-19 14:18:34 -04:00
Menachem Fromer 53ebde2c3b Added Queue script to run HaplotypeCaller on user-defined sets of samples at particular loci 2012-04-19 14:17:57 -04:00
Guillermo del Angel 1ae2ab5b63 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-19 12:50:29 -04:00
Guillermo del Angel 0e6e0cb907 Merging bug fixes 2012-04-19 12:49:30 -04:00
Eric Banks 79272c5e15 Thanks to Menachem for pointing out that the docs for genotyping_mode and output_mode were the same (and unclear). Fixed. 2012-04-19 12:48:09 -04:00
Guillermo del Angel 02ff930f6a My changes 2012-04-19 12:45:18 -04:00
Eric Banks 2485cef5b8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-19 11:46:06 -04:00
Eric Banks 76a6e37f4f Don't output callability metrics by default anymore; one can still have them output to the 'metrics' file (which is now @Hidden because they are really for GSA use). Added a TODO to move UG from @By reference to reads and rods once LIBS is cleaned up. 2012-04-19 11:45:56 -04:00
Ryan Poplin 1ea4e48a27 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-19 11:32:32 -04:00
Ryan Poplin 11001ab9a2 Adding option to HaplotypeCaller to genotype the events on the chosen haplotypes as independent events. The filtered reads are now kept around so they can be passed to the variant annotations. Unfortunately the filtered reads aren't assigned a likelihood yet so they are all thrown in the Allele.NO_CALL bin. 2012-04-19 11:32:10 -04:00
Mauricio Carneiro eb22cd7222 Unit test to guarantee BQSR sequential calculation accuracy
This test brings together the old and the new BQSR, building a recalibration table using the two separate frameworks and performing the recalibration calculation using the two different frameworks for 10,000+ bases and asserting that the calculations match in every case.
2012-04-19 09:33:40 -04:00
Mauricio Carneiro 68d0211fa1 Improved BQSR plotting and some new parameters
* Refactored CycleCovariate to be a fragment covariate instead of a per read covariate
   * Refactored the CycleCovariateUnitTest to test the pairing information
   * Updated BQSR Integration tests accordingly
   * Made quantization levels parameter not hidden anymore
   * Added hidden option to keep intermediate plotting files for debug purposes (they're automatically deleted)
   * Added hidden option not to generate the plots automatically (important for scatter/gathering)
2012-04-19 09:31:41 -04:00
Guillermo del Angel 143e92b797 Rebasing 2012-04-18 20:05:43 -04:00
Guillermo del Angel a530127956 More merge bug fixes 2012-04-18 19:59:39 -04:00
Guillermo del Angel 960e7e6aaf Changes to integration tests 2012-04-18 19:53:42 -04:00
Guillermo del Angel 82efd4457e Revert some bad merge changes 2012-04-18 16:35:09 -04:00
Guillermo del Angel 31c394d588 Resolve merge conflicts 2012-04-18 16:25:03 -04:00
Ryan Poplin 4999ae87ad Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-18 15:02:42 -04:00
Ryan Poplin dcc4871468 minor misc optimizations to PairHMM 2012-04-18 15:02:26 -04:00
Menachem Fromer 673fc00343 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-18 14:17:00 -04:00
Menachem Fromer 66d3fec8e2 Need to print newline in printf 2012-04-18 14:15:50 -04:00
Eric Banks d3c84e7b1f This should be a User Error since it's provided from the DoC command-line arguments 2012-04-18 13:09:23 -04:00
Eric Banks 392f1903f7 Handling some of the NumberFormatExceptions seen via Tableau that are really user errors. 2012-04-18 12:57:37 -04:00
Menachem Fromer 70777c7af8 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-18 12:42:28 -04:00
Menachem Fromer 5d3c764ed2 Cleaned up the DoC code and made the pseq commands a little more robust to non-standard chromosomes 2012-04-18 12:41:32 -04:00
Ryan Poplin 8a84456626 Following Eric's awesome update to change the VQSR recal file into a VCF file, the ApplyRecalibration step is now scatter/gather-able and tree reducible. 2012-04-18 11:24:04 -04:00
Eric Banks 333ae3bc2c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-17 23:54:16 -04:00
Eric Banks 4448a3ea76 Final tweaks. Added an integration test to cover the case of SNPs and indels that start at the same position. 2012-04-17 23:54:10 -04:00
Mauricio Carneiro 2fba7df0a6 Updating integration tests for new BQSR output 2012-04-17 23:39:40 -04:00
Eric Banks c1f52b773a Minor tweaks and updated integration tests MD5s 2012-04-17 23:17:28 -04:00
Eric Banks 6d03bce0d3 Important refactoring of the VQSR recal file format: we now use a VCF instead of a CSV file.
The most important reason for this change is that we no longer need to read the entire recal file into memory up front in ApplyRecalibration.  For 1000G calling this was prohibitive in terms of memory requirements.  Now we go through the rod system and pull in just the records we need at a given position.

As an added bonus, once BCF2 is live we can drastically cut down the sizes of these recal files (which can grow large for whole genome calling).
2012-04-17 22:38:18 -04:00
Eric Banks ea793d8e27 Khalid pressured me into adding an integration test that makes sure we don't fail on reads with adjacent I and D events. 2012-04-17 21:21:29 -04:00
Menachem Fromer 00bef8f33b Must provide output as argument so that command-line string is constructed properly 2012-04-17 20:50:11 -04:00
Menachem Fromer e3bc1e8630 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-17 20:33:45 -04:00
Menachem Fromer 4ff9cbd7e9 Added more support for XHMM options (and can now use Plink/Seq to calculate repeat-masked intervals since GATK and reference do not have this encoded in lower-case bases) 2012-04-17 20:33:02 -04:00
Mauricio Carneiro 46a212d8e9 Added "simplify reads" option to PrintReads. 2012-04-17 19:32:34 -04:00
Mauricio Carneiro f0c81b59b0 Implementation of the new BQSR plotting infrastructure
* removed low quality bases from the recalibration report.
   * refactored the Datum (Recal and Accuracy) class structure
   * created a new plotting csv table for optimized performance with the R script
   * added a datum object that carries the accuracy information (AccuracyDatum) for plotting
   * added mean reported quality score to all covariates
   * added QualityScore as a covariate for plotting purposes
   * added unit test to the key manager to operate with one required covariate and multiple optional covariates
   * integrated the plotting into BQSR (automatically generates the pdf with the recalibration tearsheet)
2012-04-17 19:23:55 -04:00
Ryan Poplin 952280bef1 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-17 17:00:14 -04:00
Ryan Poplin cf705f6c62 Adding read position rank sum test to the list of annotations that get produced with the HaplotypeCaller 2012-04-17 17:00:00 -04:00
Eric Banks 4ced094160 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-17 15:51:37 -04:00
Eric Banks 13c800417e Handle NPE in UG indel code: deletions immediately preceding insertions were not handled well in the code. 2012-04-17 15:51:23 -04:00
Guillermo del Angel c78b0eee3a Refactoring/fixing up UG HMM code: a) Make code use PairHMM class instead of having duplicated code. That way UG and HaplotypeCaller now use same core code. Changes to be able to do this: 1. Compute context-dependent GOP as a function of read, not of haplotype, b) Extracted code to initialize HMM arrays into separate method, c) Move PairHMM class and unit test to public, d) Reenable banded code in PairHMM, inverted sense of flag (true=enable feature) but leave off in HaplotypeCaller. 2012-04-17 14:22:48 -04:00
Ryan Poplin 08a1ef2681 Fixing the issue identified by Guillermo where the last base of read wasn't being considered in the non-global alignment PairHMM. Added his unit tests to catch this case. 2012-04-17 14:01:43 -04:00
Khalid Shakir 91cb654791 AggregateMetrics:
- By porting from jython to java now accessible to Queue via automatic extension generation.
- Better handling for problematic sample names by using PicardAggregationUtils.
GATKReportTable looks up keys using arrays instead of dot-separated strings, which is useful when a sample has a period in the name.
CombineVariants has option to suppress the header with the command line, which is now invoked during VCF gathering.
Added SelectHeaders walker for filtering headers for dbGAP submission.
Generated command line for read filters now correctly prefixes the argument name as --read_filter instead of -read_filter.
Latest WholeGenomePipeline.
Other minor cleanup to utility methods.
2012-04-17 11:45:32 -04:00