Ryan Poplin
18e4532d10
Turning down the amount of assembly graph pruning slightly in the case of low coverage.
2012-04-22 13:23:24 -04:00
Ryan Poplin
a1596791af
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-20 14:03:04 -04:00
Ryan Poplin
a57295eb75
Fixing a bug when breaking up active regions where the resulting regions would overlap by one base. Adding quality score manipulation from the UG into the haplotype caller (qual capped by mapping quality, min qual threshold).
2012-04-20 14:02:55 -04:00
Menachem Fromer
40a247e860
Run HC and UG for comparison; run HC with genotypeFullActiveRegion to get phased genotypes
2012-04-20 13:33:09 -04:00
Ryan Poplin
aa903de892
Hooking up the haplotype genotyping option requested by Menachem.
2012-04-20 11:46:37 -04:00
Guillermo del Angel
de68363c23
Removed an experimental feature (aka hack) that was meant for 1000G consensus but remained in the VQSR data manager: QD was being scaled by indel length. There is no longer any evidence that QD is length-dependent, either in CEU trio data or in the latest 1000G P2 calls.
2012-04-20 10:58:34 -04:00
Mauricio Carneiro
a1561a97c4
Changing the name of the integration test (too long to type) and disabling tests during the triage...
2012-04-19 20:36:51 -04:00
Guillermo del Angel
d2488dfb81
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 19:40:03 -04:00
Guillermo del Angel
c44c7b9a97
Restored the optimization in Pair HMM that computes the HMM matrices only from the index where the haplotypes start to diverge. This saves about 15-20% of runtime, which is what we lost by disabling banding in the latest version, so runtime should now be about the same as it was before the refactoring. Output is bit-true to the previous commit.
2012-04-19 19:39:43 -04:00
Mauricio Carneiro
0f8c77391d
BQSR bug triage #3
* fixed the context covariate's famous "off by one" error
* reduced maximum quality score to Q50 (following Eric/Ryan's suggestion)
* removed context downsampling in BQSR R script
2012-04-19 17:31:04 -04:00
Khalid Shakir
df5dd841af
AC strat now checks if evals will be merged before throwing an error on multiple eval files.
Minor tweaks to WGP script based on new recal VCF format.
2012-04-19 16:08:55 -04:00
Guillermo del Angel
3fa9089085
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 14:37:34 -04:00
Menachem Fromer
6d5b05c123
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 14:18:34 -04:00
Menachem Fromer
53ebde2c3b
Added Queue script to run HaplotypeCaller on user-defined sets of samples at particular loci
2012-04-19 14:17:57 -04:00
Guillermo del Angel
1ae2ab5b63
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 12:50:29 -04:00
Guillermo del Angel
0e6e0cb907
Merging bug fixes
2012-04-19 12:49:30 -04:00
Eric Banks
79272c5e15
Thanks to Menachem for pointing out that the docs for genotyping_mode and output_mode were the same (and unclear). Fixed.
2012-04-19 12:48:09 -04:00
Guillermo del Angel
02ff930f6a
My changes
2012-04-19 12:45:18 -04:00
Eric Banks
2485cef5b8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 11:46:06 -04:00
Eric Banks
76a6e37f4f
Don't output callability metrics by default anymore; one can still have them output to the 'metrics' file (which is now @Hidden because they are really for GSA use). Added a TODO to move UG from @By reference to reads and rods once LIBS is cleaned up.
2012-04-19 11:45:56 -04:00
Ryan Poplin
1ea4e48a27
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 11:32:32 -04:00
Ryan Poplin
11001ab9a2
Adding option to HaplotypeCaller to genotype the events on the chosen haplotypes as independent events. The filtered reads are now kept around so they can be passed to the variant annotations. Unfortunately the filtered reads aren't assigned a likelihood yet so they are all thrown in the Allele.NO_CALL bin.
2012-04-19 11:32:10 -04:00
Mauricio Carneiro
eb22cd7222
Unit test to guarantee BQSR sequential calculation accuracy
This test brings together the old and the new BQSR: it builds a recalibration table and performs the recalibration calculation in each of the two frameworks for 10,000+ bases, asserting that the calculations match in every case.
2012-04-19 09:33:40 -04:00
Mauricio Carneiro
68d0211fa1
Improved BQSR plotting and some new parameters
* Refactored CycleCovariate to be a fragment covariate instead of a per-read covariate
* Refactored the CycleCovariateUnitTest to test the pairing information
* Updated BQSR integration tests accordingly
* Made the quantization levels parameter visible (no longer hidden)
* Added hidden option to keep intermediate plotting files for debugging purposes (by default they are deleted automatically)
* Added hidden option not to generate the plots automatically (important for scatter/gathering)
2012-04-19 09:31:41 -04:00
Guillermo del Angel
143e92b797
Rebasing
2012-04-18 20:05:43 -04:00
Guillermo del Angel
a530127956
More merge bug fixes
2012-04-18 19:59:39 -04:00
Guillermo del Angel
960e7e6aaf
Changes to integration tests
2012-04-18 19:53:42 -04:00
Guillermo del Angel
82efd4457e
Revert some bad merge changes
2012-04-18 16:35:09 -04:00
Guillermo del Angel
31c394d588
Resolve merge conflicts
2012-04-18 16:25:03 -04:00
Ryan Poplin
4999ae87ad
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-18 15:02:42 -04:00
Ryan Poplin
dcc4871468
minor misc optimizations to PairHMM
2012-04-18 15:02:26 -04:00
Menachem Fromer
673fc00343
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-18 14:17:00 -04:00
Menachem Fromer
66d3fec8e2
Need to print newline in printf
2012-04-18 14:15:50 -04:00
Eric Banks
d3c84e7b1f
This should be a User Error since it's provided from the DoC command-line arguments
2012-04-18 13:09:23 -04:00
Eric Banks
392f1903f7
Handling some of the NumberFormatExceptions seen via Tableau that are really user errors.
2012-04-18 12:57:37 -04:00
Menachem Fromer
70777c7af8
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-18 12:42:28 -04:00
Menachem Fromer
5d3c764ed2
Cleaned up the DoC code and made the pseq commands a little more robust to non-standard chromosomes
2012-04-18 12:41:32 -04:00
Ryan Poplin
8a84456626
Following Eric's awesome update to change the VQSR recal file into a VCF file, the ApplyRecalibration step is now scatter/gather-able and tree reducible.
2012-04-18 11:24:04 -04:00
Eric Banks
333ae3bc2c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-17 23:54:16 -04:00
Eric Banks
4448a3ea76
Final tweaks. Added an integration test to cover the case of SNPs and indels that start at the same position.
2012-04-17 23:54:10 -04:00
Mauricio Carneiro
2fba7df0a6
Updating integration tests for new BQSR output
2012-04-17 23:39:40 -04:00
Eric Banks
c1f52b773a
Minor tweaks and updated integration test MD5s
2012-04-17 23:17:28 -04:00
Eric Banks
6d03bce0d3
Important refactoring of the VQSR recal file format: we now use a VCF instead of a CSV file.
The most important reason for this change is that we no longer need to read the entire recal file into memory up front in ApplyRecalibration. For 1000G calling this was prohibitive in terms of memory requirements. Now we go through the rod system and pull in just the records we need at a given position.
As an added bonus, once BCF2 is live we can drastically cut down the sizes of these recal files (which can grow large for whole genome calling).
2012-04-17 22:38:18 -04:00
Eric Banks
ea793d8e27
Khalid pressured me into adding an integration test that makes sure we don't fail on reads with adjacent I and D events.
2012-04-17 21:21:29 -04:00
Menachem Fromer
00bef8f33b
Must provide output as argument so that command-line string is constructed properly
2012-04-17 20:50:11 -04:00
Menachem Fromer
e3bc1e8630
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-17 20:33:45 -04:00
Menachem Fromer
4ff9cbd7e9
Added more support for XHMM options (and can now use Plink/Seq to calculate repeat-masked intervals since GATK and reference do not have this encoded in lower-case bases)
2012-04-17 20:33:02 -04:00
Mauricio Carneiro
46a212d8e9
Added "simplify reads" option to PrintReads.
2012-04-17 19:32:34 -04:00
Mauricio Carneiro
f0c81b59b0
Implementation of the new BQSR plotting infrastructure
* removed low-quality bases from the recalibration report
* refactored the Datum (Recal and Accuracy) class structure
* created a new plotting CSV table for optimized performance with the R script
* added a datum object that carries the accuracy information (AccuracyDatum) for plotting
* added mean reported quality score to all covariates
* added QualityScore as a covariate for plotting purposes
* added unit test to the key manager to operate with one required covariate and multiple optional covariates
* integrated the plotting into BQSR (automatically generates the PDF with the recalibration tearsheet)
2012-04-17 19:23:55 -04:00
Ryan Poplin
952280bef1
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-17 17:00:14 -04:00