Mauricio Carneiro
902277856e
fix for RBP getPileupsForSamples()
...
Do not differentiate per-sample pileups from generic pileups. Handle both the same way; it's O(n) either way.
2012-04-24 17:20:30 -04:00
Mauricio Carneiro
82b4798913
CountBasesWalker -- a quick QC walker.
2012-04-24 17:20:30 -04:00
Mauricio Carneiro
e440d0ce69
BQSR triage #4
...
* fixed queue script plot file names
* updated the ReadGroupCovariate to use the platform unit instead of sample + lane.
* fixed plotting of marginalized reported qualities
2012-04-24 17:19:54 -04:00
Eric Banks
d6277b70d8
Forgot to consider the optimized case in hasAllele
2012-04-24 11:32:28 -04:00
Eric Banks
91bad244d5
Using a VCF whose ALT is the reference in GGA mode is a User Error
2012-04-24 11:08:37 -04:00
Eric Banks
74ad008163
Adding VariantContext.hasAlternateAllele functionality
2012-04-24 11:07:46 -04:00
Eric Banks
66f3315548
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-24 09:39:55 -04:00
Eric Banks
bcb93dda5f
Fixing docs (rank sum test values are not phred-scaled)
2012-04-24 09:39:42 -04:00
Mauricio Carneiro
e39a59594a
BQSR triage and test routines
...
* updated BQSR queue script for faster turnaround
* implemented plot generation for scatter/gathered runs
* adjusted output file names to be cooperative with the queue script
* added the recalibration report file to the argument table in the report
* added ReadCovariates unit test -- guarantees that all the covariates are being generated for every base in the read
* added RecalibrationReport unit test -- guarantees the integrity of the delta tables
2012-04-23 11:23:00 -04:00
Eric Banks
a733723439
Merged bug fix from Stable into Unstable
2012-04-23 10:30:30 -04:00
Eric Banks
2761da975e
Handle null VCs (which can arise when indels are present in the file)
2012-04-23 10:30:00 -04:00
Eric Banks
cd63bcb1b8
Fixing unit tests to register the user exception being thrown (instead of the NumberFormatException)
2012-04-23 10:06:51 -04:00
Eric Banks
63aa79df82
Slightly better error message
2012-04-23 09:37:28 -04:00
Eric Banks
7b5fbf9567
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-23 09:34:08 -04:00
Eric Banks
4edb005411
Catch poorly formatted PL/GL fields
2012-04-23 09:33:50 -04:00
Ryan Poplin
35bb55f562
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-22 13:23:36 -04:00
Ryan Poplin
18e4532d10
Turning down the amount of assembly graph pruning slightly in the case of low coverage.
2012-04-22 13:23:24 -04:00
Eric Banks
1f23d99dfa
If we are subsetting alleles in the UG (either because there were too many or because some were not polymorphic), then we may need to trim the alleles (because the original VariantContext may have had to pad at the end). Thanks to Ryan for reporting this. Only one of the integration tests had even partially covered this case, so I added one that did.
2012-04-20 17:00:05 -04:00
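A rough illustration of the trimming described in 1f23d99dfa: once alleles have been subset, any trailing bases shared by every remaining allele are padding and can be dropped, as long as each allele keeps at least one base. This is a minimal sketch under that assumption, not the UnifiedGenotyper code; the function name `trim_shared_suffix` and the example alleles are hypothetical.

```python
def trim_shared_suffix(alleles):
    """Drop trailing bases common to all alleles, keeping at least one base in each.

    Hypothetical helper for illustration only; not part of GATK.
    """
    alleles = list(alleles)
    if not alleles:
        return alleles
    shortest = min(len(a) for a in alleles)
    trim = 0
    # Stop before the shortest allele would become empty.
    while trim < shortest - 1:
        last_bases = {a[len(a) - 1 - trim] for a in alleles}
        if len(last_bases) != 1:
            break
        trim += 1
    return [a[:len(a) - trim] for a in alleles]


# REF=ACGT with ALTs A and AGT; after subsetting to the AGT allele only,
# the trailing "GT" is shared padding.
trimmed = trim_shared_suffix(["ACGT", "AGT"])
```

Here `trimmed` holds the REF and ALT with the shared trailing padding removed; alleles that are already one base long are returned unchanged.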
Eric Banks
4b81c75642
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-20 14:30:19 -04:00
Eric Banks
f1c5510ec0
When running SelectVariants with the excludeNonVariants option, remove alleles from the ALT field that are no longer polymorphic.
2012-04-20 14:30:04 -04:00
Ryan Poplin
a1596791af
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-20 14:03:04 -04:00
Ryan Poplin
a57295eb75
Fixing a bug when breaking up active regions where the resulting regions would overlap by one base. Adding quality score manipulation from the UG into the haplotype caller (qual capped by mapping quality, min qual threshold).
2012-04-20 14:02:55 -04:00
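The commit a57295eb75 does not say what caused the one-base overlap. One common way such a bug arises is starting each new chunk at the previous chunk's stop instead of one base past it. The sketch below shows a split of an inclusive interval that avoids this; `split_region` and its parameters are hypothetical names, not the HaplotypeCaller API.

```python
def split_region(start, stop, max_size):
    """Split inclusive [start, stop] into consecutive chunks of at most max_size bases.

    Hypothetical helper for illustration only.
    """
    if max_size < 1:
        raise ValueError("max_size must be positive")
    chunks = []
    chunk_start = start
    while chunk_start <= stop:
        chunk_stop = min(chunk_start + max_size - 1, stop)
        chunks.append((chunk_start, chunk_stop))
        # The next chunk begins one base past the previous stop; reusing
        # chunk_stop here would make neighbouring chunks share a base.
        chunk_start = chunk_stop + 1
    return chunks


chunks = split_region(1, 10, 4)
```

Each base in the original interval falls in exactly one chunk, so adjacent regions neither overlap nor leave gaps.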
Guillermo del Angel
de68363c23
Removed an experimental feature (aka hack) that was meant for the 1000G consensus but remained in the VQSR data manager: QD was being scaled by indel length. There is no longer any evidence that QD is length-dependent, in either the CEU trio data or the latest 1000G P2 calls.
2012-04-20 10:58:34 -04:00
Guillermo del Angel
d2488dfb81
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 19:40:03 -04:00
Guillermo del Angel
c44c7b9a97
Restored the Pair HMM optimization that computes the HMM matrices only from the index where the haplotypes start to diverge. This saves about 15-20% of runtime, which is what we lost by disabling banding in the latest version, so runtime should now be about the same as it was before refactoring. Output is bit-true to the previous commit.
2012-04-19 19:39:43 -04:00
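The idea in c44c7b9a97 can be sketched as follows, assuming the same read is evaluated against successive haplotypes: matrix cells for haplotype positions before the first differing base depend only on the shared prefix, so they can presumably be reused and the recursion restarted at that index. The helper name `first_divergence` is hypothetical and this is not the Pair HMM implementation itself.

```python
def first_divergence(previous_haplotype, haplotype):
    """Return the first index at which two haplotypes differ.

    If one is a prefix of the other, the length of the shorter one is returned.
    Hypothetical helper for illustration only.
    """
    shared = min(len(previous_haplotype), len(haplotype))
    for i in range(shared):
        if previous_haplotype[i] != haplotype[i]:
            return i
    return shared


# The two haplotypes agree on "ACG"; only positions from start_index onward
# would need to be recomputed.
start_index = first_divergence("ACGTA", "ACGGA")
```

A return value of 0 means nothing is shared and the full matrices must be recomputed.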
Mauricio Carneiro
0f8c77391d
BQSR bug triage #3
...
* fixed the context covariate's famous "off by one" error
* reduced maximum quality score to Q50 (following Eric/Ryan's suggestion)
* removed context downsampling in BQSR R script
2012-04-19 17:31:04 -04:00
Khalid Shakir
df5dd841af
AC strat now checks if evals will be merged before throwing an error on multiple eval files.
...
Minor tweaks to WGP script based on new recal VCF format.
2012-04-19 16:08:55 -04:00
Guillermo del Angel
1ae2ab5b63
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 12:50:29 -04:00
Guillermo del Angel
0e6e0cb907
Merging bug fixes
2012-04-19 12:49:30 -04:00
Eric Banks
79272c5e15
Thanks to Menachem for pointing out that the docs for genotyping_mode and output_mode were the same (and unclear). Fixed.
2012-04-19 12:48:09 -04:00
Guillermo del Angel
02ff930f6a
My changes
2012-04-19 12:45:18 -04:00
Eric Banks
2485cef5b8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 11:46:06 -04:00
Eric Banks
76a6e37f4f
Don't output callability metrics by default anymore; one can still have them output to the 'metrics' file (which is now @Hidden because they are really for GSA use). Added a TODO to move UG from @By reference to reads and rods once LIBS is cleaned up.
2012-04-19 11:45:56 -04:00
Ryan Poplin
1ea4e48a27
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-19 11:32:32 -04:00
Ryan Poplin
11001ab9a2
Adding option to HaplotypeCaller to genotype the events on the chosen haplotypes as independent events. The filtered reads are now kept around so they can be passed to the variant annotations. Unfortunately the filtered reads aren't assigned a likelihood yet so they are all thrown in the Allele.NO_CALL bin.
2012-04-19 11:32:10 -04:00
Mauricio Carneiro
eb22cd7222
Unit test to guarantee BQSR sequential calculation accuracy
...
This test brings together the old and the new BQSR: it builds a recalibration table and performs the recalibration calculation in each of the two frameworks for 10,000+ bases, and asserts that the calculations match in every case.
2012-04-19 09:33:40 -04:00
Mauricio Carneiro
68d0211fa1
Improved BQSR plotting and some new parameters
...
* Refactored CycleCovariate to be a fragment covariate instead of a per read covariate
* Refactored the CycleCovariateUnitTest to test the pairing information
* Updated BQSR Integration tests accordingly
* Made quantization levels parameter not hidden anymore
* Added hidden option to keep intermediate plotting files for debug purposes (they're automatically deleted)
* Added hidden option not to generate the plots automatically (important for scatter/gathering)
2012-04-19 09:31:41 -04:00
Guillermo del Angel
143e92b797
Rebasing
2012-04-18 20:05:43 -04:00
Guillermo del Angel
960e7e6aaf
Changes to integration tests
2012-04-18 19:53:42 -04:00
Guillermo del Angel
82efd4457e
Revert some bad merge changes
2012-04-18 16:35:09 -04:00
Guillermo del Angel
31c394d588
Resolve merge conflicts
2012-04-18 16:25:03 -04:00
Ryan Poplin
4999ae87ad
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-18 15:02:42 -04:00
Ryan Poplin
dcc4871468
minor misc optimizations to PairHMM
2012-04-18 15:02:26 -04:00
Eric Banks
d3c84e7b1f
This should be a User Error since it's provided from the DoC command-line arguments
2012-04-18 13:09:23 -04:00
Eric Banks
392f1903f7
Handling some of the NumberFormatExceptions seen via Tableau that are really user errors.
2012-04-18 12:57:37 -04:00
Ryan Poplin
8a84456626
Following Eric's awesome update to change the VQSR recal file into a VCF file, the ApplyRecalibration step is now scatter/gather-able and tree reducible.
2012-04-18 11:24:04 -04:00
Eric Banks
4448a3ea76
Final tweaks. Added an integration test to cover the case of SNPs and indels that start at the same position.
2012-04-17 23:54:10 -04:00
Eric Banks
c1f52b773a
Minor tweaks and updated integration tests MD5s
2012-04-17 23:17:28 -04:00
Eric Banks
6d03bce0d3
Important refactoring of the VQSR recal file format: we now use a VCF instead of a CSV file.
...
The most important reason for this change is that we no longer need to read the entire recal file into memory up front in ApplyRecalibration. For 1000G calling this was prohibitive in terms of memory requirements. Now we go through the rod system and pull in just the records we need at a given position.
As an added bonus, once BCF2 is live we can drastically cut down the sizes of these recal files (which can grow large for whole genome calling).
2012-04-17 22:38:18 -04:00
Eric Banks
ea793d8e27
Khalid pressured me into adding an integration test that makes sure we don't fail on reads with adjacent I and D events.
2012-04-17 21:21:29 -04:00