Ryan Poplin
e48727dae3
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-09 10:31:10 -04:00
Guillermo del Angel
5be7e0621d
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-09 09:58:34 -04:00
Guillermo del Angel
71ee8d87b3
Rename per-sample ML allelic fractions and counts so that they don't have the same name as the per-site INFO fields, and clarify wording in VCF header
2012-08-09 09:58:20 -04:00
Eric Banks
6230b49a86
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-08 21:44:43 -04:00
Eric Banks
35cec8530c
Make coverage threshold in FindCoveredIntervals a command-line argument
2012-08-08 21:44:24 -04:00
Mauricio Carneiro
250ffd2ad7
Merged bug fix from Stable into Unstable
2012-08-08 15:50:07 -04:00
Mauricio Carneiro
78c1556186
Fixing ReduceReads downsampling bug -- downsampled reads were not being excluded from the read window, causing them to trail back and get caught by the sliding window exception
2012-08-08 15:49:31 -04:00
Ryan Poplin
1223d77546
Removing argument from HaplotypeCaller that was made unnecessary by recent improvements to triggering around large events
2012-08-08 15:13:20 -04:00
Eric Banks
0a2a646a52
Other random FindBugs fixes
2012-08-08 14:56:27 -04:00
Eric Banks
f652d7806e
FindBugs found an infinite loop in the code
2012-08-08 14:44:49 -04:00
Eric Banks
4c84cc9486
Quick pass of FindBugs 'should be static inner class' fixes.
2012-08-08 14:42:06 -04:00
Eric Banks
a0196c9f5b
Quick pass of FindBugs 'method invokes inefficient Number constructor' fixes.
2012-08-08 14:34:16 -04:00
Eric Banks
4b2e3cec0b
Quick pass of FindBugs 'inefficient use of keySet iterator instead of entrySet iterator' fixes for core tools.
2012-08-08 14:29:41 -04:00
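For reference, the pattern behind this FindBugs warning can be sketched as follows. The class and method names are hypothetical and are not taken from the GATK code; the sketch only contrasts the two iteration styles.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not code from the GATK.
final class EntrySetExample {
    // Flagged pattern: each iteration performs an extra map lookup via get().
    static int sumWithKeySet(final Map<String, Integer> counts) {
        int total = 0;
        for (final String key : counts.keySet()) {
            total += counts.get(key);
        }
        return total;
    }

    // Preferred pattern: the entry already carries both key and value.
    static int sumWithEntrySet(final Map<String, Integer> counts) {
        int total = 0;
        for (final Map.Entry<String, Integer> entry : counts.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    public static void main(final String[] args) {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("A", 3);
        counts.put("C", 4);
        System.out.println(sumWithKeySet(counts) + " " + sumWithEntrySet(counts));
    }
}
```

Both methods return the same result; the second avoids one lookup per element.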
Guillermo del Angel
3e2752667c
Intermediate checkin for ReducedReads with HaplotypeCaller - change min read count over k-mer to average count over k-mer when doing assembly of a reduced read (not optimal, currently trying max and then will decide on best approach), fix merge conflicts
2012-08-08 12:07:33 -04:00
David Roazen
a7811d673f
Update URL for phone home / GATK key documentation output by the GATK upon error
2012-08-08 09:29:54 -04:00
Mark DePristo
cda8d944b7
Bugfixes for BCF with VQSR
-- Old version converted doubles directly from strings. New version uses VariantContext getAttributeAsDouble() that looks at the values directly to determine how to convert from Object to Double (via Double.valueOf, (Double), or (Double)(Integer)).
-- getAttributeAsDouble() is now smart in converting integers to doubles as needed
-- Removed unnecessary logging info in BCF2Codec
-- Added integration tests to ensure that VQSR works end-to-end with BCF2, using a sites-only version of the file Khalid sent to me
-- Added vqsr.bcf_test.snps.unfiltered.bcf file for this integration test
2012-08-07 17:22:39 -04:00
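The type-based conversion described in this commit can be sketched roughly as below. This is a minimal illustration under stated assumptions: the class name, method signature, and default-value handling are hypothetical, and the actual VariantContext.getAttributeAsDouble() may differ.

```java
// Hypothetical sketch; not the GATK implementation.
final class AttributeConversion {
    // Inspect the runtime type of the value instead of always parsing a string.
    static double asDouble(final Object value, final double defaultValue) {
        if (value == null) {
            return defaultValue;                       // attribute missing (assumed behavior)
        }
        if (value instanceof Double) {
            return (Double) value;                     // already a Double: plain cast
        }
        if (value instanceof Integer) {
            return ((Integer) value).doubleValue();    // widen integer to double
        }
        return Double.valueOf(value.toString());       // fall back to parsing the string form
    }

    public static void main(final String[] args) {
        System.out.println(asDouble("1.5", 0.0));
        System.out.println(asDouble(Integer.valueOf(3), 0.0));
    }
}
```

In this sketch a malformed string still throws NumberFormatException from Double.valueOf.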
Mark DePristo
80b94a4f9a
AdaptiveContexts implement pruning to a given chi2 p value
-- Added Bonferroni-corrected p-value pruning, so you tell it how significant a difference you are willing to collapse in the tree, and it prunes the tree down to this maximum threshold
-- Penalty is now a phred-scaled p-value not the raw chi2 value
-- Split command line arguments in VisualizeContextTree into separate arguments for each type of pruning
2012-08-07 17:22:39 -04:00
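As a rough illustration of the quantities named in this commit, the sketch below applies a Bonferroni correction to a p-value and converts a p-value to the phred scale. The class and method names are hypothetical, and how the tool counts the number of tests or compares penalties is an assumption not stated in the commit.

```java
// Hypothetical sketch; not code from the adaptive context tools.
final class PhredPenalty {
    // Bonferroni correction: multiply the p-value by the number of tests, capped at 1.
    static double bonferroni(final double pValue, final int nTests) {
        return Math.min(1.0, pValue * nTests);
    }

    // Phred scaling: -10 * log10(p), so smaller p-values give larger penalties.
    static double phredScale(final double pValue) {
        return -10.0 * Math.log10(pValue);
    }

    public static void main(final String[] args) {
        final double corrected = bonferroni(0.0001, 10);
        System.out.println(phredScale(corrected));
    }
}
```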
Mark DePristo
982c735c76
VisualizeAdaptiveTree now considers only leaf nodes when computing max/min penalty
2012-08-07 17:22:39 -04:00
Ryan Poplin
15085bf03e
The UnifiedGenotyper now makes use of base insertion and base deletion quality scores if they exist in the reads.
2012-08-07 13:58:22 -04:00
Eric Banks
2c76f71a03
Update -maxAlleles argument in integration tests
2012-08-06 22:48:04 -04:00
Guillermo del Angel
c66a896b8e
Fix UG integration test broken by new -maxAltAlleles nomenclature
2012-08-06 21:29:21 -04:00
Guillermo del Angel
97c5ed4feb
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 20:22:31 -04:00
Guillermo del Angel
238d55cb61
Fixes for running HaplotypeCaller with reduced reads: a) minor refactoring, pulled out code to compute mean representative count to ReadUtils, b) Don't use min representative count over kmer when constructing de Bruijn graph - this creates many paths with multiplicity=1 and makes us lose a lot of SNPs at the edge of capture targets. Use mean instead.
2012-08-06 20:22:12 -04:00
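The difference between the min and mean representative count over a k-mer span can be sketched as follows. The class name, method names, array layout, and rounding rule are all assumptions for illustration; the actual helper in ReadUtils may differ.

```java
// Hypothetical sketch; not the ReadUtils implementation.
final class KmerCounts {
    // Minimum per-base representative count over the k-mer span.
    static int minCount(final int[] counts, final int start, final int kmerLength) {
        int min = Integer.MAX_VALUE;
        for (int i = start; i < start + kmerLength; i++) {
            min = Math.min(min, counts[i]);
        }
        return min;
    }

    // Mean per-base representative count over the k-mer span,
    // rounded to the nearest integer (rounding rule is an assumption).
    static int meanCount(final int[] counts, final int start, final int kmerLength) {
        long sum = 0;
        for (int i = start; i < start + kmerLength; i++) {
            sum += counts[i];
        }
        return (int) Math.round(sum / (double) kmerLength);
    }

    public static void main(final String[] args) {
        // One low-count base drags the minimum down to 1, while the mean stays high.
        final int[] counts = {1, 8, 9, 10};
        System.out.println(minCount(counts, 0, 4) + " vs " + meanCount(counts, 0, 4));
    }
}
```

With a single low-count base in the span, the min collapses the k-mer multiplicity to 1, which matches the loss of paths described in the commit message.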
Eric Banks
d2feb5d742
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 15:06:45 -04:00
Eric Banks
4b5e1aed1f
We might completely overhaul the AssignSomaticStatus tool at some point, but for now I've just tweaked it a little so that it runs well with the TALEN data: fixed VCF header bugs and changed it so that any multi-sample VCF can be used and the user specifies which of the samples is the normal and which should be used as the tumor (only 1 T/N pair allowed now).
2012-08-06 15:06:36 -04:00
Mark DePristo
00858f16a6
Deleting empty unit test for AdaptiveContexts
2012-08-06 12:58:13 -04:00
Ryan Poplin
f1c30c3a59
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 12:02:26 -04:00
Ryan Poplin
d85b38e4da
Updating HaplotypeCaller integration tests
2012-08-06 12:02:19 -04:00
Mark DePristo
44f160f29f
indelGOP and indelGCP are now advanced, not hidden arguments
2012-08-06 11:42:55 -04:00
Mark DePristo
2f004665fb
Fixing public -> private dep
2012-08-06 11:42:55 -04:00
Mark DePristo
7bf5ca51ee
Major bugfix for adaptive contexts
-- Basically I was treating the context history in the wrong direction, effectively predicting the further bases in the context based on the closer one. Totally backward. Updated the code to build the tree in the right direction.
-- Added a few more useful outputs for analysis (minPenalty and maxPenalty)
-- Misc. cleanup of the code
-- Overall I'm not 100% certain this is even the right way to think about the problem. Clearly this is producing a reasonable output, but the sum of chi2 values over the entire tree is just enormous. Perhaps an MCMC convergence / sampling criterion would be a better way to think about this problem?
2012-08-06 11:42:55 -04:00
Mark DePristo
b4841548f1
Bug fixes and misc. improvements to running the adaptive context tools
-- Better output file name defaults
-- Fixed nasty bug where I included nonexistent quals in the contexts to process because they showed up in the Cycle covariate
-- Data is processed in qual order now, so it's easier to see progress
-- Logger messages explaining where we are in the process
-- When in UPDATE mode we still write out the information for an equivalent prune by depth for post analysis
2012-08-06 11:42:55 -04:00
Ryan Poplin
b8709d8c67
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 11:41:28 -04:00
Ryan Poplin
afa70a13a9
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 11:40:27 -04:00
Ryan Poplin
973d1d47ed
Merging together the computeDiploidHaplotypeLikelihoods functions in the HaplotypeCaller's LikelihoodEngine so they both benefit from the ReducedRead's RepresentativeCount
2012-08-06 11:40:07 -04:00
Eric Banks
210db5ec27
Update -maxAlleles argument to -maxAltAlleles to make it more accurate. The hidden GSA production -capMaxAllelesForIndels argument also gets updated.
2012-08-06 11:31:18 -04:00
Eric Banks
8f95a03bb6
Prevent NumberFormatExceptions when parsing the VCF POS field
2012-08-06 11:19:54 -04:00
Ryan Poplin
b7eec2fd0e
Bug fixes related to the changes in allele padding. If a haplotype started with an insertion it led to array index out of bounds. Haplotype allele insert function is now very simple because all alleles are treated the same way. HaplotypeUnitTest now uses a variant context instead of creating Allele objects directly.
2012-08-05 12:29:10 -04:00
Mark DePristo
e1bba91836
Ready for full-scale evaluation of adaptive BQSR contexts
-- VisualizeContextTree now can write out an equivalent BQSR table determined after adaptive context merging of all RG x QUAL x CONTEXT trees
-- Docs, algorithm descriptions, etc so that it makes sense what's going on
-- VisualizeContextTree should really be simplified into a single tool that just visualizes the trees when / if we decide to make adaptive contexts a standard part of BQSR
-- Misc. cleaning, organization of the code (recalibration tests were in private but corresponding actual files were public)
2012-08-03 16:02:53 -04:00
Guillermo del Angel
d2e8eb7b23
Fixed 2 haplotype caller unit tests: a) new interface for addReadLikelihoods() including read counts, b) disable test that tests basic DeBruijn graph assembly, not ready yet
2012-08-03 14:26:51 -04:00
Ryan Poplin
c3b6e2b143
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-03 13:14:43 -04:00
Ryan Poplin
ff80f17721
Using PathComparatorTotalScore in the assembly graph traversal does a better job of capturing low frequency branches that are inside high frequency haplotypes.
2012-08-03 13:14:37 -04:00
Guillermo del Angel
6f8e7692d4
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-03 12:24:37 -04:00
Guillermo del Angel
9e25b209e0
First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes: a) Active region: scale PLs by representative count to determine whether region is active. b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in read, TBD whether this is good enough to avoid biases in GLs. c) DeBruijn assembler inserts kmers N times in graph, where N is min representative count of read over kmer span - TBD again whether this is the best approach. d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is discrepancy of overlaps between unclipped/soft clipped bases. Didn't affect things before, but RR makes hard-clipped bases in CIGARs more prevalent, so this was exposed. e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller.
2012-08-03 12:24:23 -04:00
Ryan Poplin
8817fc70d1
Merged bug fix from Stable into Unstable
2012-08-03 10:45:01 -04:00
Ryan Poplin
f40d0a0a28
Updating VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller. Integration tests change because of the MNPs in dbSNP.
2012-08-03 10:44:36 -04:00
Joel Thibault
51bd03cc36
Add RemoveProgramRecords annotation to ActiveRegionWalker
2012-08-03 09:54:16 -04:00
Joel Thibault
addbfd6437
Add a RemoveProgramRecords annotation
* Add the RemoveProgramRecords annotation to LocusWalker
2012-08-03 09:54:16 -04:00
Joel Thibault
524d7ea306
Choose whether to keep program records based on Walker
* Add keepProgramRecords argument
* Make removeProgramRecords / keepProgramRecords override default
2012-08-03 09:54:16 -04:00
Mark DePristo
e04989f76d
Bugfix for new PASS position in dictionary in BCF2
2012-08-03 09:42:21 -04:00