Mauricio Carneiro
7cf9911924
Fixed ReduceReads bug where variant regions were missing.
...
This affected variant regions with more than 100 reads and fewer than 250 reads. Only BAMs reduced with GATK v2 and 2.1 were affected.
2012-09-19 16:09:08 -04:00
Ryan Poplin
e5cfdb4811
Bug fix for the popular _Duplicate allele added to VariantContext_ error reported on the forum. It seems to be due to lowercase bases in the reference being treated as reference mismatches. We would try to turn these mismatches into SNP events, for example c/C. We now uppercase the result from IndexedFastaSequenceFile.getSubsequenceAt().
2012-08-22 14:39:35 -04:00
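The uppercasing step described in e5cfdb4811 can be sketched as below. This is a minimal illustration, not the code from the commit: the class name ReferenceBases and the method toUpperCase are hypothetical, and the sketch operates on a plain byte array rather than on the object returned by IndexedFastaSequenceFile.getSubsequenceAt().

```java
public class ReferenceBases {
    /**
     * Returns a copy of the given bases with ASCII lowercase letters
     * converted to uppercase, so that soft-masked reference bases such
     * as 'c' compare equal to read bases such as 'C'.
     * (Hypothetical helper, not part of the GATK API.)
     */
    public static byte[] toUpperCase(byte[] bases) {
        byte[] out = new byte[bases.length];
        for (int i = 0; i < bases.length; i++) {
            byte b = bases[i];
            // ASCII lowercase and uppercase letters differ by 32.
            out[i] = (b >= 'a' && b <= 'z') ? (byte) (b - 32) : b;
        }
        return out;
    }
}
```

Normalizing once, right after the bases are fetched, means later comparisons against read bases need no case handling of their own.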
Ryan Poplin
464d49509a
Pulling out common caller arguments into their own StandardCallerArgumentCollection base class so that every caller isn't exposed to the unused arguments from every other caller.
2012-08-20 15:28:39 -04:00
Ryan Poplin
c67d708c51
Bug fix in HaplotypeCaller for non-regular bases in the reference or reads. Those events don't get created any more. Bug fix for advanced GenotypeFullActiveRegion mode: custom variant annotations created by the HC don't make sense when in this mode so don't try to calculate them.
2012-08-20 13:41:08 -04:00
Eric Banks
05cbf1c8c0
FindBugs 'Efficiency' fixes
2012-08-16 15:40:52 -04:00
Eric Banks
dac3958461
Killing off some FindBugs 'Usability' issues
2012-08-16 13:32:44 -04:00
Eric Banks
eca9613356
Adding support of X and = CIGAR operators to the GATK
2012-08-10 14:54:07 -04:00
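For context on eca9613356: in the SAM format, the = (sequence match) and X (sequence mismatch) operators consume both read and reference bases, exactly as M does. The sketch below shows what supporting them means for length calculations. It is an assumption-laden illustration, not the GATK implementation: the class CigarLengths and its methods are hypothetical, and it parses a raw CIGAR string rather than using the GATK's own CIGAR classes.

```java
public class CigarLengths {
    // Operators that advance the position on the reference.
    private static boolean consumesReference(char op) {
        return op == 'M' || op == 'D' || op == 'N' || op == '=' || op == 'X';
    }

    // Operators that advance the position in the read.
    private static boolean consumesRead(char op) {
        return op == 'M' || op == 'I' || op == 'S' || op == '=' || op == 'X';
    }

    /** Number of reference bases spanned by the CIGAR string. */
    public static int referenceLength(String cigar) {
        return length(cigar, true);
    }

    /** Number of read bases covered by the CIGAR string. */
    public static int readLength(String cigar) {
        return length(cigar, false);
    }

    private static int length(String cigar, boolean reference) {
        int total = 0;
        int count = 0;
        for (int i = 0; i < cigar.length(); i++) {
            char c = cigar.charAt(i);
            if (Character.isDigit(c)) {
                // Accumulate the multi-digit element length.
                count = count * 10 + (c - '0');
            } else {
                boolean consumes = reference ? consumesReference(c) : consumesRead(c);
                if (consumes) {
                    total += count;
                }
                count = 0;
            }
        }
        return total;
    }
}
```

With this treatment, 5=1X4= and 10M describe alignments of the same read and reference length.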
Ryan Poplin
2a113977a9
Resolving merge conflicts with the new MD5s
2012-08-10 11:47:00 -04:00
Ryan Poplin
5f82ffd5d8
Adding LowQual filter to the output of the HaplotypeCaller.
2012-08-10 11:25:14 -04:00
Ryan Poplin
9887bc4410
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-09 16:31:06 -04:00
Mauricio Carneiro
abb168e1ba
Merged bug fix from Stable into Unstable
2012-08-09 16:09:58 -04:00
Mauricio Carneiro
67d4148b32
Fixing bug reported by Thomas in the forum where reads were soft-clipped beyond the limits of the contig and ReduceReads was failing with a NoSuchElement exception. Now we hard clip anything that goes beyond the boundaries of the contig.
2012-08-09 15:58:18 -04:00
Mauricio Carneiro
58420098ac
Merged bug fix from Stable into Unstable
2012-08-09 13:02:23 -04:00
Mauricio Carneiro
c6132ebe26
Fixed divide-by-zero bug when the downsampler goes over regions where reads are all filtered out. Added Guillermo's bug report as an integration test.
2012-08-09 13:02:11 -04:00
Ryan Poplin
e48727dae3
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-09 10:31:10 -04:00
Guillermo del Angel
5be7e0621d
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-09 09:58:34 -04:00
Guillermo del Angel
71ee8d87b3
Rename per-sample ML allelic fractions and counts so that they don't have the same name as the per-site INFO fields, and clarify wording in VCF header
2012-08-09 09:58:20 -04:00
Mauricio Carneiro
250ffd2ad7
Merged bug fix from Stable into Unstable
2012-08-08 15:50:07 -04:00
Mauricio Carneiro
78c1556186
Fixing ReduceReads downsampling bug -- downsampled reads were not being excluded from the read window, causing them to trail back and get caught by the sliding window exception
2012-08-08 15:49:31 -04:00
Ryan Poplin
1223d77546
Removing argument from HaplotypeCaller that was made unnecessary by recent improvements to triggering around large events
2012-08-08 15:13:20 -04:00
Eric Banks
4b2e3cec0b
Quick pass of FindBugs 'inefficient use of keySet iterator instead of entrySet iterator' fixes for core tools.
2012-08-08 14:29:41 -04:00
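The FindBugs pattern addressed in 4b2e3cec0b (reported as WMI_WRONG_MAP_ITERATOR) can be illustrated with a small, self-contained sketch. The class and method names below are hypothetical and are not taken from the GATK code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntrySetExample {
    /** Flagged pattern: iterating keySet() costs one extra map lookup per key. */
    public static int sumViaKeySet(Map<String, Integer> counts) {
        int total = 0;
        for (String key : counts.keySet()) {
            total += counts.get(key);
        }
        return total;
    }

    /** Preferred pattern: entrySet() yields key and value together. */
    public static int sumViaEntrySet(Map<String, Integer> counts) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    /** Small sample map used to exercise both variants. */
    public static Map<String, Integer> sampleCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("A", 3);
        counts.put("C", 1);
        counts.put("G", 4);
        return counts;
    }
}
```

Both methods return the same result; the entrySet() version simply avoids the repeated get() call inside the loop.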
Guillermo del Angel
97c5ed4feb
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 20:22:31 -04:00
Guillermo del Angel
238d55cb61
Fixes for running HaplotypeCaller with reduced reads: a) Minor refactoring: pulled out code to compute mean representative count to ReadUtils. b) Don't use min representative count over kmer when constructing de Bruijn graph - this creates many paths with multiplicity=1 and makes us lose a lot of SNPs at the edge of capture targets. Use mean instead.
2012-08-06 20:22:12 -04:00
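The difference between the min and mean representative count over a kmer span, as discussed in 238d55cb61, can be sketched as follows. The class RepresentativeCounts, its method names, and the rounding of the mean are assumptions made for illustration; the actual helper in ReadUtils may differ.

```java
public class RepresentativeCounts {
    /** Smallest per-base representative count in counts[start, start + length). */
    public static int minOverSpan(int[] counts, int start, int length) {
        int min = Integer.MAX_VALUE;
        for (int i = start; i < start + length; i++) {
            min = Math.min(min, counts[i]);
        }
        return min;
    }

    /** Mean per-base representative count over the same span, rounded to the nearest integer. */
    public static int meanOverSpan(int[] counts, int start, int length) {
        long sum = 0;
        for (int i = start; i < start + length; i++) {
            sum += counts[i];
        }
        return (int) Math.round((double) sum / length);
    }
}
```

For per-base counts {1, 8, 9, 10}, the min over the whole span is 1 while the mean is 7: a single low-count base, as might occur at the edge of a capture target, is enough to reduce the kmer's multiplicity to 1 under the min rule.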
Ryan Poplin
b8709d8c67
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 11:41:28 -04:00
Ryan Poplin
973d1d47ed
Merging together the computeDiploidHaplotypeLikelihoods functions in the HaplotypeCaller's LikelihoodEngine so they both benefit from the ReducedRead's RepresentativeCount
2012-08-06 11:40:07 -04:00
Ryan Poplin
b7eec2fd0e
Bug fixes related to the changes in allele padding. If a haplotype started with an insertion it led to array index out of bounds. Haplotype allele insert function is now very simple because all alleles are treated the same way. HaplotypeUnitTest now uses a variant context instead of creating Allele objects directly.
2012-08-05 12:29:10 -04:00
Ryan Poplin
c3b6e2b143
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-03 13:14:43 -04:00
Ryan Poplin
ff80f17721
Using PathComparatorTotalScore in the assembly graph traversal does a better job of capturing low-frequency branches that are inside high-frequency haplotypes.
2012-08-03 13:14:37 -04:00
Guillermo del Angel
6f8e7692d4
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-03 12:24:37 -04:00
Guillermo del Angel
9e25b209e0
First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes: a) Active region: scale PLs by representative count to determine whether region is active. b) Scale per-read, per-haplotype likelihoods by read representative counts. A read's representative count is (temporarily) defined as the average representative count over all bases in the read; TBD whether this is good enough to avoid biases in GLs. c) DeBruijn assembler inserts kmers N times in graph, where N is the min representative count of the read over the kmer span; TBD again whether this is the best approach. d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is a discrepancy of overlaps between unclipped/soft-clipped bases. This didn't affect things before, but RR makes hard-clipped bases in CIGARs more prevalent, so this was exposed. e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller.
2012-08-03 12:24:23 -04:00
Ryan Poplin
3ece4c4993
Merged bug fix from Stable into Unstable
2012-08-02 11:41:36 -04:00
Ryan Poplin
cb8bc18aeb
Fix for error in HaplotypeCaller. HC has a UG argument collection for the UG engine but some of those arguments aren't appropriate to set.
2012-08-02 11:41:06 -04:00
Guillermo del Angel
9ac72dbd4d
Merged bug fix from Stable into Unstable
2012-08-01 10:56:45 -04:00
Guillermo del Angel
01265f78e6
Add sanity check and possible bug fix for forum user: if haplotypes cannot be created from given alleles when genotyping indels in pool mode (e.g. too close to contig boundary, etc.), use an empty allele list to signify that the site can't be genotyped
2012-08-01 10:50:00 -04:00
Guillermo del Angel
4a23f3cd11
Simple cleanup of pool caller code - since usage is much more general than just calling pools, AF calculation models and GL calculation models are renamed from Pool -> GeneralPloidy. Also, don't have users specify special arguments for -glm and -pnrm. Instead, when running UG with sample ploidy != 2, the correct general ploidy modules are automatically detected and loaded. -glm now reverts to old [SNP|INDEL|BOTH] usage
2012-07-31 16:34:20 -04:00
Mark DePristo
e00ed8bc5e
Cleanup BQSR classes
...
-- Moved most of BQSR classes (which are used throughout the codebase) to utils.recalibration. It's better in my opinion to keep commonly used code in utils, and only specialized code in walkers. As code becomes embedded throughout GATK it should be refactored to live in utils
-- Removed unnecessary imports of BQSR in VQSR v3
-- Now ready to refactor QualQuantizer and unit test into a subclass of RecalDatum, refactor unit tests into RecalDatum unit tests, and generalize into hierarchical recal datum that can be used in QualQuantizer and the analysis of adaptive context covariate
-- Update PluginManager to sort the plugins and interfaces. This allows us to have a deterministic order in which the plugin classes come back, which caused BQSR integration tests to temporarily change because I moved my classes around a bit.
2012-07-31 08:11:03 -04:00
Guillermo del Angel
e6b326c189
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-30 21:32:19 -04:00
Guillermo del Angel
6c9d3ec155
Remerge after changes to allele construction code. More cleanups/fixes to artificial read pileup provider
2012-07-30 21:32:03 -04:00
Ryan Poplin
3dabb90eb0
Updating example active region walker integration test.
2012-07-30 21:26:16 -04:00
Ryan Poplin
13591b169f
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-30 12:13:24 -04:00
Ryan Poplin
48b9495460
Fixes to the likelihood based LD calculation for deciding when to combine consecutive events.
2012-07-30 12:12:56 -04:00
Ryan Poplin
9002758ede
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-30 12:10:00 -04:00
Ryan Poplin
7a73042cd3
Bug fix for the case of merging two VCs when a deletion deletes the padding base for a consecutive indel. Added unit test to cover this case.
2012-07-30 12:09:23 -04:00
Eric Banks
5743694196
Merged bug fix from Stable into Unstable
2012-07-30 11:35:28 -04:00
Eric Banks
79195b97a3
Adding categories for the remaining uncategorized walkers
2012-07-30 11:35:08 -04:00
Guillermo del Angel
5b9a1af7fe
Intermediate fix for pool GL unit test: a) Fix up artificial read pileup provider to give consistent data. b) Increase downsampling in pool integration tests with reference sample, and shorten MT tests so they don't last too long
2012-07-30 09:56:10 -04:00
Eric Banks
2b1b00ade5
All integration tests and VC/Allele unit tests are passing
2012-07-27 17:03:49 -04:00
Eric Banks
beb7610195
Resolving merge conflicts
2012-07-27 15:52:02 -04:00
Eric Banks
27e7e11ec0
Allele refactoring checkpoint #3 : all integration tests except for PoolCaller are passing now. Fixed a couple of bugs from old code that popped up during md5 difference review. Added VariantContextUtils.requiresPaddingBase() method for tools that create alleles to use for determining whether or not to add the ref padding base. One of the HaplotypeCaller tests wasn't passing because of RankSumTest differences, so I added a TODO for Ryan to look into this.
2012-07-27 15:48:40 -04:00
Ryan Poplin
22bb4804f0
HaplotypeCaller now uses an excessive number of high quality soft clips as a triggering signal in order to capture both end points of a large deletion in a single active region.
2012-07-27 12:44:02 -04:00