Commit Graph

90 Commits (458bbdee8fea389ab7e3f554bd7759d37957cc0e)

Author SHA1 Message Date
Eric Banks eca9613356 Adding support of X and = CIGAR operators to the GATK 2012-08-10 14:54:07 -04:00
Ryan Poplin 2a113977a9 Resolving merge conflicts with the new MD5s 2012-08-10 11:47:00 -04:00
Ryan Poplin 5f82ffd5d8 Adding LowQual filter to the output of the HaplotypeCaller. 2012-08-10 11:25:14 -04:00
Ryan Poplin 9887bc4410 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-09 16:31:06 -04:00
Mauricio Carneiro abb168e1ba Merged bug fix from Stable into Unstable 2012-08-09 16:09:58 -04:00
Mauricio Carneiro 67d4148b32 Fixing but reported by Thomas in the forum where reads were soft-clipped beyond the limits of the contig and ReduceReads was failing with a NoSuchElement exception. Now we hard clip anything that goes beyond the boundaries of the contig. 2012-08-09 15:58:18 -04:00
Mauricio Carneiro 58420098ac Merged bug fix from Stable into Unstable 2012-08-09 13:02:23 -04:00
Mauricio Carneiro c6132ebe26 Fixed divide by zero bug when downsampler goes over regions where reads are all filtered out. Added Guillermo's bug report as an integration test 2012-08-09 13:02:11 -04:00
Ryan Poplin e48727dae3 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-09 10:31:10 -04:00
Guillermo del Angel 5be7e0621d Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-09 09:58:34 -04:00
Guillermo del Angel 71ee8d87b3 Rename per-sample ML allelic fractions and counts so that they don't have the same name as the per-site INFO fields, and clarify wording in VCF header 2012-08-09 09:58:20 -04:00
Mauricio Carneiro 250ffd2ad7 Merged bug fix from Stable into Unstable 2012-08-08 15:50:07 -04:00
Mauricio Carneiro 78c1556186 Fixing ReduceReads downsampling bug -- downsampled reads were not being excluded from the read window, causing them to trail back and get caught by the sliding window exception 2012-08-08 15:49:31 -04:00
Ryan Poplin 1223d77546 Removing argument from HaplotypeCaller that was made unneccesary by recent improvements to triggering around large events 2012-08-08 15:13:20 -04:00
Eric Banks 4b2e3cec0b Quick pass of FindBugs 'inefficient use of keySet iterator instead of entrySet iterator' fixes for core tools. 2012-08-08 14:29:41 -04:00
Guillermo del Angel 3e2752667c Intermediate checkin for ReducedReads with HaplotypeCaller - change min read count over k-mer to average count over k-mer when doing assembly of a reduced read (not optimal, currently trying max and then will decide on best approach), fix merge conflicts 2012-08-08 12:07:33 -04:00
Eric Banks 2c76f71a03 Update -maxAlleles argument in integration tests 2012-08-06 22:48:04 -04:00
Guillermo del Angel c66a896b8e Fix UG integration test broken by new -maxAltAlleles nomenclature 2012-08-06 21:29:21 -04:00
Guillermo del Angel 97c5ed4feb Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-06 20:22:31 -04:00
Guillermo del Angel 238d55cb61 Fixes for running HaplotypeCaller with reduced reads: a) minor refactoring, pulled out code to compute mean representative count to ReadUtils, b) Don't use min representative count over kmer when constructing de Bruijn graph - this creates many paths with multiplicity=1 and makes us lose a lot of SNP's at edge of capture targets. Use mean instead 2012-08-06 20:22:12 -04:00
Ryan Poplin d85b38e4da Updating HaplotypeCaller integration tests 2012-08-06 12:02:19 -04:00
Ryan Poplin b8709d8c67 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-06 11:41:28 -04:00
Ryan Poplin 973d1d47ed Merging together the computeDiploidHaplotypeLikelihoods functions in the HaplotypeCaller's LikelihoodEngine so they both benefit from the ReducedRead's RepresentativeCount 2012-08-06 11:40:07 -04:00
Ryan Poplin b7eec2fd0e Bug fixes related to the changes in allele padding. If a haplotype started with an insertion it led to array index out of bounds. Haplotype allele insert function is now very simple because all alleles are treated the same way. HaplotypeUnitTest now uses a variant context instead of creating Allele objects directly. 2012-08-05 12:29:10 -04:00
Guillermo del Angel d2e8eb7b23 Fixed 2 haplotype caller unit tests: a) new interface for addReadLikelihoods() including read counts, b) disable test that test basic DeBruijn graph assembly, not ready yet 2012-08-03 14:26:51 -04:00
Ryan Poplin c3b6e2b143 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-03 13:14:43 -04:00
Ryan Poplin ff80f17721 Using PathComparatorTotalScore in the assembly graph traversal does a better job of capturing low frequency branches that are inside high frequnecy haplotypes. 2012-08-03 13:14:37 -04:00
Guillermo del Angel 6f8e7692d4 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-03 12:24:37 -04:00
Guillermo del Angel 9e25b209e0 First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes: a) Active region: scale PL's by representative count to determine whether region is active. b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in read, TBD whether this is good enough to avoid biases in GL's. c) DeBruijn assembler inserts kmers N times in graph, where N is min representative count of read over kmer span - TBD again whether this is the best approach. d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is discrepancy of overlaps between unclipped/soft clipped bases. Didn't affect things before but RR makes prevalence of hard-clipped bases in CIGARs more prevalent so this was exposed. e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller 2012-08-03 12:24:23 -04:00
Ryan Poplin 3ece4c4993 Merged bug fix from Stable into Unstable 2012-08-02 11:41:36 -04:00
Ryan Poplin cb8bc18aeb Fix for error in HaplotypeCaller. HC has a UG argument collection for the UG engine but some of those arguments aren't appropriate to set. 2012-08-02 11:41:06 -04:00
Guillermo del Angel 9ac72dbd4d Merged bug fix from Stable into Unstable 2012-08-01 10:56:45 -04:00
Guillermo del Angel 01265f78e6 Add sanity check and possible bug fix for forum user: if haplotypes cannot be created from given alleles when genotyping indels (e.g. too close to contig boundary, etc.) in pool mode, empty allele list, signifying site can't be genotyped 2012-08-01 10:50:00 -04:00
Guillermo del Angel 4a23f3cd11 Simple cleanup of pool caller code - since usage is much more general than just calling pools, AF calculation models and GL calculation models are renamed from Pool -> GeneralPloidy. Also, don't have users specify special arguments for -glm and -pnrm. Instead, when running UG with sample ploidy != 2, the correct general ploidy modules are automatically detected and loaded. -glm now reverts to old [SNP|INDEL|BOTH] usage 2012-07-31 16:34:20 -04:00
Mark DePristo e00ed8bc5e Cleanup BQSR classes
-- Moved most of BQSR classes (which are used throughout the codebase) to utils.recalibration.  It's better in my opinion to keep commonly used code in utils, and only specialized code in walkers.  As code becomes embedded throughout GATK its should be refactored to live in utils
-- Removed unncessary imports of BQSR in VQSR v3
-- Now ready to refactor QualQuantizer and unit test into a subclass of RecalDatum, refactor unit tests into RecalDatum unit tests, and generalize into hierarchical recal datum that can be used in QualQuantizer and the analysis of adaptive context covariate
-- Update PluginManager to sort the plugins and interfaces.  This allows us to have a deterministic order in which the plugin classes come back, which caused BQSR integration tests to temporarily change because I moved my classes around a bit.
2012-07-31 08:11:03 -04:00
Guillermo del Angel e6b326c189 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-30 21:32:19 -04:00
Guillermo del Angel 6c9d3ec155 Remerge after changes to allele construction code. More cleanups/fixes to artificial read pileup provider 2012-07-30 21:32:03 -04:00
Ryan Poplin 3dabb90eb0 Updating example active region walker integration test. 2012-07-30 21:26:16 -04:00
Ryan Poplin c2b57ee444 updating HC integration tests after these changes. 2012-07-30 12:41:40 -04:00
Ryan Poplin 13591b169f Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-30 12:13:24 -04:00
Ryan Poplin 48b9495460 Fixes to the likelihood based LD calculation for deciding when to combine consecutive events. 2012-07-30 12:12:56 -04:00
Ryan Poplin 9002758ede Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-30 12:10:00 -04:00
Ryan Poplin 7a73042cd3 Bug fix for the case of merging two VCs when a deletion deletes the padding base for a consecutive indel. Added unit test to cover this case. 2012-07-30 12:09:23 -04:00
Eric Banks 5743694196 Merged bug fix from Stable into Unstable 2012-07-30 11:35:28 -04:00
Eric Banks 79195b97a3 Adding categories for the remaining uncategorized walkers 2012-07-30 11:35:08 -04:00
Guillermo del Angel 5b9a1af7fe Intermediate fix for pool GL unit test: fix up artificial read pileup provider to give consistent data. b) Increase downsampling in pool integration tests with reference sample, and shorten MT tests so they don't last too long 2012-07-30 09:56:10 -04:00
Eric Banks 99b15b2b3a Final checkpoint: all tests pass. Note that there were bugs in the PoolGenotypeLikelihoodsUnitTest that needed fixing and eventually led to my needing to disable one of the tests (with a note for Guillermo to look into it). Also note that while I have moved over the GATK to use the new non-null representation of Alleles, I didn't remove all of the now-superfluous code throughout to do padding checking on merges; we'll need to do this on a subsequent push. 2012-07-29 01:07:59 -04:00
Eric Banks 2b1b00ade5 All integration tests and VC/Allele unit tests are passing 2012-07-27 17:03:49 -04:00
Eric Banks beb7610195 Resolving merge conflicts 2012-07-27 15:52:02 -04:00
Eric Banks 27e7e11ec0 Allele refactoring checkpoint #3: all integration tests except for PoolCaller are passing now. Fixed a couple of bugs from old code that popped up during md5 difference review. Added VariantContextUtils.requiresPaddingBase() method for tools that create alleles to use for determining whether or not to add the ref padding base. One of the HaplotypeCaller tests wasn't passing because of RankSumTest differences, so I added a TODO for Ryan to look into this. 2012-07-27 15:48:40 -04:00