Commit Graph

55 Commits (874dbf5b58669dfe2d0b22aa954598ca89a77b41)

Author SHA1 Message Date
Guillermo del Angel e6b326c189 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-30 21:32:19 -04:00
Guillermo del Angel 6c9d3ec155 Remerge after changes to allele construction code. More cleanups/fixes to artificial read pileup provider 2012-07-30 21:32:03 -04:00
Ryan Poplin 3dabb90eb0 Updating example active region walker integration test. 2012-07-30 21:26:16 -04:00
Ryan Poplin c2b57ee444 updating HC integration tests after these changes. 2012-07-30 12:41:40 -04:00
Ryan Poplin 13591b169f Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-30 12:13:24 -04:00
Ryan Poplin 48b9495460 Fixes to the likelihood based LD calculation for deciding when to combine consecutive events. 2012-07-30 12:12:56 -04:00
Ryan Poplin 9002758ede Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-30 12:10:00 -04:00
Ryan Poplin 7a73042cd3 Bug fix for the case of merging two VCs when a deletion deletes the padding base for a consecutive indel. Added unit test to cover this case. 2012-07-30 12:09:23 -04:00
Eric Banks 5743694196 Merged bug fix from Stable into Unstable 2012-07-30 11:35:28 -04:00
Eric Banks 79195b97a3 Adding categories for the remaining uncategorized walkers 2012-07-30 11:35:08 -04:00
Guillermo del Angel 5b9a1af7fe Intermediate fix for pool GL unit test: fix up artificial read pileup provider to give consistent data. b) Increase downsampling in pool integration tests with reference sample, and shorten MT tests so they don't last too long 2012-07-30 09:56:10 -04:00
Eric Banks 99b15b2b3a Final checkpoint: all tests pass. Note that there were bugs in the PoolGenotypeLikelihoodsUnitTest that needed fixing and eventually led to my needing to disable one of the tests (with a note for Guillermo to look into it). Also note that while I have moved over the GATK to use the new non-null representation of Alleles, I didn't remove all of the now-superfluous code throughout to do padding checking on merges; we'll need to do this on a subsequent push. 2012-07-29 01:07:59 -04:00
Eric Banks 2b1b00ade5 All integration tests and VC/Allele unit tests are passing 2012-07-27 17:03:49 -04:00
Eric Banks beb7610195 Resolving merge conflicts 2012-07-27 15:52:02 -04:00
Eric Banks 27e7e11ec0 Allele refactoring checkpoint #3: all integration tests except for PoolCaller are passing now. Fixed a couple of bugs from old code that popped up during md5 difference review. Added VariantContextUtils.requiresPaddingBase() method for tools that create alleles to use for determining whether or not to add the ref padding base. One of the HaplotypeCaller tests wasn't passing because of RankSumTest differences, so I added a TODO for Ryan to look into this. 2012-07-27 15:48:40 -04:00
Ryan Poplin 22bb4804f0 HaplotypeCaller now use an excessive number of high quality soft clips as a triggering signal in order to capture both end points of a large deletion in a single active region. 2012-07-27 12:44:02 -04:00
Ryan Poplin a0890126a8 ActiveRegionWalker's isActive function returns a results object now instead of just a double. 2012-07-27 11:01:39 -04:00
Eric Banks baf3e33730 Allele refactoring checkpoint 2: all code finally compiles, AD and STR annotations are fixed, and most of the UG integration tests pass. 2012-07-26 23:27:11 -04:00
Ryan Poplin 35e803e110 Merged bug fix from Stable into Unstable 2012-07-26 14:00:04 -04:00
Ryan Poplin 4f741b4cd7 Smoothing in the BQSR bins should be one error observation and one non-error observation. 2012-07-26 13:59:02 -04:00
Guillermo del Angel 2ae890155c Improvements to indel calling in pool caller: a) Compute per-read likelihoods in reference sample to determine wheter a read is informative or not. b) Fixed bugs in unit tests. c) Fixed padding-related bugs when computing matches/mismatches in ErrorModel, d) Added a couple of more integration tests to increase test coverage, including testing odd ploidy 2012-07-26 13:43:00 -04:00
Eric Banks a694d1b5de Merge branch 'master' into allelePadding 2012-07-26 01:53:14 -04:00
Eric Banks 32516a2f60 Initial checkpoint commit of VariantContext/Allele refactoring. There were just too many problems associated with the different representation of alleles in VCF (padded) vs. VariantContext (unpadded). We are moving VC to use the VCF representation. No more reference base for indels in VC and no more trimming and padding of alleles. Even reverse trimming has been stopped (the theory being that writers of VCF now know what they are doing and often want the reverse padding if they put it there; this has been requested on GetSatisfaction). Code compiles but presumably pretty much all tests with indels with fail at this point. 2012-07-26 01:50:39 -04:00
Eric Banks 7eb3f54750 Added category docs for the remaining public walkers (I think I got them all). I removed a couple of totally unnecessary walkers. 2012-07-25 21:40:28 -04:00
Eric Banks 0a98a6aa8d Adding extraDocs tag per Mauricio's request 2012-07-25 18:23:18 -04:00
Eric Banks 05fa377a8e Adding GATK categories to standard walkers. Will add to remaining walkers after the next successful release (so that I can see which walkers are public and still need it). 2012-07-25 16:05:47 -04:00
Eric Banks a5721a8846 Context covariate optimizations were not suited for multiple threads, so I removed them (since that ended up being much, much easier than trying to make the covariates thread local). Added -nt 2 layer to BQSR integration tests to confirm that it now works with multiple threads. 2012-07-25 13:38:07 -04:00
Eric Banks 675ccab2fa Renaming BQSR to BaseRecalibrator 2012-07-23 10:17:17 -04:00
Ryan Poplin 2e486d83e2 Updating HaplotypeCaller docs and expanding integration tests. 2012-07-23 10:05:42 -04:00
Mauricio Carneiro df965d4a5a Fixing BQSR integration test 2012-07-21 11:11:45 -04:00
Mauricio Carneiro 116885a450 Removed the "Walker" suffix from all walkers that had it.
* Did not touch archived walkers... those can be named whatever.
   * Kept abstract classes that end in Walker untouched (e.g. LocusWalker, ReadWalker, ...)
   * Renamed a few inner classes due to conflict when stripping off Walker from their outer classes: ContigStats, FlagStats and FastaStats.
2012-07-20 17:27:11 -04:00
Ryan Poplin 1592841c93 New function for merging nearby events into MNPs or complex substitutions. Added extensive unit tests. 2012-07-19 13:16:33 -04:00
Guillermo del Angel c16f9f2f15 a) Use new method to check for GATK Like, b) minor improvements to indel pool caller (more to come): brain-dead, quick way to limit number of alt alleles to genotype. We can't process too many alt alleles because of the combinatorial explosion of GL values with high ploidy, and some STR validation targets had up to 12 alt alleles, resulting of GL vectors of > 1e8 elements. Can't use pileup elements since typically not many alleles will be in one pileup, and different alleles will appear in different samples, TBD a nicer solution. c) Commit to posterity scala script for large scale validation calling, still work in progress 2012-07-19 10:24:08 -04:00
Eric Banks 5f5edeca63 Reverting move of BQSR tests to public, as per DR's email 2012-07-19 10:02:05 -04:00
Eric Banks 9c1ab1b0c0 Move BQSR integration test and its dependent files into public; previously there was a protected->private dependency. 2012-07-18 21:11:33 -04:00
Mark DePristo 74e153ff4a FisherStrand now uses RankSumTest isUsableBase to decide if a read should be included in testing
-- Previously used hardcoded MAPQ > 20 && QUAL > 20 but now uses isUsableBase
-- Updating MD5s as appropriate
2012-07-18 16:07:47 -04:00
Eric Banks e4db8dde91 Enabled a whole other bunch of integration tests for BQSRv2. While I was there I also changed the default context size for indels to 3 (from 8) since that's what works best in the current implementation (as suggested by Ryan). At this point, all of the new core tools (ReduceReads, BQSRv2, HaplotypeCaller, UG extensions) have been moved over to protected and should be stable. Looks like we are pretty much ready for GATK 2.0! 2012-07-17 23:36:43 -04:00
Guillermo del Angel 731bbba2e6 Bug fixes for integration test, use correct new UG syntax 2012-07-17 16:57:59 -04:00
Guillermo del Angel 40b8c7172c Pool Caller refactoring in preparation of GATK 2.0: a) PoolCallerUnifiedArgumentCollection disappeared, and arguments moved to UnifiedArgumentCollection. b) PoolCallerWalker is no longer needed and redundant, all functionality subsumed by UG. UG now checks if GATK is lite - if so, don't allow ploidy > 2. c) Moved pool classes from private to protected. d) Changed the way to specify ploidy. Instead of specifying samples per pool and having ploidy = 2*samplesPerPool, have user specify ploidy directly, which is cleaner. Update tests accordingly. We can now call triploid seedless grape genotypes correctly in theory. e) Renamed argument -reference to -reference_sample_calls since the former is ambiguous and it's not clear what it refers to. 2012-07-17 15:27:04 -04:00
Eric Banks b0d99fd10d Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-17 15:12:28 -04:00
Eric Banks 305db8c0d1 Total rewrite of the isGATKLite() functionality with help of Khalid/David. PluginManager was not working for us. 2012-07-17 15:11:03 -04:00
Ryan Poplin bf2d5efe4d Moving HaplotypeCaller integration and unit tests over to protected as well. 2012-07-17 14:51:26 -04:00
Ryan Poplin c55934043e Moving HaplotypeCaller from private to protected 2012-07-17 14:41:19 -04:00
Eric Banks 3a64398d07 Cleaned up the isGATKLite check 2012-07-17 12:46:16 -04:00
Eric Banks 62c5228048 1) Revert previous change - indel recalibration is turned on by default and users of the Lite version will need to turn it off to avoid a User Error. 2) Implemented the engine.isGATKLite() method. 2012-07-17 12:23:40 -04:00
Eric Banks 40618ac471 A bunch of BQSR changes: 1) by default we do not emit indel quals, but they can be turned on with --enable_indel_quals. 2) We check whether or not we are running in Lite mode (not done yet) and if so and the user is trying to recalibrate indels, we throw a User Error (not supported). 3) Like v1 we now allow the user to set the qual value below which we don't recalibrate (this was the remaining source of differences in the v1 vs. v2 plots). 2012-07-17 10:52:43 -04:00
Eric Banks d5b3a2eabf Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-17 00:32:53 -04:00
Eric Banks f657b8bda8 Complete overhaul of the BQSRv2 integration tests. Much more comprehensive. Still need to deal with a few tests that need some modifications before I'm done, but I'll take care of that sometime tomorrow. 2012-07-17 00:32:34 -04:00
Eric Banks 0a89adbcdb Add utility decorators so that classes can tell you which package source they come from if they want to (suggested by Khalid). Using those decorators, we can easily pull out the BQSR updateDataForPileupElement() method into a standard RecalibrationEngine and an AdvancedRecalibrationEngine and use the protected one (AdvancedRE) if available (otherwise, the public one). 2012-07-16 15:34:50 -04:00
Eric Banks 52baac1e16 Move BQSRv2 into public and v1 into the archive. 2012-07-16 14:23:38 -04:00