Commit Graph

2446 Commits (e00ed8bc5e2e2c9bf7376c4926fe1359de1a738c)

Author SHA1 Message Date
Mark DePristo e00ed8bc5e Cleanup BQSR classes
-- Moved most of BQSR classes (which are used throughout the codebase) to utils.recalibration.  It's better in my opinion to keep commonly used code in utils, and only specialized code in walkers.  As code becomes embedded throughout GATK its should be refactored to live in utils
-- Removed unncessary imports of BQSR in VQSR v3
-- Now ready to refactor QualQuantizer and unit test into a subclass of RecalDatum, refactor unit tests into RecalDatum unit tests, and generalize into hierarchical recal datum that can be used in QualQuantizer and the analysis of adaptive context covariate
-- Update PluginManager to sort the plugins and interfaces.  This allows us to have a deterministic order in which the plugin classes come back, which caused BQSR integration tests to temporarily change because I moved my classes around a bit.
2012-07-31 08:11:03 -04:00
Mark DePristo 191294eedc Initial cleanup of RecalDatum for move and further refactoring
-- Moved Datum, the now unnecessary superclass, into RecalDatum
-- Fixed some obviously dangerous synchronization errors in RecalDatum, though these may not have caused problems because they may not have been called in parallel mode
2012-07-31 08:11:03 -04:00
Mark DePristo 0670316288 Be clearer that dcov 50 is good for 4x, should use 200 for >30x 2012-07-31 08:11:02 -04:00
Mark DePristo 874dbf5b58 Maximum wait for GATK run report upload reduced to 10 seconds 2012-07-31 08:11:02 -04:00
Guillermo del Angel e6b326c189 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-30 21:32:19 -04:00
Guillermo del Angel 6c9d3ec155 Remerge after changes to allele construction code. More cleanups/fixes to artificial read pileup provider 2012-07-30 21:32:03 -04:00
Ryan Poplin 7ed06ee7b9 Updating FindCoveredIntervals to use the changes to the ActiveRegionWalker. 2012-07-30 12:16:27 -04:00
Ryan Poplin 13591b169f Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-30 12:13:24 -04:00
Eric Banks 0b30588d67 Catch yet another class of User Errors 2012-07-30 11:59:56 -04:00
Eric Banks 5743694196 Merged bug fix from Stable into Unstable 2012-07-30 11:35:28 -04:00
Eric Banks 79195b97a3 Adding categories for the remaining uncategorized walkers 2012-07-30 11:35:08 -04:00
Guillermo del Angel 5b9a1af7fe Intermediate fix for pool GL unit test: fix up artificial read pileup provider to give consistent data. b) Increase downsampling in pool integration tests with reference sample, and shorten MT tests so they don't last too long 2012-07-30 09:56:10 -04:00
Eric Banks 7630c929a7 Re-enabling the unit tests for reverse allele clipping 2012-07-29 22:24:56 -04:00
Eric Banks b07bf1950b Adding an integration test for another feature that I snuck in during a previous commit: we now allow lower-case bases in the REF/ALT alleles of a VCF and upper-case them (this had been turned off because the previous version used Strings to do the uppercasing whereas we stick with byte operations now). 2012-07-29 22:19:49 -04:00
Eric Banks c4ae9c6cfb With the new Allele representation we can finally handle complex events (because they aren't so complex anymore). One place this manifests itself is with the strict VCF validation (ValidateVariants used to skip these events but doesn't anymore) so I've added a new test with complex events to the VV integration test. 2012-07-29 19:22:02 -04:00
Eric Banks 99b15b2b3a Final checkpoint: all tests pass. Note that there were bugs in the PoolGenotypeLikelihoodsUnitTest that needed fixing and eventually led to my needing to disable one of the tests (with a note for Guillermo to look into it). Also note that while I have moved over the GATK to use the new non-null representation of Alleles, I didn't remove all of the now-superfluous code throughout to do padding checking on merges; we'll need to do this on a subsequent push. 2012-07-29 01:07:59 -04:00
Eric Banks 2b1b00ade5 All integration tests and VC/Allele unit tests are passing 2012-07-27 17:03:49 -04:00
Eric Banks beb7610195 Resolving merge conflicts 2012-07-27 15:52:02 -04:00
Eric Banks 27e7e11ec0 Allele refactoring checkpoint #3: all integration tests except for PoolCaller are passing now. Fixed a couple of bugs from old code that popped up during md5 difference review. Added VariantContextUtils.requiresPaddingBase() method for tools that create alleles to use for determining whether or not to add the ref padding base. One of the HaplotypeCaller tests wasn't passing because of RankSumTest differences, so I added a TODO for Ryan to look into this. 2012-07-27 15:48:40 -04:00
Ryan Poplin 22bb4804f0 HaplotypeCaller now use an excessive number of high quality soft clips as a triggering signal in order to capture both end points of a large deletion in a single active region. 2012-07-27 12:44:02 -04:00
Ryan Poplin a0890126a8 ActiveRegionWalker's isActive function returns a results object now instead of just a double. 2012-07-27 11:01:39 -04:00
Eric Banks ef335b6213 Several more walkers have been brought up to use the new Allele representation. 2012-07-27 02:14:25 -04:00
Eric Banks 9e2209694a Re-enable reverse trimming of alleles in UG engine when sub-selecting alleles after genotyping. UG integration tests now pass. 2012-07-27 00:47:15 -04:00
Eric Banks baf3e33730 Allele refactoring checkpoint 2: all code finally compiles, AD and STR annotations are fixed, and most of the UG integration tests pass. 2012-07-26 23:27:11 -04:00
Ryan Poplin 35e803e110 Merged bug fix from Stable into Unstable 2012-07-26 14:00:04 -04:00
Ryan Poplin 4f741b4cd7 Smoothing in the BQSR bins should be one error observation and one non-error observation. 2012-07-26 13:59:02 -04:00
Guillermo del Angel 2ae890155c Improvements to indel calling in pool caller: a) Compute per-read likelihoods in reference sample to determine wheter a read is informative or not. b) Fixed bugs in unit tests. c) Fixed padding-related bugs when computing matches/mismatches in ErrorModel, d) Added a couple of more integration tests to increase test coverage, including testing odd ploidy 2012-07-26 13:43:00 -04:00
Eric Banks a694d1b5de Merge branch 'master' into allelePadding 2012-07-26 01:53:14 -04:00
Eric Banks 32516a2f60 Initial checkpoint commit of VariantContext/Allele refactoring. There were just too many problems associated with the different representation of alleles in VCF (padded) vs. VariantContext (unpadded). We are moving VC to use the VCF representation. No more reference base for indels in VC and no more trimming and padding of alleles. Even reverse trimming has been stopped (the theory being that writers of VCF now know what they are doing and often want the reverse padding if they put it there; this has been requested on GetSatisfaction). Code compiles but presumably pretty much all tests with indels with fail at this point. 2012-07-26 01:50:39 -04:00
Mark DePristo 8c418a15da Sorting out HMS error handling (fingers crossed)
-- Check if a traversal error occurred in the last shard
-- Catch ExecutionException from the TreeReducer and throw as our HMS execption
-- ShardTraverser just throws the exception as formatted by the HMS, rather than wrapping it as a RuntimeException itself
-- EngineFeaturesIntegrationTests now uses public exampleFASTA (faster), and does 1000x iterations (slower)
2012-07-25 23:13:12 -04:00
Mark DePristo 9242f63a4d On the way to really sorting out HMS error handling
-- Better error message when a traveral error occurs (a real bug)
-- EngineFeaturesIntegrationTest runs the multi-threaded error testing routines 50x times
-- A bit of cleanup in WalkerTest
2012-07-25 22:11:10 -04:00
Mark DePristo 5671992db3 RMDTrackBuilderUnitTest now uses private/testdata file to avoid filesystem race conditions 2012-07-25 22:05:04 -04:00
Eric Banks 7eb3f54750 Added category docs for the remaining public walkers (I think I got them all). I removed a couple of totally unnecessary walkers. 2012-07-25 21:40:28 -04:00
Eric Banks 2982b24c4b Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2012-07-25 20:36:53 -04:00
Eric Banks 0a98a6aa8d Adding extraDocs tag per Mauricio's request 2012-07-25 18:23:18 -04:00
Mauricio Carneiro fce5cb9f35 Few category changes 2012-07-25 17:23:02 -04:00
Eric Banks 05fa377a8e Adding GATK categories to standard walkers. Will add to remaining walkers after the next successful release (so that I can see which walkers are public and still need it). 2012-07-25 16:05:47 -04:00
Mauricio Carneiro d46cf47bd1 Updating Read Filter documentation 2012-07-25 15:05:47 -04:00
Eric Banks 6a3bfa3811 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2012-07-25 14:11:11 -04:00
Eric Banks 357e0b35af Register GATK-full-only walkers and rethrow the missing walker error as a not supported in GATK lite error 2012-07-25 14:11:03 -04:00
Roger Zurawicki 5b74763096 Removed Categories.
We will use DocumentedGATKFeatures to create categories in our documentation. Eric I guess will be in charge of this. We need to remove walkers and think how to categorize everything.

Tools can be hidden from GATKdocs with the @Hidden annotation

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-07-25 13:46:24 -04:00
Eric Banks a5721a8846 Context covariate optimizations were not suited for multiple threads, so I removed them (since that ended up being much, much easier than trying to make the covariates thread local). Added -nt 2 layer to BQSR integration tests to confirm that it now works with multiple threads. 2012-07-25 13:38:07 -04:00
Eric Banks e0c07f5567 Reverting old commits that made error handling better because ultimately they made things worse. 2012-07-25 12:37:59 -04:00
Mark DePristo 16947e93f2 Integration test to ensure VariantFiltration makes . -> PASS/FAIL like VQSR
Signed-off-by: Mark DePristo <depristo@broadinstitute.org>
2012-07-25 08:56:39 -04:00
Mark DePristo fcefa61bce Remove reference dependence in BCF2Codec
-- Adding BCF2Codec to VCF.jar and associated unit tests

Signed-off-by: Mark DePristo <depristo@broadinstitute.org>
2012-07-25 08:56:38 -04:00
Mark DePristo 19a257a5c1 Multiple bugfixes
-- VariantFiltration now properly sets passFilters in VC
-- BCF2 writer now properly decodes lazy BCF genotype data that it uses.  Improper use generated a horrible subtle bug but the good news is that the extra checks I put in (unnecessarily a few days ago) caught the bug!

Signed-off-by: Mark DePristo <depristo@broadinstitute.org>
2012-07-25 08:56:38 -04:00
Mark DePristo 3066894215 Bugfix for BCF2
-- Always decode genotypes block when writing out a BCF file.  If the header changes (and we currently don't know this easily) then the dictionary keys used in the genotypes block may be invalid.  Temporarily added a private static boolean that turns off writing of the blocks until Eric and his team rewrite the header.

Signed-off-by: Mark DePristo <depristo@broadinstitute.org>
2012-07-25 08:56:38 -04:00
Guillermo del Angel eb55061fd0 a) Document BEAGLE codec, b) Bug fix: inbreeding coefficient shouldn't be computed for non-diploid organisms in current implementaiton 2012-07-24 12:16:15 -04:00
Mauricio Carneiro 348e86159e Moving doclets to public 2012-07-23 23:52:14 -04:00
Mauricio Carneiro 5cd98a36b9 Making ForumAPIUtils public 2012-07-23 17:44:24 -04:00