Commit Graph

608 Commits (e12ae65d33b3e6fd009fcd47eef3f90ed4e75a12)

Author SHA1 Message Date
Mark DePristo 0f4acaae1b Update MD5s with new FS score 2012-08-28 08:06:47 -04:00
Christopher Hartl db2e88c7cb Fix for badIndelLength() throwing NPE at non-indel sites. Added integration test. 2012-08-25 12:38:23 -07:00
Christopher Hartl f1166d6d00 Spotted a potential bug where sample IDs passed in from the meta data were only checked against the sample IDs in the VCF header if the input file happened to be a meta data file rather than a fam file. Added a check for fam files as well, and added an integration test to cover each case. 2012-08-23 11:43:19 -07:00
Guillermo del Angel e29469eeeb Forgot to update 2 integration test md5's (in this cases, changes are legit because of the code revamp of AD, it's simpler if AD is not output when a site is not variant, as genotype DP conveys the same information) 2012-08-22 15:53:33 -04:00
Guillermo del Angel 901f47d8af Final step (for now) in VA refactoring: update MD5's because, a) since it's not guaranteed that we'll iterate through reads/pileups in the same order, the rank sum dithering will change annotations, b) FS uses new generic threshold to distinguish uninformative reads (it used to use ad-hoc thresholds), c) AD definition changed and throws away uninformative reads, d) shortened general ploidy integration tests for quicker debugging. May have missed some MD5's in the update so there may be lingering test failures still 2012-08-22 11:38:51 -04:00
Eric Banks c7ce3e1cf5 Merged bug fix from Stable into Unstable 2012-08-22 00:24:40 -04:00
Eric Banks 03017855e4 WTF - why is support for whole-read insertions all messed up in LIBS? I've pushed a temporary patch for now (the right solution should certainly not be implemented in stable; LIBS needs to be better thought out). Added another unit test. 2012-08-22 00:24:01 -04:00
Christopher Hartl ba8622ff0d number of stashed changes are lurking in here. In order of importance:
- Fix for M_Trieb's error report on the forum, and addition of integration tests to cover the walker.
 - Addition of StructuralIndel as a class of variation within the VariantContext. These are for variants with a full alt allele that's >150bp in length.
 - Adaptation of the MVLikelihoodRatio to work for a set of trios (takes the max over the trios of the MVLR)
 - InsertSizeDistribution changed to use the new gatk report output (it was previously broken)
 - RetrogeneDiscovery changed to be compatible with the new gatk report
 - A maxIndelSize argument added to SelectVariants
 - ByTranscriptEvaluator rewritten for cleanliness
 - VariantRecalibrator modified to not exclude structural indels from recalibration if the mode is INDEL
 - Documentation added to DepthOfCoverageIntegrationTest (no, don't yell at chartl ;_; )

Also sorry for the long commit history behind this that is the result of fixing merge conflicts. Because this *also* fixes a conflict (from git stash apply), for some reason I can't rebase all of them away. I'm pretty sure some of the commit notes say "this note isn't important because I'm going to rebase it anyway".
2012-08-21 07:08:58 -04:00
Eric Banks 40d5efc804 Fix for Adam K's reported bug: we weren't handling reads that were entirely insertions properly in LIBS. Specifically, the event bases were off-by-one (which was disasterous in Adam's case with a 1bp read). Added a unit test to cover this case. 2012-08-20 23:12:41 -04:00
Mark DePristo 9121b98167 CombineVariants outputs the first non-MISSING qual, not the maximum
-- When merging multiple VCF records at a site, the combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value.  The previous behavior was to take the max QUAL, which resulted in sometime strange downstream confusion.
2012-08-19 10:29:38 -04:00
Mauricio Carneiro d16cb68539 Updated and more thorough version of the BadCigar read filter
* No reads with Hard/Soft clips in the middle of the cigar
   * No reads starting with deletions (with or without preceding clips)
   * No reads ending in deletions (with or without follow-up clips)
   * No reads that are fully hard or soft clipped
   * No reads that have consecutive indels in the cigar (II, DD, ID or DI)

 Also added systematic test for good cigars and iterative test for bad cigars.
2012-08-17 17:05:27 -04:00
Eric Banks 611d9b61e2 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-16 13:05:36 -04:00
Eric Banks 2df04dc48a Fix for performance problem in GGA mode related to previous --regenotype commit. Instead of trying to hack around the determination of the calculation model when it's not needed, just simply overload the calculateGenotypes() method to add one that does simple genotyping. Re-enabling the Pool Caller integration tests. 2012-08-16 13:05:17 -04:00
Mark DePristo a9a1c499fd Update md5 in VariantRecalibrationWalkers test for BCF2 -- only encoding differences 2012-08-16 13:03:13 -04:00
Mark DePristo c0a31b2e5b CombineVariants parallel integration tests
-- All tests but one (using old bad VCF3 input) run unmodified with parallel code.
-- Disabled UNSAFE_VCF_PROCESSING for all but that test, which changes md5s because the output files have fixed headers
-- Minor optimizations to simpleMerge
2012-08-15 21:13:16 -04:00
Mark DePristo ae4d4482ac Parallel combine variants!
-- CombineVariants is now TreeReducible!
-- Integration tests running in parallel all pass except one (will fix) due to incorrect use of db=0 flag on input from old VCF format
2012-08-15 21:13:15 -04:00
Eric Banks 87e41c83c5 In AlleleCount stratification, check to make sure the AC (or MLEAC) is valid (i.e. not higher than number of chromosomes) and throw a User Error if it isn't. Added a test for bad AC. 2012-08-14 15:02:30 -04:00
Eric Banks 8e3774fb0e Fixing behavior of the --regenotype argument in SelectVariants to properly run in GenotypeGivenAlleles mode. Added integration tests to cover recent SV changes. 2012-08-14 14:21:42 -04:00
Eric Banks 34b62fa092 Two changes to SelectVariants: 1) don't add DP INFO annotation if DP wasn't used in the input VCF (it was adding DP=0 previously). 2) If MLEAC or MLEAF is present in the original VCF and the number of samples decreases, remove those annotations from the VC. 2012-08-14 12:54:31 -04:00
Ami Levy Moonshine 6fefdaf428 "update integration tests in CombineVariantsIntegrationTest"
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-10 17:00:35 -04:00
Ami Levy Moonshine 4968daf0a5 update integration tests at CombineVariantsIntegrationTest 2012-08-10 16:58:05 -04:00
Eric Banks eca9613356 Adding support of X and = CIGAR operators to the GATK 2012-08-10 14:54:07 -04:00
Mark DePristo cda8d944b7 Bugfixes for BCF with VQSR
-- Old version converted doubles directly from strings.  New version uses VariantContext getAttributeAsDouble() that looks at the values directly to determine how to convert from Object to Double (via Double.valueOf, (Double), or (Double)(Integer)).
-- getAttributeAsDouble() is now smart in converting integers to doubles as needed
-- Removed unnecessary logging info in BCF2Codec
-- Added integration tests to ensure that VQSR works end-to-end with BCF2 using sites version of the file khalid sent to me
-- Added vqsr.bcf_test.snps.unfiltered.bcf file for this integration test
2012-08-07 17:22:39 -04:00
Ryan Poplin 15085bf03e The UnifiedGenotyper now makes use of base insertion and base deletion quality scores if they exist in the reads. 2012-08-07 13:58:22 -04:00
Ryan Poplin 8817fc70d1 Merged bug fix from Stable into Unstable 2012-08-03 10:45:01 -04:00
Ryan Poplin f40d0a0a28 Updating VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller. Integration tests change because of the MNPs in dbSNP. 2012-08-03 10:44:36 -04:00
Mark DePristo ccac77d888 Bugfix for incorrect allele counting in IndelSummary
-- Previous version would count all alt alleles as present in a sample, even if only 1 were present, because of the way VariantEval subsetted VCs
-- Updated code for subsetting VCs by sample to be clearer about how it handles rederiving alleles
-- Update a few pieces of code to get previous correct behavior
-- Updated a few MD5s as now ref calls at sites in dbSNP are counted as having a comp sites, and therefore show up in known sites when Novelty strat is on (which I think is correct)
-- Walkers that used old subsetting function with true are now using clearer version that does rederive alleles by default
2012-08-01 15:45:12 -04:00
Joel Thibault 2b25df3d53 Add removeProgramRecords argument
* Add unit test for the removeProgramRecords
2012-08-01 15:33:05 -04:00
Eric Banks ab53d73459 Quick fix to user error catching 2012-07-31 15:50:32 -04:00
Eric Banks 10111450aa Fixed AlignmentUtils bug for handling Ns in the CIGAR string. Added a UG integration test that calls a BAM with such reads (provided by a user on GetSatisfaction). 2012-07-31 15:37:22 -04:00
Mark DePristo e00ed8bc5e Cleanup BQSR classes
-- Moved most of BQSR classes (which are used throughout the codebase) to utils.recalibration.  It's better in my opinion to keep commonly used code in utils, and only specialized code in walkers.  As code becomes embedded throughout GATK its should be refactored to live in utils
-- Removed unncessary imports of BQSR in VQSR v3
-- Now ready to refactor QualQuantizer and unit test into a subclass of RecalDatum, refactor unit tests into RecalDatum unit tests, and generalize into hierarchical recal datum that can be used in QualQuantizer and the analysis of adaptive context covariate
-- Update PluginManager to sort the plugins and interfaces.  This allows us to have a deterministic order in which the plugin classes come back, which caused BQSR integration tests to temporarily change because I moved my classes around a bit.
2012-07-31 08:11:03 -04:00
Guillermo del Angel 6c9d3ec155 Remerge after changes to allele construction code. More cleanups/fixes to artificial read pileup provider 2012-07-30 21:32:03 -04:00
Guillermo del Angel 5b9a1af7fe Intermediate fix for pool GL unit test: fix up artificial read pileup provider to give consistent data. b) Increase downsampling in pool integration tests with reference sample, and shorten MT tests so they don't last too long 2012-07-30 09:56:10 -04:00
Eric Banks c4ae9c6cfb With the new Allele representation we can finally handle complex events (because they aren't so complex anymore). One place this manifests itself is with the strict VCF validation (ValidateVariants used to skip these events but doesn't anymore) so I've added a new test with complex events to the VV integration test. 2012-07-29 19:22:02 -04:00
Eric Banks 99b15b2b3a Final checkpoint: all tests pass. Note that there were bugs in the PoolGenotypeLikelihoodsUnitTest that needed fixing and eventually led to my needing to disable one of the tests (with a note for Guillermo to look into it). Also note that while I have moved over the GATK to use the new non-null representation of Alleles, I didn't remove all of the now-superfluous code throughout to do padding checking on merges; we'll need to do this on a subsequent push. 2012-07-29 01:07:59 -04:00
Eric Banks beb7610195 Resolving merge conflicts 2012-07-27 15:52:02 -04:00
Eric Banks 27e7e11ec0 Allele refactoring checkpoint #3: all integration tests except for PoolCaller are passing now. Fixed a couple of bugs from old code that popped up during md5 difference review. Added VariantContextUtils.requiresPaddingBase() method for tools that create alleles to use for determining whether or not to add the ref padding base. One of the HaplotypeCaller tests wasn't passing because of RankSumTest differences, so I added a TODO for Ryan to look into this. 2012-07-27 15:48:40 -04:00
Eric Banks ef335b6213 Several more walkers have been brought up to use the new Allele representation. 2012-07-27 02:14:25 -04:00
Eric Banks baf3e33730 Allele refactoring checkpoint 2: all code finally compiles, AD and STR annotations are fixed, and most of the UG integration tests pass. 2012-07-26 23:27:11 -04:00
Guillermo del Angel 2ae890155c Improvements to indel calling in pool caller: a) Compute per-read likelihoods in reference sample to determine wheter a read is informative or not. b) Fixed bugs in unit tests. c) Fixed padding-related bugs when computing matches/mismatches in ErrorModel, d) Added a couple of more integration tests to increase test coverage, including testing odd ploidy 2012-07-26 13:43:00 -04:00
Mark DePristo 8c418a15da Sorting out HMS error handling (fingers crossed)
-- Check if a traversal error occurred in the last shard
-- Catch ExecutionException from the TreeReducer and throw as our HMS execption
-- ShardTraverser just throws the exception as formatted by the HMS, rather than wrapping it as a RuntimeException itself
-- EngineFeaturesIntegrationTests now uses public exampleFASTA (faster), and does 1000x iterations (slower)
2012-07-25 23:13:12 -04:00
Mark DePristo 9242f63a4d On the way to really sorting out HMS error handling
-- Better error message when a traveral error occurs (a real bug)
-- EngineFeaturesIntegrationTest runs the multi-threaded error testing routines 50x times
-- A bit of cleanup in WalkerTest
2012-07-25 22:11:10 -04:00
Mark DePristo 5671992db3 RMDTrackBuilderUnitTest now uses private/testdata file to avoid filesystem race conditions 2012-07-25 22:05:04 -04:00
Mark DePristo 16947e93f2 Integration test to ensure VariantFiltration makes . -> PASS/FAIL like VQSR
Signed-off-by: Mark DePristo <depristo@broadinstitute.org>
2012-07-25 08:56:39 -04:00
Guillermo del Angel 39f45127f3 Fix md5's broken by recent changes to FisherStrand calculation 2012-07-21 14:41:38 -04:00
Mauricio Carneiro 65f4b67b86 Fixing walker unit test with the new naming convention 2012-07-20 17:50:29 -04:00
Mauricio Carneiro 116885a450 Removed the "Walker" suffix from all walkers that had it.
* Did not touch archived walkers... those can be named whatever.
   * Kept abstract classes that end in Walker untouched (e.g. LocusWalker, ReadWalker, ...)
   * Renamed a few inner classes due to conflict when stripping off Walker from their outer classes: ContigStats, FlagStats and FastaStats.
2012-07-20 17:27:11 -04:00
Eric Banks 5f5edeca63 Reverting move of BQSR tests to public, as per DR's email 2012-07-19 10:02:05 -04:00
Eric Banks d46ccec04e Adding Unit Tests to cover the exception catching for Picard errors: because we are using String matching, we want to ensure that we know if/when the exception text changes underneath us. 2012-07-18 21:48:58 -04:00
Eric Banks 9c1ab1b0c0 Move BQSR integration test and its dependent files into public; previously there was a protected->private dependency. 2012-07-18 21:11:33 -04:00