Commit Graph

9943 Commits (3a64398d07cefa346c5081a79b936fa71e1dda1f)

Author SHA1 Message Date
Eric Banks 3a64398d07 Cleaned up the isGATKLite check 2012-07-17 12:46:16 -04:00
Eric Banks 62c5228048 1) Revert previous change - indel recalibration is turned on by default and users of the Lite version will need to turn it off to avoid a User Error. 2) Implemented the engine.isGATKLite() method. 2012-07-17 12:23:40 -04:00
Eric Banks 40618ac471 A bunch of BQSR changes: 1) by default we do not emit indel quals, but they can be turned on with --enable_indel_quals. 2) We check whether or not we are running in Lite mode (not done yet) and if so and the user is trying to recalibrate indels, we throw a User Error (not supported). 3) Like v1 we now allow the user to set the qual value below which we don't recalibrate (this was the remaining source of differences in the v1 vs. v2 plots). 2012-07-17 10:52:43 -04:00
Eric Banks d5b3a2eabf Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-17 00:32:53 -04:00
Eric Banks f657b8bda8 Complete overhaul of the BQSRv2 integration tests. Much more comprehensive. Still need to deal with a few tests that need some modifications before I'm done, but I'll take care of that sometime tomorrow. 2012-07-17 00:32:34 -04:00
Eric Banks a003148d50 Move AnalyzeCovariates over too. 2012-07-16 16:11:56 -04:00
Eric Banks 0a89adbcdb Add utility decorators so that classes can tell you which package source they come from if they want to (suggested by Khalid). Using those decorators, we can easily pull out the BQSR updateDataForPileupElement() method into a standard RecalibrationEngine and an AdvancedRecalibrationEngine and use the protected one (AdvancedRE) if available (otherwise, the public one). 2012-07-16 15:34:50 -04:00
Eric Banks 52baac1e16 Move BQSRv2 into public and v1 into the archive. 2012-07-16 14:23:38 -04:00
Joel Thibault 6c6a324583 Loosen a restriction on isOriginalRead()
* no longer needs to satisfy ReadAndIntervalOverlap.OVERLAP_CONTAINED
2012-07-16 14:07:10 -04:00
Khalid Shakir 07822d6c0f Fixed input annotations for master/test files on DiffObjectsWalker. 2012-07-16 13:33:11 -04:00
Eric Banks 2a830939df Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-14 23:49:59 -04:00
Eric Banks f29cadd7e2 By default, don't quantize quals in BQSRv2 2012-07-14 23:49:48 -04:00
Eric Banks 75543a3f22 ReadClipper.clipRead's claim that it doesn't modify the original read was false. Ultimately, GATKSAMRecord.clone (as documented) creates a soft copy of the read - so modifying e.g. the bases of the cloned read means that you modify the bases of the original read too. Because of this, when the BQSRv2 Context covariate was writing Ns over the low quality tails of the reads they got propagated out to the output BAM file (very bad). I've updated the ReadClipper docs and cleaned up the code (no reason to use a clone of the read anymore given that we are already modifying the original). For now, the simplest thing is to have the Context covariate store the original bases, overwrite low quality Ns, compute covariates, and rewrite the original bases; we can update later if needed. 2012-07-13 18:50:27 -04:00
Ryan Poplin 44c532531b VC priority list needs to be updated after removing unassembled haplotypes. 2012-07-13 16:50:08 -04:00
Ryan Poplin 443f02ffc2 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-13 16:09:24 -04:00
Ryan Poplin c978e37e5c updating HC integration tests for all these changes 2012-07-13 16:09:11 -04:00
Khalid Shakir 6dfcc486e8 In ApplyRecalibration marking filter as PASS instead of '.' when the site passes by calling .passFilters(). 2012-07-13 15:40:56 -04:00
Ryan Poplin d553905d79 Don't try to genotype both an unassembled symbolic allele and a fully assembled insertion if they both start at the same location. Bug fix for the case of multiple indels that when all combined together make an MNP. 2012-07-13 15:22:22 -04:00
Ryan Poplin 3ab5e2c64b Don't try to combine together unassembled, symbolic alleles. 2012-07-12 21:20:14 -04:00
Ryan Poplin d70bb59182 HaplotypeCaller now calls insertion events that aren't fully assembled as symbolic alleles. 2012-07-10 14:22:23 -06:00
Ryan Poplin 75e5a50b8a Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-10 12:53:00 -06:00
Guillermo del Angel 279dff9f81 Bug fix when specifying a JEXL expression for a field that doesn't exist: we should treat the whole expression as false, but we were rethrowing the JEXL exception in this case. Added integration test to cover this in SelectVariants 2012-07-10 13:59:00 -04:00
Eric Banks d7bf74fb7e Updating default value for -mindel to the one used by Khalid in the pipeline and me in my tests. 2012-07-10 02:04:26 -04:00
Mauricio Carneiro 7eb45b4038 Fixed BQSR IntegrationTests
* BinaryTag covariate is Experimental, not Standard (this was breaking integration tests)
* New parameter in the Recalibration report requires new MD5 for one of the integration tests.
2012-07-09 13:55:12 -04:00
Mauricio Carneiro 6c17c50fa2 Updates to ReduceReads
* Added optional parameter to not hard clip on the interval border
* Made not clipping the default behavior (hence integration tests changed)
* Updated integration tests.
2012-07-09 13:46:51 -04:00
Eric Banks dd0c47ab7e Don't cast to a specific walker type since any walker can use the VA engine 2012-07-09 10:25:58 -04:00
Mark DePristo 5b0ade67c8 Updates to VCF processing for better BCF processing
-- getMetaData now split into getMetaDataInSortedOrder() [old functionality] and getMetaDataInOriginalOrder() [according to the header order].  Important as BCF uses the order of elements in the header in the offsets to keys, and we were automatically sorting the BCF2 header which is out of order in samtools and the whole system was going crazy
-- Updating GATK code to use the appropriate header function (this is why so many files have changed)
-- BCF2 code was busted in not differentiating PASS from . from FILTER in VC (tests coming that will actually stress this)
-- Bugfix for adding contig lines to BCF2 header dictionary
-- VCFHeader metaData no longer sorted internally.  The system now maintains the data in header order, and only sorts output as requested in API
-- VCFWriter and BCF2Writer now explicitly sort their header lines
-- Don't allow filters to be added that are PASS in the contract
2012-07-08 15:44:33 -07:00
Mark DePristo 63f5262e45 mergeInfoWithMaxAC is no longer hidden in CombineVariants 2012-07-08 15:44:32 -07:00
Mark DePristo 66aee613e2 Bugfix for set key in mergeInfoWithMaxAC.
-- Previous version was always setting set=source of info with highest AC.  Should actually have been set to the set annotation value itself.
2012-07-08 15:44:32 -07:00
Mark DePristo 91f0ed8059 Fixed nasty Rscript typo in VariantRecalibrator when compactPDF is available 2012-07-08 15:44:32 -07:00
Mark DePristo 87b090c362 Update VariantRecalibrator error message to use -resource, not the old -B syntax 2012-07-08 15:44:31 -07:00
Mark DePristo 08c57db784 Commented out static instantiation of a Broad-specific capability that was preventing GATKdocs from building
-- Joel, you need to fix this
2012-07-08 15:44:31 -07:00
Mauricio Carneiro 125e6c1a47 Added BinaryTagCovariate for ancient DNA analysis 2012-07-06 15:03:20 -04:00
Mauricio Carneiro e93b025b39 Fixing unit test
With the new clipping behavior for weird cigars, we can no longer assert the final number of bases in the unit test, so I'm removing that check from the test.
2012-07-06 12:08:09 -04:00
Mauricio Carneiro f603d4c48c Fixing PairHMMIndelErrorModel boundary issue
When checking the limits of a read to clip, it wasn't considering reads that may already have been clipped.
2012-07-06 11:48:04 -04:00
Ryan Poplin 83e33d14c5 Bug fix in the merging of events into complex substitutions. 2012-07-05 16:00:18 -04:00
Ryan Poplin c2e1dc63ff Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-05 10:37:40 -04:00
Ryan Poplin 729c5e878d Bug fix in the merging of events into complex substitutions. 2012-07-05 10:37:04 -04:00
Eric Banks dd571d9aa0 Added a --no_indel_quals argument that when used with -BQSR inhibits the writing of base insertion and base deletion quality tags. 2012-07-04 01:22:20 -04:00
Eric Banks 33306d2e20 Changing the logic of the -standard argument; the way it stands currently one can never turn off the cycle or context covariates. Now they are on by default and users must opt out of them to turn them off. 2012-07-04 00:21:21 -04:00
Eric Banks 7d30558e6f Only 'pad' the cycle covariate for indels, not substitutions 2012-07-03 23:47:01 -04:00
Mauricio Carneiro 17efbbf8b1 Fixed ReadClipperUnitTest
The behavior of the clipping on weird cigar strings such as 1I1S1H and 9S56H has changed, and the test has to change accordingly.
2012-07-03 16:38:51 -04:00
Joel Thibault f09141bd2f Add a Mongo tester which outputs BCFs 2012-07-03 15:15:33 -04:00
Eric Banks 22f1afddaa Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-03 14:55:59 -04:00
Eric Banks 617eebd204 More misc cleanup 2012-07-03 14:55:37 -04:00
Eric Banks 344c3aeb1d Cleanup from previous commit 2012-07-03 14:42:44 -04:00
Ryan Poplin 9e8e78de15 Adding the model name to the VQSR filter lines so that they don't get clobbered with consecutive VQSR runs for SNPs and then indels. 2012-07-03 14:30:37 -04:00
Eric Banks 0b37d44b0d Optimizations for the RecalDatum to make BQSR (Count Covariates) much faster. Needs some cleanup. 2012-07-03 13:05:11 -04:00
Eric Banks 031322ff00 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-03 00:12:59 -04:00
Eric Banks a4670113bd Refactored/renamed the nested integer array; cleaned up code a bit. 2012-07-03 00:12:33 -04:00