Eric Banks
e4db8dde91
Enabled a whole other bunch of integration tests for BQSRv2. While I was there I also changed the default context size for indels to 3 (from 8) since that's what works best in the current implementation (as suggested by Ryan). At this point, all of the new core tools (ReduceReads, BQSRv2, HaplotypeCaller, UG extensions) have been moved over to protected and should be stable. Looks like we are pretty much ready for GATK 2.0!
2012-07-17 23:36:43 -04:00
Eric Banks
a8d08ea18d
As a user pointed out, it is not valid for a GenomeLoc to have a start or stop equal to 0.
2012-07-17 22:18:43 -04:00
Guillermo del Angel
29273abab7
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 16:58:12 -04:00
Guillermo del Angel
731bbba2e6
Bug fixes for integration test, use correct new UG syntax
2012-07-17 16:57:59 -04:00
Eric Banks
33be41ecf5
Cleaning up integration test
2012-07-17 16:06:04 -04:00
Eric Banks
8dbc9cb29c
Add the ability to emit the original quals in the OQ tag
2012-07-17 15:52:56 -04:00
Guillermo del Angel
40b8c7172c
Pool Caller refactoring in preparation for GATK 2.0:
a) PoolCallerUnifiedArgumentCollection has been removed and its arguments moved to UnifiedArgumentCollection.
b) PoolCallerWalker is redundant and no longer needed; all of its functionality is subsumed by UG. UG now checks whether GATK is lite and, if so, does not allow ploidy > 2.
c) Moved pool classes from private to protected.
d) Changed the way ploidy is specified. Instead of specifying samples per pool and deriving ploidy = 2*samplesPerPool, the user now specifies ploidy directly, which is cleaner. Updated tests accordingly. In theory, we can now call triploid seedless grape genotypes correctly.
e) Renamed argument -reference to -reference_sample_calls, since the former is ambiguous and it is not clear what it refers to.
2012-07-17 15:27:04 -04:00
Laurent Francioli
68d0e4dd6d
- Multi-allelic sites are now correctly ignored
- Reporting of Mendelian violations enhanced
- Corrected TP overflow by capping it at Byte.MAX_VALUE
...
- Updated integration tests to reflect changes in MVF file output
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-07-17 15:21:10 -04:00
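The overflow fix in commit 68d0e4dd6d above can be illustrated with a minimal sketch. The class and method names are hypothetical and are not taken from the GATK source; the sketch only shows clamping an integer score into the byte range before the narrowing cast, so that a large value cannot wrap around to a negative one.

```java
// Hypothetical helper for illustration only; not the GATK implementation.
final class CappedByte {
    private CappedByte() {}

    // Clamp the value into [Byte.MIN_VALUE, Byte.MAX_VALUE] before casting,
    // so the narrowing conversion can never wrap around.
    static byte cap(int value) {
        int clamped = Math.max(Byte.MIN_VALUE, Math.min(value, Byte.MAX_VALUE));
        return (byte) clamped;
    }
}
```

Without the clamp, a plain cast such as `(byte) 200` wraps around to a negative value, which is the kind of overflow the commit message describes.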
Eric Banks
b0d99fd10d
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 15:12:28 -04:00
Eric Banks
305db8c0d1
Total rewrite of the isGATKLite() functionality with help of Khalid/David. PluginManager was not working for us.
2012-07-17 15:11:03 -04:00
Ryan Poplin
6efbcd99f1
HaplotypeCaller is now an AnnotatorCompatibleWalker with all the rights and privileges pertaining thereto. Enabling the ClippingRankSumTest after showing it was useful for 1000 Genomes calling.
2012-07-17 14:38:36 -04:00
Eric Banks
110886e8b9
Oops, got the logic wrong.
2012-07-17 13:37:11 -04:00
Eric Banks
a963b37424
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 13:15:37 -04:00
Eric Banks
3a64398d07
Cleaned up the isGATKLite check
2012-07-17 12:46:16 -04:00
Eric Banks
62c5228048
1) Revert previous change - indel recalibration is turned on by default and users of the Lite version will need to turn it off to avoid a User Error. 2) Implemented the engine.isGATKLite() method.
2012-07-17 12:23:40 -04:00
Chris Saunders
1913d1bbd0
Put RunReport S3 upload on timeout thread
...
Move the RunReport S3 upload process onto a separate thread with a timeout allowing the parent to continue.
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2012-07-17 12:19:39 -04:00
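The pattern described in commit 1913d1bbd0 above (run an upload on its own thread and stop waiting after a timeout) might look roughly like the following sketch. It uses only `java.util.concurrent`; the class name, method name, and thread name are assumptions made for illustration, and the actual RunReport code may be structured differently.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names are not taken from the GATK source.
final class TimedUpload {
    private TimedUpload() {}

    // Runs the task on a separate daemon thread and waits at most timeoutMillis.
    // Returns true only if the task completed normally within the timeout.
    static boolean runWithTimeout(Runnable task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "upload-sketch");
            t.setDaemon(true); // a stuck upload must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller continues
            return false;
        } catch (ExecutionException e) {
            return false; // the task itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would wrap the upload in a `Runnable` and treat a `false` result as "report not delivered", then carry on with shutdown.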
Eric Banks
40618ac471
A bunch of BQSR changes: 1) by default we do not emit indel quals, but they can be turned on with --enable_indel_quals. 2) We check whether or not we are running in Lite mode (not done yet) and if so and the user is trying to recalibrate indels, we throw a User Error (not supported). 3) Like v1 we now allow the user to set the qual value below which we don't recalibrate (this was the remaining source of differences in the v1 vs. v2 plots).
2012-07-17 10:52:43 -04:00
Eric Banks
d5b3a2eabf
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 00:32:53 -04:00
Eric Banks
f657b8bda8
Complete overhaul of the BQSRv2 integration tests. Much more comprehensive. Still need to deal with a few tests that need some modifications before I'm done, but I'll take care of that sometime tomorrow.
2012-07-17 00:32:34 -04:00
Eric Banks
a003148d50
Move AnalyzeCovariates over too.
2012-07-16 16:11:56 -04:00
Eric Banks
0a89adbcdb
Add utility decorators so that classes can tell you which package source they come from if they want to (suggested by Khalid). Using those decorators, we can easily pull out the BQSR updateDataForPileupElement() method into a standard RecalibrationEngine and an AdvancedRecalibrationEngine and use the protected one (AdvancedRE) if available (otherwise, the public one).
2012-07-16 15:34:50 -04:00
Eric Banks
52baac1e16
Move BQSRv2 into public and v1 into the archive.
2012-07-16 14:23:38 -04:00
Khalid Shakir
07822d6c0f
Fixed input annotations for master/test files on DiffObjectsWalker.
2012-07-16 13:33:11 -04:00
Eric Banks
2a830939df
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-14 23:49:59 -04:00
Eric Banks
f29cadd7e2
By default, don't quantize quals in BQSRv2
2012-07-14 23:49:48 -04:00
Eric Banks
75543a3f22
ReadClipper.clipRead's claim that it doesn't modify the original read was false. Ultimately, GATKSAMRecord.clone (as documented) creates a soft (shallow) copy of the read, so modifying e.g. the bases of the cloned read means that you modify the bases of the original read too. Because of this, when the BQSRv2 Context covariate was writing Ns over the low quality tails of the reads, they got propagated out to the output BAM file (very bad). I've updated the ReadClipper docs and cleaned up the code (no reason to use a clone of the read anymore given that we are already modifying the original). For now, the simplest thing is to have the Context covariate store the original bases, overwrite the low quality bases with Ns, compute covariates, and then restore the original bases; we can update later if needed.
2012-07-13 18:50:27 -04:00
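The aliasing problem and the save/overwrite/restore workaround described in commit 75543a3f22 above can be sketched as follows. `SketchRead` and `ContextSketch` are hypothetical stand-ins, not GATKSAMRecord or the Context covariate; the sketch assumes only that a shallow copy shares its base array with the original.

```java
// Hypothetical stand-in for a read; not GATKSAMRecord.
final class SketchRead {
    final byte[] bases;

    SketchRead(byte[] bases) { this.bases = bases; }

    // Shallow copy: the new object shares the same backing array,
    // so writes through the copy are visible in the original.
    SketchRead shallowCopy() { return new SketchRead(bases); }
}

// Hypothetical stand-in for a covariate computation that needs masked bases.
final class ContextSketch {
    private ContextSketch() {}

    // Masks the last `tail` bases with 'N', counts the Ns (a placeholder
    // for the real computation), then restores the original bases.
    static int countNsWhileMasked(SketchRead read, int tail) {
        byte[] original = read.bases.clone(); // cloning a byte[] copies its elements
        try {
            int start = Math.max(0, read.bases.length - tail);
            for (int i = start; i < read.bases.length; i++) {
                read.bases[i] = 'N';
            }
            int count = 0;
            for (byte b : read.bases) {
                if (b == 'N') count++;
            }
            return count;
        } finally {
            // Restore even if the computation throws, so Ns never leak out.
            System.arraycopy(original, 0, read.bases, 0, original.length);
        }
    }
}
```

The `finally` block is a design choice of this sketch: it guarantees the masked bases cannot reach downstream output, which is the failure mode the commit message reports.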
Ryan Poplin
443f02ffc2
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-13 16:09:24 -04:00
Khalid Shakir
6dfcc486e8
ApplyRecalibration now marks the filter as PASS instead of '.' when a site passes, by calling .passFilters().
2012-07-13 15:40:56 -04:00
Ryan Poplin
d70bb59182
HaplotypeCaller now calls insertion events that aren't fully assembled as symbolic alleles.
2012-07-10 14:22:23 -06:00
Guillermo del Angel
279dff9f81
Bug fix when specifying a JEXL expression for a field that doesn't exist: we should treat the whole expression as false, but we were rethrowing the JEXL exception in this case. Added integration test to cover this in SelectVariants
2012-07-10 13:59:00 -04:00
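The behavior described in commit 279dff9f81 above (an expression that references a missing field evaluates to false instead of raising an error) can be sketched without the JEXL library. Everything here is a hypothetical stand-in: `greaterThan` plays the role of an evaluator that throws on a missing field, and `matches` shows the "treat as false" handling.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch; it does not use or mirror the JEXL API.
final class SafeMatch {
    private SafeMatch() {}

    // Stand-in for an expression evaluator that throws when the field is absent.
    static Predicate<Map<String, Object>> greaterThan(String field, double threshold) {
        return attrs -> {
            Object value = attrs.get(field);
            if (value == null) {
                throw new NoSuchElementException("missing field: " + field);
            }
            return ((Number) value).doubleValue() > threshold;
        };
    }

    // A missing field makes the whole expression false rather than
    // propagating the exception to the caller.
    static boolean matches(Predicate<Map<String, Object>> expr, Map<String, Object> attrs) {
        try {
            return expr.test(attrs);
        } catch (NoSuchElementException e) {
            return false;
        }
    }
}
```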
Mauricio Carneiro
7eb45b4038
Fixed BQSR IntegrationTests
...
* BinaryTag covariate is Experimental, not Standard (this was breaking integration tests)
* New parameter in the Recalibration report requires new MD5 for one of the integration tests.
2012-07-09 13:55:12 -04:00
Eric Banks
dd0c47ab7e
Don't cast to a specific walker type since any walker can use the VA engine
2012-07-09 10:25:58 -04:00
Mark DePristo
5b0ade67c8
Updates to VCF processing for better BCF processing
...
-- getMetaData is now split into getMetaDataInSortedOrder() [old functionality] and getMetaDataInOriginalOrder() [according to the header order]. This is important because BCF uses the order of elements in the header as the offsets to keys; we were automatically sorting the BCF2 header, which is out of order in samtools, and the whole system was going crazy
-- Updating GATK code to use the appropriate header function (this is why so many files have changed)
-- BCF2 code was busted in not differentiating PASS from . from FILTER in VC (tests coming that will actually stress this)
-- Bugfix for adding contig lines to BCF2 header dictionary
-- VCFHeader metaData no longer sorted internally. The system now maintains the data in header order, and only sorts output as requested in API
-- VCFWriter and BCF2Writer now explicitly sort their header lines
-- Don't allow filters to be added that are PASS in the contract
2012-07-08 15:44:33 -07:00
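The header-ordering change in commit 5b0ade67c8 above (keep metadata in header order internally and sort only on request) can be sketched with standard collections. `HeaderLinesSketch` is a hypothetical class that stores lines as plain strings; the real VCFHeader holds typed header line objects, and only the two accessor ideas are taken from the commit message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not the VCFHeader implementation.
final class HeaderLinesSketch {
    // LinkedHashSet keeps insertion (header) order and drops exact duplicates.
    private final Set<String> lines = new LinkedHashSet<>();

    void add(String line) {
        lines.add(line);
    }

    // Order in which the lines appeared in the header; offset-based
    // formats depend on this order being preserved.
    List<String> inOriginalOrder() {
        return Collections.unmodifiableList(new ArrayList<>(lines));
    }

    // Sorted view computed on demand; the stored order is left untouched.
    List<String> inSortedOrder() {
        return Collections.unmodifiableList(new ArrayList<>(new TreeSet<>(lines)));
    }
}
```

Sorting into a fresh collection each time costs a copy, but it keeps the stored order authoritative, which matches the "only sorts output as requested" wording of the commit.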
Mark DePristo
63f5262e45
mergeInfoWithMaxAC is no longer hidden in CombineVariants
2012-07-08 15:44:32 -07:00
Mark DePristo
66aee613e2
Bugfix for set key in mergeInfoWithMaxAC.
...
-- Previous version was always setting set=source of info with highest AC. Should actually have been set to the set annotation value itself.
2012-07-08 15:44:32 -07:00
Mark DePristo
91f0ed8059
Fixed nasty Rscript typo in VariantRecalibrator when compactPDF is available
2012-07-08 15:44:32 -07:00
Mark DePristo
87b090c362
Update VariantRecalibrator error message to use -resource not old -B syntax
2012-07-08 15:44:31 -07:00
Mauricio Carneiro
125e6c1a47
Added BinaryTagCovariate for ancient DNA analysis
2012-07-06 15:03:20 -04:00
Mauricio Carneiro
e93b025b39
Fixing unit test
...
With the new clipping behavior for weird cigars, we can no longer assert the final number of bases in the unit test, so I'm taking this bit out of the unit test.
2012-07-06 12:08:09 -04:00
Mauricio Carneiro
f603d4c48c
Fixing PairHMMIndelErrorModel boundary issue
...
When checking the limits of a read to clip, it wasn't considering reads that may already have been clipped.
2012-07-06 11:48:04 -04:00
Eric Banks
dd571d9aa0
Added a --no_indel_quals argument that when used with -BQSR inhibits the writing of base insertion and base deletion quality tags.
2012-07-04 01:22:20 -04:00
Eric Banks
33306d2e20
Changing the logic of the -standard argument; as it currently stands, one can never turn off the cycle or context covariates. Now they are on by default and users must opt out of them to turn them off.
2012-07-04 00:21:21 -04:00
Eric Banks
7d30558e6f
Only 'pad' the cycle covariate for indels, not substitutions
2012-07-03 23:47:01 -04:00
Mauricio Carneiro
17efbbf8b1
Fixed ReadClipperUnitTest
...
The behavior of the clipping on weird cigar strings such as 1I1S1H and 9S56H has changed, and the test has to change accordingly.
2012-07-03 16:38:51 -04:00
Eric Banks
22f1afddaa
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-03 14:55:59 -04:00
Eric Banks
617eebd204
More misc cleanup
2012-07-03 14:55:37 -04:00
Eric Banks
344c3aeb1d
Cleanup from previous commit
2012-07-03 14:42:44 -04:00
Ryan Poplin
9e8e78de15
Adding the model name to the VQSR filter lines so that they don't get clobbered with consecutive VQSR runs for SNPs and then indels.
2012-07-03 14:30:37 -04:00
Eric Banks
0b37d44b0d
Optimizations for the RecalDatum to make BQSR (Count Covariates) much faster. Needs some cleanup.
2012-07-03 13:05:11 -04:00
Eric Banks
031322ff00
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-03 00:12:59 -04:00