Christopher Hartl
3ee46cced2
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-19 21:25:40 -04:00
Christopher Hartl
af383c30b5
Ensure that the gene summary has a header line
2012-07-19 21:24:04 -04:00
Mark DePristo
2ca5fc62a2
Support for MISSING BCF2 type
...
-- Heng wants to use 0x0? to represent any missing type value, which in our implementation was invalid. Updated our codebase to support this construct. Heng said he'll update the BCF2 quick reference.
-- Enabled integration test reading Heng's ex2.bcf file
-- GATK now only warns in the case where the END info field isn't the same (or +1 due to padding) as the getEnd() function as determined by the GATK. Turns out there's a single record in the 1000G SV call set that doesn't have the right length
-- VariantContextTestProvider now tests that X = Y where X -> writing -> reading -> writing -> reading = Y for a variety of variant context inputs X
-- Added integration test reading 1000G SV chr1 calls (from Chris)
2012-07-19 16:14:26 -04:00
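The round-trip property described in commit 2ca5fc62a2 (X -> writing -> reading -> writing -> reading = Y, with X = Y) can be sketched generically. This is only an illustration, not the VariantContextTestProvider code: the class name `RoundTripCheck`, the method `survivesDoubleRoundTrip`, and the `write`/`read` functions are hypothetical stand-ins for a real encoder and decoder.

```java
import java.util.function.Function;

// Hypothetical helper: checks that a value is unchanged after being
// written and read back twice in a row.
final class RoundTripCheck {
    static <T, E> boolean survivesDoubleRoundTrip(T input,
                                                  Function<T, E> write,
                                                  Function<E, T> read) {
        // First pass: encode the input, then decode it again.
        T once = read.apply(write.apply(input));
        // Second pass: encode and decode the result of the first pass.
        T twice = read.apply(write.apply(once));
        // The property holds only if the final value equals the original.
        return input.equals(twice);
    }
}
```

Running the cycle twice rather than once also exercises the writer on values produced by the reader, which a single pass would not.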
Khalid Shakir
50365d01c4
Updated HSPTest expected values due to variant eval changes in earlier commit.
2012-07-19 15:24:53 -04:00
Ryan Poplin
1592841c93
New function for merging nearby events into MNPs or complex substitutions. Added extensive unit tests.
2012-07-19 13:16:33 -04:00
Mark DePristo
a4884f82cd
Final final version of GATK beta license
2012-07-19 10:39:34 -04:00
Guillermo del Angel
c16f9f2f15
a) Use new method to check for GATK Lite, b) minor improvements to indel pool caller (more to come): brain-dead, quick way to limit number of alt alleles to genotype. We can't process too many alt alleles because of the combinatorial explosion of GL values with high ploidy, and some STR validation targets had up to 12 alt alleles, resulting in GL vectors of > 1e8 elements. Can't use pileup elements since typically not many alleles will be in one pileup, and different alleles will appear in different samples, TBD a nicer solution. c) Commit to posterity scala script for large scale validation calling, still work in progress
2012-07-19 10:24:08 -04:00
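The combinatorial explosion mentioned in commit c16f9f2f15 can be illustrated with a short calculation. The sketch assumes one likelihood per unordered genotype, so the vector length is the number of multisets of size `ploidy` drawn from `numAlleles` alleles, i.e. C(ploidy + numAlleles - 1, numAlleles - 1). The class and method names are hypothetical, and the ploidy of 20 in the usage note is an example value, not one taken from the commit.

```java
// Hypothetical helper: size of a genotype-likelihood vector, assuming one
// entry per unordered genotype.
final class GenotypeLikelihoodCount {
    static long numGenotypes(int ploidy, int numAlleles) {
        int n = ploidy + numAlleles - 1;
        // C(n, ploidy) == C(n, numAlleles - 1); use the smaller index.
        int k = Math.min(ploidy, numAlleles - 1);
        long result = 1;
        for (int i = 1; i <= k; i++) {
            // After this step result == C(n - k + i, i), so the division is exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }
}
```

With 13 alleles (reference plus 12 alts), `numGenotypes(2, 13)` is 91 for a diploid sample, while `numGenotypes(20, 13)` already exceeds 1e8.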
Eric Banks
5f5edeca63
Reverting move of BQSR tests to public, as per DR's email
2012-07-19 10:02:05 -04:00
Eric Banks
e370030e6c
As requested by Mark, I've broken out the code to pull out the protected subclass when available (and otherwise use the public version) into the GATKLiteUtils class. People should use this code instead of reimplementing all of the Java reflection on their own.
2012-07-18 22:44:37 -04:00
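The pattern described in commit e370030e6c (use the protected subclass when it is on the classpath, otherwise the public version) can be sketched with plain Java reflection. This is not the GATKLiteUtils code, whose API is not shown in this log: the class `SubclassLookup`, the method `instantiate`, and its parameters are hypothetical, and a no-argument constructor is assumed.

```java
// Hypothetical helper: instantiate a preferred class by name if it can be
// loaded, otherwise fall back to a known implementation.
final class SubclassLookup {
    static <T> T instantiate(String preferredClassName,
                             Class<? extends T> fallback,
                             Class<T> type) {
        Class<? extends T> chosen;
        try {
            // Succeeds only if the preferred class is on the classpath.
            chosen = Class.forName(preferredClassName).asSubclass(type);
        } catch (ClassNotFoundException e) {
            // Preferred class is absent: use the fallback implementation.
            chosen = fallback;
        }
        try {
            // Assumes the chosen class has an accessible no-arg constructor.
            return chosen.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + chosen.getName(), e);
        }
    }
}
```

Keeping the lookup in one place means callers never need to handle `ClassNotFoundException` themselves.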
Eric Banks
d46ccec04e
Adding Unit Tests to cover the exception catching for Picard errors: because we are using String matching, we want to ensure that we know if/when the exception text changes underneath us.
2012-07-18 21:48:58 -04:00
Eric Banks
9c1ab1b0c0
Move BQSR integration test and its dependent files into public; previously there was a protected->private dependency.
2012-07-18 21:11:33 -04:00
Joel Thibault
0b87b0ead8
Use lazy evaluation to initialize MongoDB
...
* Removes a Broad-specific dependency in GATKDocs
2012-07-18 16:49:58 -04:00
Mark DePristo
994c5c31c1
Enabling VariantEval integration tests for ValidationReport
2012-07-18 16:07:47 -04:00
Mark DePristo
74e153ff4a
FisherStrand now uses RankSumTest isUsableBase to decide if a read should be included in testing
...
-- Previously used hardcoded MAPQ > 20 && QUAL > 20 but now uses isUsableBase
-- Updating MD5s as appropriate
2012-07-18 16:07:47 -04:00
Mark DePristo
30f441d385
Finalizing GATK2 license text
2012-07-18 16:07:47 -04:00
Mark DePristo
dede3a30e9
Improvements to the validation report of VariantEval
...
-- If eval has genotypes and comp has genotypes, then subset the genotypes of comp down to the samples being evaluated when considering TP, FP, FN, TN status. This is important in the case where you want to use this to assess, for example, the quality of calls on NA12878 but you have a CEU trio comp VCF. The previous version was counting sites polymorphic in mom against the calls in NA12878.
-- Added test data VCF and integration tests to ensure this behavior continues in the future
-- TODO: actually run integration tests when I have an internet connection
2012-07-18 16:07:47 -04:00
Mark DePristo
559a4826be
Improvements to the validation report of VariantEval
...
-- If eval has genotypes and comp has genotypes, then subset the genotypes of comp down to the samples being evaluated when considering TP, FP, FN, TN status. This is important in the case where you want to use this to assess, for example, the quality of calls on NA12878 but you have a CEU trio comp VCF. The previous version was counting sites polymorphic in mom against the calls in NA12878.
-- Added test data VCF and integration tests to ensure this behavior continues in the future
2012-07-18 16:07:46 -04:00
Mark DePristo
dc292c0317
FisherStrand now includes all reads and bases, regardless of mapping quality and base quality, just like other annotations
...
-- This actually proved to be a problem with Ion Torrent data where the base quality can be quite low, and so we need to include Q15 bases for calling effectively.
2012-07-18 16:07:46 -04:00
Eric Banks
bd799fc989
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-18 16:01:20 -04:00
Eric Banks
2c0f073ab1
Make -qq arg hidden for now since it's still very experimental
2012-07-18 15:43:25 -04:00
Eric Banks
b46c85e8b4
More bad BAM file catching
2012-07-18 15:26:31 -04:00
Eric Banks
659eee13a6
Handle NPE generated in UG when non-standard reference bases are present in the fasta
2012-07-18 15:16:27 -04:00
Eric Banks
9af2cfe283
Catch underlying file system problems that get masked as Tribble index errors. There's also a quick patch to the HMS that isn't really the ultimate fix needed; Mark and I will review at a later point.
2012-07-18 15:11:38 -04:00
Eric Banks
4c730542f0
Handle RuntimeExceptions thrown by Picard that are really User Errors. I will add unit tests for these as best I can later.
2012-07-18 13:56:35 -04:00
Eric Banks
ae08d35138
Catch 'too many open files' errors that show up when trying to read the bam index. All that needs to be done is to flesh out the original error message (because it will get caught later and rethrown correctly).
2012-07-18 12:57:34 -04:00
David Roazen
fd0c2d269e
Change committest target to allow inheritance of properties
...
Needed for a fix I'm working on for the Bamboo release plan
2012-07-18 12:45:51 -04:00
Eric Banks
f2fe59a9d4
Wow, there are a ton of errors captured having to do with being unable to merge the temp Tribble output. I'm expanding the error message a bit to help see if we can do anything going forward.
2012-07-18 12:31:59 -04:00
Eric Banks
e4db8dde91
Enabled a whole other bunch of integration tests for BQSRv2. While I was there I also changed the default context size for indels to 3 (from 8) since that's what works best in the current implementation (as suggested by Ryan). At this point, all of the new core tools (ReduceReads, BQSRv2, HaplotypeCaller, UG extensions) have been moved over to protected and should be stable. Looks like we are pretty much ready for GATK 2.0!
2012-07-17 23:36:43 -04:00
Eric Banks
a8d08ea18d
As a user pointed out, it is not valid for a GenomeLoc to have a start or stop equal to 0.
2012-07-17 22:18:43 -04:00
Eric Banks
7e2c830636
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 21:54:56 -04:00
Eric Banks
a9f27e5b02
Updated md5s for DPP test
2012-07-17 21:54:46 -04:00
Guillermo del Angel
29273abab7
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 16:58:12 -04:00
Guillermo del Angel
731bbba2e6
Bug fixes for integration test, use correct new UG syntax
2012-07-17 16:57:59 -04:00
Eric Banks
33be41ecf5
Cleaning up integration test
2012-07-17 16:06:04 -04:00
Eric Banks
8dbc9cb29c
Add the ability to emit the original quals in the OQ tag
2012-07-17 15:52:56 -04:00
Eric Banks
4e3780fd4f
Updated md5 for PBPP
2012-07-17 15:47:43 -04:00
Guillermo del Angel
40b8c7172c
Pool Caller refactoring in preparation for GATK 2.0: a) PoolCallerUnifiedArgumentCollection disappeared, and arguments moved to UnifiedArgumentCollection. b) PoolCallerWalker is now redundant, all functionality subsumed by UG. UG now checks if GATK is lite - if so, don't allow ploidy > 2. c) Moved pool classes from private to protected. d) Changed the way to specify ploidy. Instead of specifying samples per pool and having ploidy = 2*samplesPerPool, have user specify ploidy directly, which is cleaner. Updated tests accordingly. We can now call triploid seedless grape genotypes correctly in theory. e) Renamed argument -reference to -reference_sample_calls since the former is ambiguous and it's not clear what it refers to.
2012-07-17 15:27:04 -04:00
Laurent Francioli
68d0e4dd6d
- Multi-allelic sites are now correctly ignored - Reporting of mendelian violations enhanced - Corrected TP overflow by capping it at Byte.MAX_VALUE
...
- Updated integration tests to reflect changes in MVF file output
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-07-17 15:21:10 -04:00
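The overflow fix in commit 68d0e4dd6d can be illustrated as follows. This is a minimal sketch, assuming the value is a non-negative int stored in a byte; the actual code is not shown in this log, and the class `ByteCap` and method `capToByte` are hypothetical.

```java
// Hypothetical helper: store a non-negative int in a byte without wrapping.
final class ByteCap {
    static byte capToByte(int value) {
        // A plain (byte) cast of a value above 127 wraps to a negative number,
        // so clamp to Byte.MAX_VALUE before narrowing.
        return (byte) Math.min(value, Byte.MAX_VALUE);
    }
}
```

Without the cap, `(byte) 200` evaluates to -56; with it, `capToByte(200)` returns 127.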
Eric Banks
863eb5b5c0
Use Context not Dinuc covariate
2012-07-17 15:18:11 -04:00
Eric Banks
b0d99fd10d
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 15:12:28 -04:00
Eric Banks
305db8c0d1
Total rewrite of the isGATKLite() functionality with help of Khalid/David. PluginManager was not working for us.
2012-07-17 15:11:03 -04:00
Ryan Poplin
bf2d5efe4d
Moving HaplotypeCaller integration and unit tests over to protected as well.
2012-07-17 14:51:26 -04:00
Ryan Poplin
c55934043e
Moving HaplotypeCaller from private to protected
2012-07-17 14:41:19 -04:00
Ryan Poplin
6efbcd99f1
HaplotypeCaller is now an AnnotatorCompatibleWalker with all the rights and privileges pertaining thereto. Enabling the ClippingRankSumTest after showing it was useful for 1000 Genomes calling.
2012-07-17 14:38:36 -04:00
Eric Banks
110886e8b9
Oops, got the logic wrong.
2012-07-17 13:37:11 -04:00
David Roazen
836f882c30
Forgot to escape fallback text in email script; fixed TERRIBLE sed-related bug
2012-07-17 13:18:08 -04:00
Eric Banks
a963b37424
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 13:15:37 -04:00
Eric Banks
17d627b86d
Update the DPP and PBPP to use the BQSRv2 walkers
2012-07-17 13:15:32 -04:00
Eric Banks
3a64398d07
Cleaned up the isGATKLite check
2012-07-17 12:46:16 -04:00
Eric Banks
62c5228048
1) Revert previous change - indel recalibration is turned on by default and users of the Lite version will need to turn it off to avoid a User Error. 2) Implemented the engine.isGATKLite() method.
2012-07-17 12:23:40 -04:00