10888 Commits (9c63cee9fcdb69a7a8e8d77a771ddb2afa18f7cd)

Author SHA1 Message Date
Mark DePristo 9c63cee9fc Moving pnrm to UnifiedArgumentCollection so it's available with the HaplotypeCaller 2012-10-21 12:42:31 -04:00
Guillermo del Angel e9b7324dc1 Merge branch 'master' of ssh://gsa3/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-21 12:38:49 -04:00
Guillermo del Angel 67b9e7319e Fix for integration tests: new criterion in AF exact calculation model to trim alleles based on likelihoods does produce better results and resulting alleles changed in 2 sites at integration tests (and all subsequent sites after this had minor annotation differences due to RankSum dithering) 2012-10-21 12:38:33 -04:00
Eric Banks 0616b98551 Not sure why we were setting the UAC variables instead of the simpleUAC ones when that's what we wanted. 2012-10-21 08:26:26 -04:00
Eric Banks d44d5b8275 Fix RawHapMapCodec so that it can build indexes. Minor fixes to VCF codec. 2012-10-21 01:29:59 -04:00
Eric Banks 841a906f21 Adding a hidden (for now) argument to UG (and HC) that tells the caller that the incoming samples are contaminated by N% and to fix it by aggressively down-sampling all alleles. This actually works. Yes, you read that right: given that we know what N is, we can make good calls on bams that have N% contamination. Only hooked up for SNPs right now. No tests added yet. 2012-10-20 23:31:56 -04:00
Eric Banks 2c624f76c8 Refactoring the Unified (and Standard) Argument Collections because it was really ugly that the subclass had to do all the cloning for the super class. The clone() method is really not recommended best practice in Java anyways, so I changed it so that we use standard overloaded constructors. Confirmed that the Haplotype Caller --help docs do not include UG-specific arguments. 2012-10-20 20:35:54 -04:00
Ryan Poplin a647f1e076 Refactoring the PairHMM util class to allow for multiple implementations which can be specified by the callers via an enum argument. Adding an optimized PairHMM implementation which caches per-read calculations as well as a logless implementation which drastically reduces the runtime of the HMM while also increasing the precision of the result. In the HaplotypeCaller we now lexicographically sort the haplotypes to take maximal benefit of the haplotype offset optimization which only recalculates the HMM matrices after the first differing base in the haplotype. Many thanks to Mauricio for all the initial groundwork for these optimizations. The change to the one HC integration test is in the fourth decimal of HaplotypeScore. 2012-10-20 16:38:18 -04:00
David Roazen 25e7dcc46f Performance tests: only need 1 iteration now that we're no longer running on the farm 2012-10-19 22:11:59 -04:00
Joel Thibault 45f64425a3 Update read metrics per shard rather than locus 2012-10-19 15:29:01 -04:00
Joel Thibault 637e0cf151 CountReads does not permit the use of output files 2012-10-19 15:29:01 -04:00
Joel Thibault a5333006bb Mark @Output as required 2012-10-19 15:29:01 -04:00
Yossi Farjoun 6557b7e364 Qscript for validating the pooledCaller against a single validation set 2012-10-19 14:39:57 -04:00
Khalid Shakir 2ef456d51a Added explicit @ClassType annotations to @Argument for Option[Int] or Option[Double] since scala seems to change the reflected type to Option[Object] on some systems.
Changed ReflectionUtils.getGenericTypes' order of looking for @ClassType since the primitive generic wasn't completely erased, only changed to Object which is incorrect.
More fixes to @Arguments labeled as java.io.File via incorrect @Input annotation.
Put in a default undocumented implementation of @Argument doc() to match the one added to @Input.
2012-10-19 13:20:29 -04:00
Eric Banks 4622896312 Oops, killed contracts 2012-10-19 13:04:05 -04:00
Eric Banks 9c088fe3fe Actually a better implementation of GATKSAMRecord.getSoftStart(). Last commit was all wrong. Oops. 2012-10-19 12:41:24 -04:00
Eric Banks f7bd4998fc No need for dummy GLs 2012-10-19 12:13:59 -04:00
Eric Banks f08e5a44da Better implementation of GATKSAMRecord.getSoftStart() 2012-10-19 12:11:18 -04:00
Eric Banks deca564aef Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-19 12:01:49 -04:00
Eric Banks d3cf37dfaf Bug fix for general ploidy model: when choosing the most likely alternate allele(s), you need to weight the likelihood mass by the ploidy of the specific alleles (otherwise all alt alleles will have the same probability). This fixes Yossi's issue with pooled validation calling. This may break integration tests, but I will leave that to GdA to handle. 2012-10-19 12:01:45 -04:00
Eric Banks 27d8d3f51e RR optimization: don't recalculate the entire bitset of variant sites for every read added to the sliding window. Instead, reuse as much of the previously calculated bitset as you can (basically from the window start until the start of the new read minus the context size). In some awfully performing regions this cuts down the runtime in half, although in others this doesn't seem to help much (so clearly something else is going on). Note that I still need to fix one last bug here, but it's almost done. 2012-10-19 11:59:34 -04:00
Khalid Shakir 403654d40a Fixed null checks in ArgumentTypeDescriptor due to ArgumentMatchValue updates.
Fixed @Arguments such as scatter count that were labeled as java.io.File via incorrect @Input annotation.
2012-10-18 16:57:15 -04:00
Guillermo del Angel a4184716d8 Merge branch 'master' of ssh://gsa3/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-18 15:42:31 -04:00
Guillermo del Angel 3db38c5a93 Bug fix: inbreeding coeff shouldn't be computed in ref-only sites 2012-10-18 15:42:14 -04:00
Ami Levy Moonshine a0381f15af Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-18 14:49:34 -04:00
Ami Levy Moonshine f665c6ceb9 Minor change (add 129 to the dbSNP name) 2012-10-18 14:48:37 -04:00
Ryan Poplin b4e69239dd In order to be considered an informative read in the PerReadAlleleLikelihoodMap it has to be informative compared to all other alleles not just the worst allele. Also, fixing a bug when there is only one allele in the map. 2012-10-18 14:31:15 -04:00
Mauricio Carneiro 3504f71b6b Fixing a null pointer exception bug for DEV-10 2012-10-18 13:58:38 -04:00
Ami Levy Moonshine 262c84a459 (1) Add RR step to the general calling script. (2) Add more modularity to the script. 2012-10-18 10:47:17 -04:00
Mark DePristo d3fc797cfe SelectVariants is actually *NOT* NanoSchedulable 2012-10-18 10:42:20 -04:00
Mark DePristo f20fa9d082 SelectVariants is actually NanoSchedulable 2012-10-18 10:27:05 -04:00
Mark DePristo 97abb98c0b Bugfix for bad nt / nct argument detection in MicroScheduler 2012-10-18 10:27:05 -04:00
Eric Banks 54f698422c Better implementation for getSoftEnd() in GATKSAMRecord 2012-10-18 09:01:51 -04:00
Ami Levy Moonshine acc0fb2f7a Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-17 22:16:02 -04:00
Eric Banks 20ffbcc86e RR optimization: profiling was showing that the BaseCounts class was a major bottleneck because the underlying implementation was a HashMap. Given that the map index was an indexable Enum anyways, it makes a lot more sense to implement as a native array. Knocks 30% off the runtime in bad regions. 2012-10-17 21:44:53 -04:00
Mauricio Carneiro b57df6cac8 Bringing CMI changes into the main GATK repo.
Merge remote-tracking branch 'cmi/master'
2012-10-17 15:23:19 -04:00
Mauricio Carneiro 4dea20b9b5 Release of CMI-0.1.0 2012-10-17 14:59:36 -04:00
Mark DePristo 8288c30e36 Use buffered output for ExactCallLogger 2012-10-17 14:15:11 -04:00
Mark DePristo fa93681f51 Scalability test for EXACT models 2012-10-17 14:15:11 -04:00
Mark DePristo c9e7a947c2 Improve interface of ExactCallLogger, use it to have a more informative AFCalcPerformanceTest 2012-10-17 14:15:11 -04:00
David Roazen d6be657966 BQSR profiling: execute multiple operations per thread so that threading overhead doesn't dominate 2012-10-17 13:25:33 -04:00
kshakir 0196dbeaca Added more logging to push/pull of RemoteFiles. 2012-10-17 09:52:17 -04:00
Eric Banks 33df1afe0e More BaseCounts optimizations for RR. 2012-10-17 00:55:44 -04:00
Eric Banks 19e2b5f0d5 RR optimization: since total count in BaseCounts is requested so often, don't keep computing it from scratch each time. 2012-10-17 00:44:23 -04:00
kshakir f93b279151 Moved the class field caching from QScript to a ClassFieldCache utility.
Using ClassFieldCache to pull values from QScript for passing to done() method of QStatusMessenger.
2012-10-16 18:49:31 -04:00
Guillermo del Angel f0e04376ec Add output file tag so caller can specify output vcf 2012-10-16 16:05:12 -04:00
David Roazen b30e2a5b7d BQSR: tool to profile the effects of more-granular locking on scalability by # of threads 2012-10-16 14:43:16 -04:00
Ami Levy Moonshine 402ce963f9 Changes in the postQC summary table format 2012-10-16 14:11:54 -04:00
Guillermo del Angel 62d9de084f Changes to specify outputs from inputs arguments per Khalid's request 2012-10-16 13:57:35 -04:00
Mark DePristo 9bcefadd4e Refactor ExactCallLogger into a separate class
-- Update minor integration tests with NanoSchedule due to qual accuracy update
2012-10-16 13:30:09 -04:00