Ryan Poplin
a647f1e076
Refactoring the PairHMM util class to allow for multiple implementations which can be specified by the callers via an enum argument. Adding an optimized PairHMM implementation which caches per-read calculations as well as a logless implementation which drastically reduces the runtime of the HMM while also increasing the precision of the result. In the HaplotypeCaller we now lexicographically sort the haplotypes to take maximal benefit of the haplotype offset optimization which only recalculates the HMM matrices after the first differing base in the haplotype. Many thanks to Mauricio for all the initial groundwork for these optimizations. The change to the one HC integration test is in the fourth decimal of HaplotypeScore.
2012-10-20 16:38:18 -04:00
David Roazen
25e7dcc46f
Performance tests: only need 1 iteration now that we're no longer running on the farm
2012-10-19 22:11:59 -04:00
Joel Thibault
45f64425a3
Update read metrics per shard rather than locus
2012-10-19 15:29:01 -04:00
Joel Thibault
637e0cf151
CountReads does not permit the use of output files
2012-10-19 15:29:01 -04:00
Joel Thibault
a5333006bb
Mark @Output as required
2012-10-19 15:29:01 -04:00
Yossi Farjoun
6557b7e364
Qscript for validating the pooledCaller against a single validation set
2012-10-19 14:39:57 -04:00
Khalid Shakir
2ef456d51a
Added explicit @ClassType annotations to @Argument for Option[Int] or Option[Double] since Scala seems to change the reflected type to Option[Object] on some systems.
...
Changed ReflectionUtils.getGenericTypes' order of looking for @ClassType since the primitive generic wasn't completely erased, only changed to Object which is incorrect.
More fixes to @Arguments labeled as java.io.File via incorrect @Input annotation.
Put in a default undocumented implementation of @Argument doc() to match the one added to @Input.
2012-10-19 13:20:29 -04:00
Eric Banks
4622896312
Oops, killed contracts
2012-10-19 13:04:05 -04:00
Eric Banks
9c088fe3fe
Actually a better implementation of GATKSAMRecord.getSoftStart(). Last commit was all wrong. Oops.
2012-10-19 12:41:24 -04:00
Eric Banks
f7bd4998fc
No need for dummy GLs
2012-10-19 12:13:59 -04:00
Eric Banks
f08e5a44da
Better implementation of GATKSAMRecord.getSoftStart()
2012-10-19 12:11:18 -04:00
Eric Banks
deca564aef
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-19 12:01:49 -04:00
Eric Banks
d3cf37dfaf
Bug fix for general ploidy model: when choosing the most likely alternate allele(s), you need to weight the likelihood mass by the ploidy of the specific alleles (otherwise all alt alleles will have the same probability). This fixes Yossi's issue with pooled validation calling. This may break integration tests, but I will leave that to GdA to handle.
2012-10-19 12:01:45 -04:00
Eric Banks
27d8d3f51e
RR optimization: don't recalculate the entire bitset of variant sites for every read added to the sliding window. Instead, reuse as much of the previously calculated bitset as you can (basically from the window start until the start of the new read minus the context size). In some awfully performing regions this cuts the runtime in half, although in others this doesn't seem to help much (so clearly something else is going on). Note that I still need to fix one last bug here, but it's almost done.
2012-10-19 11:59:34 -04:00
Khalid Shakir
403654d40a
Fixed null checks in ArgumentTypeDescriptor due to ArgumentMatchValue updates.
...
Fixed @Arguments such as scatter count that were labeled as java.io.File via incorrect @Input annotation.
2012-10-18 16:57:15 -04:00
Guillermo del Angel
a4184716d8
Merge branch 'master' of ssh://gsa3/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-18 15:42:31 -04:00
Guillermo del Angel
3db38c5a93
Bug fix: inbreeding coeff shouldn't be computed in ref-only sites
2012-10-18 15:42:14 -04:00
Ami Levy Moonshine
a0381f15af
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-18 14:49:34 -04:00
Ami Levy Moonshine
f665c6ceb9
minor change (add 129 to the dbSNP name)
2012-10-18 14:48:37 -04:00
Ryan Poplin
b4e69239dd
In order to be considered an informative read in the PerReadAlleleLikelihoodMap it has to be informative compared to all other alleles not just the worst allele. Also, fixing a bug when there is only one allele in the map.
2012-10-18 14:31:15 -04:00
Mauricio Carneiro
3504f71b6b
Fixing a null pointer exception bug for DEV-10
2012-10-18 13:58:38 -04:00
Ami Levy Moonshine
262c84a459
(1) Add RR step to the general calling script. (2) Add more modularity to the script
2012-10-18 10:47:17 -04:00
Mark DePristo
d3fc797cfe
SelectVariants is actually *NOT* NanoSchedulable
2012-10-18 10:42:20 -04:00
Mark DePristo
f20fa9d082
SelectVariants is actually NanoSchedulable
2012-10-18 10:27:05 -04:00
Mark DePristo
97abb98c0b
Bugfix for bad nt / nct argument detection in MicroScheduler
2012-10-18 10:27:05 -04:00
Eric Banks
54f698422c
Better implementation for getSoftEnd() in GATKSAMRecord
2012-10-18 09:01:51 -04:00
Ami Levy Moonshine
acc0fb2f7a
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-17 22:16:02 -04:00
Eric Banks
20ffbcc86e
RR optimization: profiling was showing that the BaseCounts class was a major bottleneck because the underlying implementation was a HashMap. Given that the map index was an indexable Enum anyways, it makes a lot more sense to implement as a native array. Knocks 30% off the runtime in bad regions.
2012-10-17 21:44:53 -04:00
Mauricio Carneiro
b57df6cac8
Bringing CMI changes into the main GATK repo.
...
Merge remote-tracking branch 'cmi/master'
2012-10-17 15:23:19 -04:00
Mauricio Carneiro
4dea20b9b5
Release of CMI-0.1.0
2012-10-17 14:59:36 -04:00
Mark DePristo
8288c30e36
Use buffered output for ExactCallLogger
2012-10-17 14:15:11 -04:00
Mark DePristo
fa93681f51
Scalability test for EXACT models
2012-10-17 14:15:11 -04:00
Mark DePristo
c9e7a947c2
Improve interface of ExactCallLogger, use it to have a more informative AFCalcPerformanceTest
2012-10-17 14:15:11 -04:00
David Roazen
d6be657966
BQSR profiling: execute multiple operations per thread so that threading overhead doesn't dominate
2012-10-17 13:25:33 -04:00
kshakir
0196dbeaca
Added more logging to push/pull of RemoteFiles.
2012-10-17 09:52:17 -04:00
Eric Banks
33df1afe0e
More BaseCounts optimizations for RR.
2012-10-17 00:55:44 -04:00
Eric Banks
19e2b5f0d5
RR optimization: since total count in BaseCounts is requested so often, don't keep computing it from scratch each time.
2012-10-17 00:44:23 -04:00
kshakir
f93b279151
Moved the class field caching from QScript to a ClassFieldCache utility.
...
Using ClassFieldCache to pull values from QScript for passing to done() method of QStatusMessenger.
2012-10-16 18:49:31 -04:00
Guillermo del Angel
f0e04376ec
Add output file tag so caller can specify output vcf
2012-10-16 16:05:12 -04:00
David Roazen
b30e2a5b7d
BQSR: tool to profile the effects of more-granular locking on scalability by # of threads
2012-10-16 14:43:16 -04:00
Ami Levy Moonshine
402ce963f9
changes in the postQC summary tables format
2012-10-16 14:11:54 -04:00
Guillermo del Angel
62d9de084f
Changes to specify outputs from inputs arguments per Khalid's request
2012-10-16 13:57:35 -04:00
Mark DePristo
9bcefadd4e
Refactor ExactCallLogger into a separate class
...
-- Update minor integration tests with NanoSchedule due to qual accuracy update
2012-10-16 13:30:09 -04:00
Kristian Cibulskis
b26b7bd8e5
fixed problem with isIntermediate flag being inherited from FQ2BAM
...
added support for tumor flag in metadata
2012-10-16 12:20:41 -04:00
Mark DePristo
c74d7061fe
Added AFCalcResultUnitTest
...
-- Ensures that the posteriors remain within reasonable ranges. Fixed bug where normalization of posteriors = {-1e30, 0.0} => {-100000, 0.0} which isn't good. Now tests ensure that the normalization process preserves log10 precision where possible
-- Updated MathUtils to make this possible
2012-10-16 08:11:06 -04:00
Mark DePristo
9b0ab4e941
Cleanup IndependentAllelesDiploidExactAFCalc
...
-- Remove capability to truncate genotype likelihoods -- this wasn't used and isn't really useful after all
-- Added lots of contracts and docs, still more to come.
-- Created a default makeMaxLikelihoods function in ReferenceDiploidExactAFCalc and DiploidExactAFCalc so that multiple subclasses don't just do the default thing
-- Generalized reference bi-allelic model in IndependentAllelesDiploidExactAFCalc so that in principle any bi-allelic reference model can be used.
2012-10-16 08:11:06 -04:00
Mark DePristo
6bd0ec8de4
Proper likelihoods and posterior probability of the joint allele frequency in IndependentAllelesDiploidExactAFCalc
...
-- Fixed minor numerical stability issue in AFCalcResult
-- posterior of joint A/B/C is 1 - (1 - P(D | AF_b == 0)) x (1 - P(D | AF_c == 0)), for any number of alleles, obviously. Now computes the joint posterior like this, and then back-calculates likelihoods that generate these posteriors given the priors. It's not pretty but it's the best thing to do
2012-10-16 08:11:06 -04:00
Mark DePristo
d1511e38ad
Removing ConstrainedAFCalculationModel; AFCalcPerformanceTest
...
-- Superseded by IndependentAFCalc
-- Added support to read in an ExactModelLog in AFCalcPerformanceTest and run the independent alleles model on it.
-- A few misc. bug fixes discovered during running the performance test
2012-10-16 08:11:06 -04:00
kshakir
9fcf71c031
Updated google reflections due to stale slf4j version conflicting with other projects also trying to use Queue as a component.
...
Added targets to build.xml to effectively 'mvn install' packaged GATK/Queue from ant.
TODO: Versions during 'mvn install' are hardcoded at 0.0.1 until a better versioning scheme that works with maven dependencies has been identified.
2012-10-16 02:22:30 -04:00
Ryan Poplin
31be807664
Updating missed integration test.
2012-10-15 22:31:52 -04:00