Mark DePristo
695cf83675
More docs and contracts for classes in genotyper.afcalc
...
-- Future protection of the output of GeneralPloidyExactAFCalc, which in some cases produces bad likelihoods (positive values)
2012-10-21 12:42:31 -04:00
Mark DePristo
99c9031cb4
Merge AFCalcResultTracker into StateTracker, cleanup
...
-- These two classes were really the same, and now they are actually the same!
-- Cleaned up the interfaces, removed duplicate data
-- Added lots of contracts, some of which found numerical issues with GeneralPloidyExactAFCalc (which have been patched over but not fixed)
-- Moved goodProbability and goodProbabilityVector utilities to MathUtils. Very useful for contracts!
2012-10-21 12:42:31 -04:00
Mark DePristo
9c63cee9fc
Moving pnrm to UnifiedArgumentCollection so it's available with the HaplotypeCaller
2012-10-21 12:42:31 -04:00
Guillermo del Angel
e9b7324dc1
Merge branch 'master' of ssh://gsa3/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-21 12:38:49 -04:00
Guillermo del Angel
67b9e7319e
Fix for integration tests: the new criterion in the AF exact calculation model, which trims alleles based on likelihoods, does produce better results. The resulting alleles changed at 2 sites in the integration tests (and all subsequent sites had minor annotation differences due to RankSum dithering)
2012-10-21 12:38:33 -04:00
Eric Banks
0616b98551
Not sure why we were setting the UAC variables instead of the simpleUAC ones when that's what we wanted.
2012-10-21 08:26:26 -04:00
Eric Banks
d44d5b8275
Fix RawHapMapCodec so that it can build indexes. Minor fixes to VCF codec.
2012-10-21 01:29:59 -04:00
Eric Banks
841a906f21
Adding a hidden (for now) argument to UG (and HC) that tells the caller that the incoming samples are contaminated by N% and to fix it by aggressively down-sampling all alleles. This actually works. Yes, you read that right: given that we know what N is, we can make good calls on BAMs that have N% contamination. Only hooked up for SNPs right now. No tests added yet.
2012-10-20 23:31:56 -04:00
Eric Banks
2c624f76c8
Refactoring the Unified (and Standard) Argument Collections because it was really ugly that the subclass had to do all the cloning for the super class. The clone() method is really not recommended best practice in Java anyways, so I changed it so that we use standard overloaded constructors. Confirmed that the Haplotype Caller --help docs do not include UG-specific arguments.
2012-10-20 20:35:54 -04:00
Ryan Poplin
a647f1e076
Refactoring the PairHMM util class to allow for multiple implementations which can be specified by the callers via an enum argument. Adding an optimized PairHMM implementation which caches per-read calculations as well as a logless implementation which drastically reduces the runtime of the HMM while also increasing the precision of the result. In the HaplotypeCaller we now lexicographically sort the haplotypes to take maximal benefit of the haplotype offset optimization which only recalculates the HMM matrices after the first differing base in the haplotype. Many thanks to Mauricio for all the initial groundwork for these optimizations. The change to the one HC integration test is in the fourth decimal of HaplotypeScore.
2012-10-20 16:38:18 -04:00
David Roazen
25e7dcc46f
Performance tests: only need 1 iteration now that we're no longer running on the farm
2012-10-19 22:11:59 -04:00
Joel Thibault
45f64425a3
Update read metrics per shard rather than locus
2012-10-19 15:29:01 -04:00
Joel Thibault
637e0cf151
CountReads does not permit the use of output files
2012-10-19 15:29:01 -04:00
Joel Thibault
a5333006bb
Mark @Output as required
2012-10-19 15:29:01 -04:00
Yossi Farjoun
6557b7e364
Qscript for validating the pooledCaller against a single validation set
2012-10-19 14:39:57 -04:00
Khalid Shakir
2ef456d51a
Added explicit @ClassType annotations to @Argument for Option[Int] or Option[Double] since Scala seems to change the reflected type to Option[Object] on some systems.
...
Changed ReflectionUtils.getGenericTypes' order of looking for @ClassType since the primitive generic wasn't completely erased, only changed to Object which is incorrect.
More fixes to @Arguments labeled as java.io.File via incorrect @Input annotation.
Put in a default undocumented implementation of @Argument doc() to match the one added to @Input.
2012-10-19 13:20:29 -04:00
Eric Banks
4622896312
Oops, killed contracts
2012-10-19 13:04:05 -04:00
Eric Banks
9c088fe3fe
Actually a better implementation of GATKSAMRecord.getSoftStart(). Last commit was all wrong. Oops.
2012-10-19 12:41:24 -04:00
Guillermo del Angel
4f768e2f58
redo QC picard parts
2012-10-19 12:25:46 -04:00
Christopher Hartl
860ab1e539
Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable
2012-10-19 12:16:30 -04:00
Eric Banks
f7bd4998fc
No need for dummy GLs
2012-10-19 12:13:59 -04:00
Eric Banks
f08e5a44da
Better implementation of GATKSAMRecord.getSoftStart()
2012-10-19 12:11:18 -04:00
Eric Banks
deca564aef
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-19 12:01:49 -04:00
Eric Banks
d3cf37dfaf
Bug fix for general ploidy model: when choosing the most likely alternate allele(s), you need to weight the likelihood mass by the ploidy of the specific alleles (otherwise all alt alleles will have the same probability). This fixes Yossi's issue with pooled validation calling. This may break integration tests, but I will leave that to GdA to handle.
2012-10-19 12:01:45 -04:00
Eric Banks
27d8d3f51e
RR optimization: don't recalculate the entire bitset of variant sites for every read added to the sliding window. Instead, reuse as much of the previously calculated bitset as you can (basically from the window start until the start of the new read minus the context size). In some awfully performing regions this cuts the runtime in half, although in others it doesn't seem to help much (so clearly something else is going on). Note that I still need to fix one last bug here, but it's almost done.
2012-10-19 11:59:34 -04:00
Guillermo del Angel
1658975f43
Intermediate fix
2012-10-19 11:01:56 -04:00
Kristian Cibulskis
6da7fb4132
* resolved [DEV-88]
2012-10-19 10:47:10 -04:00
Khalid Shakir
403654d40a
Fixed null checks in ArgumentTypeDescriptor due to ArgumentMatchValue updates.
...
Fixed @Arguments such as scatter count that were labeled as java.io.File via incorrect @Input annotation.
2012-10-18 16:57:15 -04:00
Christopher Hartl
5444f33f04
Some final touches (pointers --> variables).
...
With this we've gone from:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.034 0.034 483.779 483.779 <string>:1(<module>)
to:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.041 0.041 31.006 31.006 <string>:1(<module>)
or a reduction of the runtime by 94%.
notbad.gif
2012-10-18 16:45:14 -04:00
Christopher Hartl
dd42ca112d
Next big holdup: numpy.diag(N*p*(1-p)) is way too slow. Place this into a C routine as well.
...
Also, after much debugging, the memory leaks and segfaults (well...those that I know about so far) are coming from numpy.linalg.lstsq. Give up on the guessing, and set beta to 0 initially.
2012-10-18 16:08:14 -04:00
Guillermo del Angel
a4184716d8
Merge branch 'master' of ssh://gsa3/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-18 15:42:31 -04:00
Guillermo del Angel
3db38c5a93
Bug fix: inbreeding coeff shouldn't be computed in ref-only sites
2012-10-18 15:42:14 -04:00
Ami Levy Moonshine
a0381f15af
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-18 14:49:34 -04:00
Ami Levy Moonshine
f665c6ceb9
minor change (add 129 to the dbSNP name)
2012-10-18 14:48:37 -04:00
Ryan Poplin
b4e69239dd
In order to be considered an informative read in the PerReadAlleleLikelihoodMap it has to be informative compared to all other alleles not just the worst allele. Also, fixing a bug when there is only one allele in the map.
2012-10-18 14:31:15 -04:00
Mauricio Carneiro
3504f71b6b
Fixing a null pointer exception bug for DEV-10
2012-10-18 13:58:38 -04:00
Guillermo del Angel
c39578ec89
Undo git mess - revert back to origin and THEN comment out QC metrics
2012-10-18 12:49:26 -04:00
Guillermo del Angel
38656780b0
Revert "comment out QC metrics until picard jar path gets resolved"
...
This reverts commit 02049178662d1f7142e9df70502881264bd2ab81.
2012-10-18 12:45:21 -04:00
Guillermo del Angel
d8562298c9
comment out QC metrics until picard jar path gets resolved
2012-10-18 12:43:00 -04:00
Ami Levy Moonshine
262c84a459
(1) Add RR step to the general calling script. (2) Add more modularity to the script.
2012-10-18 10:47:17 -04:00
Mark DePristo
d3fc797cfe
SelectVariants is actually *NOT* NanoSchedulable
2012-10-18 10:42:20 -04:00
Mark DePristo
f20fa9d082
SelectVariants is actually NanoSchedulable
2012-10-18 10:27:05 -04:00
Mark DePristo
97abb98c0b
Bugfix for bad nt / nct argument detection in MicroScheduler
2012-10-18 10:27:05 -04:00
Eric Banks
54f698422c
Better implementation for getSoftEnd() in GATKSAMRecord
2012-10-18 09:01:51 -04:00
Christopher Hartl
d8ca5028dd
Alter the convergence criteria not to focus on beta (which we don't care about here) but rather on the convergence of the predictors. This leads to far quicker convergence in the case where some responses can be completely explained (so some of the true beta values are infinite).
2012-10-18 09:01:31 -04:00
Christopher Hartl
ce70230a2e
Evaluating the logistic regression given a large matrix of predictors was a killer roadblock (>99% runtime). NumPy's applyOverDimension didn't cut it. Instead: write the appropriate function as a C extension and call into it.
...
This is perhaps the most painful thing ever with Python, but totally doable (better than a .mex file with MATLAB -- eh Guillermo?)
2012-10-17 23:52:47 -04:00
Ami Levy Moonshine
acc0fb2f7a
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-17 22:16:02 -04:00
Eric Banks
20ffbcc86e
RR optimization: profiling was showing that the BaseCounts class was a major bottleneck because the underlying implementation was a HashMap. Given that the map index was an indexable Enum anyways, it makes a lot more sense to implement as a native array. Knocks 30% off the runtime in bad regions.
2012-10-17 21:44:53 -04:00
Guillermo del Angel
bfb73e1c5b
Fix merge conflicts
2012-10-17 20:13:51 -04:00
Guillermo del Angel
a30ac7a778
Add uniform threading argument and enable multithreading in BQSR - will save over 40 hours of runtime
2012-10-17 20:08:22 -04:00