11599 Commits (d18dbcbac103c0ce8f0480e04efcdd00a50f3394)

Author SHA1 Message Date
Ryan Poplin a647f1e076 Refactoring the PairHMM util class to allow for multiple implementations which can be specified by the callers via an enum argument. Adding an optimized PairHMM implementation which caches per-read calculations as well as a logless implementation which drastically reduces the runtime of the HMM while also increasing the precision of the result. In the HaplotypeCaller we now lexicographically sort the haplotypes to take maximal benefit of the haplotype offset optimization which only recalculates the HMM matrices after the first differing base in the haplotype. Many thanks to Mauricio for all the initial groundwork for these optimizations. The change to the one HC integration test is in the fourth decimal of HaplotypeScore. 2012-10-20 16:38:18 -04:00
David Roazen 25e7dcc46f Performance tests: only need 1 iteration now that we're no longer running on the farm 2012-10-19 22:11:59 -04:00
Joel Thibault 45f64425a3 Update read metrics per shard rather than locus 2012-10-19 15:29:01 -04:00
Joel Thibault 637e0cf151 CountReads does not permit the use of output files 2012-10-19 15:29:01 -04:00
Joel Thibault a5333006bb Mark @Output as required 2012-10-19 15:29:01 -04:00
Yossi Farjoun 6557b7e364 Qscript for validating the pooledCaller against a single validation set 2012-10-19 14:39:57 -04:00
Khalid Shakir 2ef456d51a Added explicit @ClassType annotations to @Argument for Option[Int] or Option[Double] since scala seems to change the reflected type to Option[Object] on some systems.
Changed ReflectionUtils.getGenericTypes' order of looking for @ClassType since the primitive generic wasn't completely erased, only changed to Object which is incorrect.
More fixes to @Arguments labeled as java.io.File via incorrect @Input annotation.
Put in a default undocumented implementation of @Argument doc() to match the one added to @Input.
2012-10-19 13:20:29 -04:00
Eric Banks 4622896312 Oops, killed contracts 2012-10-19 13:04:05 -04:00
Eric Banks 9c088fe3fe Actually a better implementation of GATKSAMRecord.getSoftStart(). Last commit was all wrong. Oops. 2012-10-19 12:41:24 -04:00
Guillermo del Angel 4f768e2f58 redo QC picard parts 2012-10-19 12:25:46 -04:00
Christopher Hartl 860ab1e539 Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2012-10-19 12:16:30 -04:00
Eric Banks f7bd4998fc No need for dummy GLs 2012-10-19 12:13:59 -04:00
Eric Banks f08e5a44da Better implementation of GATKSAMRecord.getSoftStart() 2012-10-19 12:11:18 -04:00
Eric Banks deca564aef Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-19 12:01:49 -04:00
Eric Banks d3cf37dfaf Bug fix for general ploidy model: when choosing the most likely alternate allele(s), you need to weight the likelihood mass by the ploidy of the specific alleles (otherwise all alt alleles will have the same probability). This fixes Yossi's issue with pooled validation calling. This may break integration tests, but I will leave that to GdA to handle. 2012-10-19 12:01:45 -04:00
Eric Banks 27d8d3f51e RR optimization: don't recalculate the entire bitset of variant sites for every read added to the sliding window. Instead, reuse as much of the previously calculated bitset as you can (basically from the window start until the start of the new read minus the context size). In some poorly performing regions this cuts the runtime in half, although in others it doesn't seem to help much (so clearly something else is going on). Note that I still need to fix one last bug here, but it's almost done. 2012-10-19 11:59:34 -04:00
Guillermo del Angel 1658975f43 Intermediate fix 2012-10-19 11:01:56 -04:00
Kristian Cibulskis 6da7fb4132 * resolved [DEV-88] 2012-10-19 10:47:10 -04:00
Khalid Shakir 403654d40a Fixed null checks in ArgumentTypeDescriptor due to ArgumentMatchValue updates.
Fixed @Arguments such as scatter count that were labeled as java.io.File via incorrect @Input annotation.
2012-10-18 16:57:15 -04:00
Christopher Hartl 5444f33f04 Some final touches (pointers --> variables).
With this we've gone from:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.034    0.034  483.779  483.779 <string>:1(<module>)

to:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.041    0.041   31.006   31.006 <string>:1(<module>)

or a reduction of the runtime by 94%.

notbad.gif
2012-10-18 16:45:14 -04:00
Christopher Hartl dd42ca112d Next big holdup: numpy.diag(N*p*(1-p)) is way too slow. Place this into a C routine as well.
Also, after much debugging, the memory leaks and segfaults (well...those that I know about so far) are coming from numpy.linalg.lstsq. Give up on the guessing, and set beta to 0 initially.
2012-10-18 16:08:14 -04:00
Guillermo del Angel a4184716d8 Merge branch 'master' of ssh://gsa3/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-18 15:42:31 -04:00
Guillermo del Angel 3db38c5a93 Bug fix: inbreeding coeff shouldn't be computed in ref-only sites 2012-10-18 15:42:14 -04:00
Ami Levy Moonshine a0381f15af Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-18 14:49:34 -04:00
Ami Levy Moonshine f665c6ceb9 minor change (add 129 to the dbSNP name) 2012-10-18 14:48:37 -04:00
Ryan Poplin b4e69239dd In order to be considered an informative read in the PerReadAlleleLikelihoodMap it has to be informative compared to all other alleles not just the worst allele. Also, fixing a bug when there is only one allele in the map. 2012-10-18 14:31:15 -04:00
Mauricio Carneiro 3504f71b6b Fixing a null pointer exception bug for DEV-10 2012-10-18 13:58:38 -04:00
Guillermo del Angel c39578ec89 Undo git mess - revert back to origin and THEN comment out QC metrics 2012-10-18 12:49:26 -04:00
Guillermo del Angel 38656780b0 Revert "comment out QC metrics until picard jar path gets resolved"
This reverts commit 02049178662d1f7142e9df70502881264bd2ab81.
2012-10-18 12:45:21 -04:00
Guillermo del Angel d8562298c9 comment out QC metrics until picard jar path gets resolved 2012-10-18 12:43:00 -04:00
Ami Levy Moonshine 262c84a459 (1) Add RR step to the general calling script. (2) Add more modularity to the script 2012-10-18 10:47:17 -04:00
Mark DePristo d3fc797cfe SelectVariants is actually *NOT* NanoSchedulable 2012-10-18 10:42:20 -04:00
Mark DePristo f20fa9d082 SelectVariants is actually NanoSchedulable 2012-10-18 10:27:05 -04:00
Mark DePristo 97abb98c0b Bugfix for bad nt / nct argument detection in MicroScheduler 2012-10-18 10:27:05 -04:00
Eric Banks 54f698422c Better implementation for getSoftEnd() in GATKSAMRecord 2012-10-18 09:01:51 -04:00
Christopher Hartl d8ca5028dd Alter the convergence criteria not to focus on beta (which we don't care about here) but rather on the convergence of the predictors. This leads to far quicker convergence in the case where some responses can be completely explained (so some of the true beta values are infinite). 2012-10-18 09:01:31 -04:00
Christopher Hartl ce70230a2e Evaluating the logistic regression given a large matrix of predictors was a killer roadblock (>99% runtime). NumPy's applyOverDimension didn't cut it. Instead: write the appropriate function as a C extension and call into it.
This is perhaps the most painful thing ever with Python, but totally doable (better than a .mex file with MATLAB -- eh Guillermo?)
2012-10-17 23:52:47 -04:00
Ami Levy Moonshine acc0fb2f7a Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-17 22:16:02 -04:00
Eric Banks 20ffbcc86e RR optimization: profiling was showing that the BaseCounts class was a major bottleneck because the underlying implementation was a HashMap. Given that the map index was an indexable Enum anyways, it makes a lot more sense to implement as a native array. Knocks 30% off the runtime in bad regions. 2012-10-17 21:44:53 -04:00
Guillermo del Angel bfb73e1c5b Fix merge conflicts 2012-10-17 20:13:51 -04:00
Guillermo del Angel a30ac7a778 Add uniform threading argument and enable multithreading in BQSR - will save over 40 hours of runtime 2012-10-17 20:08:22 -04:00
kshakir f45e7ffbd8 Updated CMIBAMPP to use latest design of script() generating local paths and pushing bits back to <provided_s3_bucket>/<provided_s3_prefix><argument fullName>. 2012-10-17 20:00:03 -04:00
kshakir 55ac4ba70b Added another utility that can convert to RemoteFiles.
QScripts will now generate remote versions of files if the caller has not already passed in remote versions (or the QScript replaces the passed in remote references... not good)
Instead of having yet another plugin, combined QStatusMessenger and RemoteFileConverter under general QCommandPlugin trait.
2012-10-17 20:00:03 -04:00
Mauricio Carneiro 32ee2c7dff Refactored the compression interface per sample in ReduceReads
The CompressionStash is now responsible for keeping track of all intervals that must be kept uncompressed by all samples. In general this is a list generated by a tumor sample that will enforce all normal samples to abide.
  - Updated ReduceReads integration tests
  - Sliding Window is now using the CompressionStash (single sample).

DEV-104 #resolve #time 3m
2012-10-17 16:40:40 -04:00
Mauricio Carneiro b57df6cac8 Bringing CMI changes into the main GATK repo.
Merge remote-tracking branch 'cmi/master'
2012-10-17 15:23:19 -04:00
Mauricio Carneiro 4dea20b9b5 Release of CMI-0.1.0 2012-10-17 14:59:36 -04:00
Mark DePristo 8288c30e36 Use buffered output for ExactCallLogger 2012-10-17 14:15:11 -04:00
Mark DePristo fa93681f51 Scalability test for EXACT models 2012-10-17 14:15:11 -04:00
Mark DePristo c9e7a947c2 Improve interface of ExactCallLogger, use it to have a more informative AFCalcPerformanceTest 2012-10-17 14:15:11 -04:00
Guillermo del Angel 097bc2329c Merge branch 'develop' 2012-10-17 13:55:41 -04:00