Guillermo del Angel
c39578ec89
Undo git mess - revert back to origin and THEN comment out QC metrics
2012-10-18 12:49:26 -04:00
Guillermo del Angel
38656780b0
Revert "comment out QC metrics until picard jar path gets resolved"
This reverts commit 02049178662d1f7142e9df70502881264bd2ab81.
2012-10-18 12:45:21 -04:00
Guillermo del Angel
d8562298c9
comment out QC metrics until picard jar path gets resolved
2012-10-18 12:43:00 -04:00
Ami Levy Moonshine
262c84a459
(1) Add RR step to the general calling script.
(2) Add more modularity to the script
2012-10-18 10:47:17 -04:00
Mark DePristo
d3fc797cfe
SelectVariants is actually *NOT* NanoSchedulable
2012-10-18 10:42:20 -04:00
Mark DePristo
f20fa9d082
SelectVariants is actually NanoSchedulable
2012-10-18 10:27:05 -04:00
Mark DePristo
97abb98c0b
Bugfix for bad nt / nct argument detection in MicroScheduler
2012-10-18 10:27:05 -04:00
Eric Banks
54f698422c
Better implementation for getSoftEnd() in GATKSAMRecord
2012-10-18 09:01:51 -04:00
Christopher Hartl
d8ca5028dd
Alter the convergence criteria to focus not on beta (which we don't care about here) but on the convergence of the predictors. This converges far more quickly when some responses can be explained completely (so some of the true beta values are infinite).
2012-10-18 09:01:31 -04:00
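A minimal sketch of the convergence test described above, assuming "convergence of the predictors" means the fitted probabilities stop changing between iterations. The class name `PredictorConvergence`, the max-absolute-difference norm and the tolerance are illustrative assumptions, not the committed code (which was Python).

```java
/** Hypothetical helper: declare convergence when fitted probabilities stabilize, ignoring beta. */
final class PredictorConvergence {
    private PredictorConvergence() {}

    /** Logistic function mapping a linear predictor to a probability. */
    static double logistic(double eta) {
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    /** Fitted probabilities p_i = logistic(x_i . beta) for every row of x. */
    static double[] fitted(double[][] x, double[] beta) {
        double[] p = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double eta = 0.0;
            for (int j = 0; j < beta.length; j++) {
                eta += x[i][j] * beta[j];
            }
            p[i] = logistic(eta);
        }
        return p;
    }

    /**
     * Converged when no fitted probability moves by more than tol between
     * iterations, even if some beta entries are still drifting toward infinity.
     */
    static boolean converged(double[][] x, double[] oldBeta, double[] newBeta, double tol) {
        double[] before = fitted(x, oldBeta);
        double[] after = fitted(x, newBeta);
        for (int i = 0; i < before.length; i++) {
            if (Math.abs(after[i] - before[i]) > tol) {
                return false;
            }
        }
        return true;
    }
}
```

With perfectly separable data such as rows {1} and {-1}, doubling beta from 20 to 40 still counts as converged here, because the fitted probabilities are already within about 2e-9 of 0 and 1, whereas a test on beta itself would keep iterating.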
Christopher Hartl
ce70230a2e
Evaluating the logistic regression given a large matrix of predictors was a killer roadblock (>99% of runtime). NumPy's applyOverDimension didn't cut it. Instead: write the appropriate function as a C extension and call into it.
This is perhaps the most painful thing ever with Python, but totally doable (better than a .mex file with MATLAB -- eh Guillermo?)
2012-10-17 23:52:47 -04:00
Ami Levy Moonshine
acc0fb2f7a
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-17 22:16:02 -04:00
Eric Banks
20ffbcc86e
RR optimization: profiling was showing that the BaseCounts class was a major bottleneck because the underlying implementation was a HashMap. Given that the map index was an indexable Enum anyway, it makes a lot more sense to implement it as a native array. Knocks 30% off the runtime in bad regions.
2012-10-17 21:44:53 -04:00
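A minimal sketch of the HashMap-to-array change described above, assuming a small fixed base alphabet. The enum `Base`, its constants and the class `ArrayBaseCounts` are hypothetical stand-ins, not the actual GATK BaseCounts code.

```java
/** Hypothetical base alphabet; the real GATK enum may differ. */
enum Base { A, C, G, T, D, I }

/** Sketch of per-base counts backed by an int[] indexed by the enum's ordinal. */
final class ArrayBaseCounts {
    // One slot per enum constant; no hashing and no boxed Integer values.
    private final int[] counts = new int[Base.values().length];

    void increment(Base b) {
        counts[b.ordinal()]++;
    }

    int count(Base b) {
        return counts[b.ordinal()];
    }
}
```

java.util.EnumMap is also array-backed internally, but a raw int[] additionally avoids boxing each count.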
Guillermo del Angel
bfb73e1c5b
Fix merge conflicts
2012-10-17 20:13:51 -04:00
Guillermo del Angel
a30ac7a778
Add uniform threading argument and enable multithreading in BQSR - will save over 40 hours of runtime
2012-10-17 20:08:22 -04:00
kshakir
f45e7ffbd8
Updated CMIBAMPP to use latest design of script() generating local paths and pushing bits back to <provided_s3_bucket>/<provided_s3_prefix><argument fullName>.
2012-10-17 20:00:03 -04:00
kshakir
55ac4ba70b
Added another utility that can convert to RemoteFiles.
QScripts will now generate remote versions of files if the caller has not already passed in remote versions (or the QScript replaces the passed-in remote references... not good)
Instead of having yet another plugin, combined QStatusMessenger and RemoteFileConverter under general QCommandPlugin trait.
2012-10-17 20:00:03 -04:00
Mauricio Carneiro
32ee2c7dff
Refactored the compression interface per sample in ReduceReads
The CompressionStash is now responsible for keeping track of all intervals that must be kept uncompressed by all samples. In general this is a list generated by a tumor sample that all normal samples must abide by.
- Updated ReduceReads integration tests
- Sliding Window is now using the CompressionStash (single sample).
DEV-104 #resolve #time 3m
2012-10-17 16:40:40 -04:00
Mauricio Carneiro
b57df6cac8
Bringing CMI changes into the main GATK repo.
Merge remote-tracking branch 'cmi/master'
2012-10-17 15:23:19 -04:00
Mauricio Carneiro
4dea20b9b5
Release of CMI-0.1.0
2012-10-17 14:59:36 -04:00
Mark DePristo
8288c30e36
Use buffered output for ExactCallLogger
2012-10-17 14:15:11 -04:00
Mark DePristo
fa93681f51
Scalability test for EXACT models
2012-10-17 14:15:11 -04:00
Mark DePristo
c9e7a947c2
Improve interface of ExactCallLogger, use it to have a more informative AFCalcPerformanceTest
2012-10-17 14:15:11 -04:00
Guillermo del Angel
097bc2329c
Merge branch 'develop'
2012-10-17 13:55:41 -04:00
David Roazen
d6be657966
BQSR profiling: execute multiple operations per thread so that threading overhead doesn't dominate
2012-10-17 13:25:33 -04:00
Christopher Hartl
98b83421a3
A faster implementation of the Plink Bed Reader that's optimized for SNP-major bed files. Reduced runtime from 102 minutes to 12 minutes. Partially tested.
Recover gracefully from completely correlated haplotype data: run Penalized IRLS to prevent divergence (usually one step). Tested.
2012-10-17 12:48:14 -04:00
kshakir
0196dbeaca
Added more logging to push/pull of RemoteFiles.
2012-10-17 09:52:17 -04:00
Eric Banks
33df1afe0e
More BaseCounts optimizations for RR.
2012-10-17 00:55:44 -04:00
Eric Banks
19e2b5f0d5
RR optimization: since total count in BaseCounts is requested so often, don't keep computing it from scratch each time.
2012-10-17 00:44:23 -04:00
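A minimal sketch of caching the total count, assuming counts change only through add/remove calls. The enum `CountedBase` and the class `CachedTotalBaseCounts` are hypothetical names for illustration, not the GATK API.

```java
/** Hypothetical base alphabet for this sketch. */
enum CountedBase { A, C, G, T, D, I }

/** Sketch: keep a running total so totalCount() is O(1) instead of re-summing every slot. */
final class CachedTotalBaseCounts {
    private final int[] counts = new int[CountedBase.values().length];
    private int total = 0; // kept in sync on every add/remove

    void add(CountedBase b) {
        counts[b.ordinal()]++;
        total++;
    }

    void remove(CountedBase b) {
        if (counts[b.ordinal()] == 0) {
            throw new IllegalStateException("no " + b + " left to remove");
        }
        counts[b.ordinal()]--;
        total--;
    }

    int count(CountedBase b) {
        return counts[b.ordinal()];
    }

    int totalCount() {
        return total;
    }
}
```

The trade-off is one extra increment or decrement per update in exchange for a constant-time read, which pays off when the total is read far more often than counts change.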
kshakir
f93b279151
Moved the class field caching from QScript to a ClassFieldCache utility.
Using ClassFieldCache to pull values from QScript for passing to done() method of QStatusMessenger.
2012-10-16 18:49:31 -04:00
Guillermo del Angel
f0e04376ec
Add output file tag so caller can specify output vcf
2012-10-16 16:05:12 -04:00
David Roazen
b30e2a5b7d
BQSR: tool to profile the effects of more-granular locking on scalability by # of threads
2012-10-16 14:43:16 -04:00
Ami Levy Moonshine
402ce963f9
Changes to the postQC summary table format
2012-10-16 14:11:54 -04:00
Guillermo del Angel
62d9de084f
Changes to specify outputs from inputs arguments per Khalid's request
2012-10-16 13:57:35 -04:00
Mark DePristo
9bcefadd4e
Refactor ExactCallLogger into a separate class
-- Update minor integration tests with NanoSchedule due to qual accuracy update
2012-10-16 13:30:09 -04:00
Kristian Cibulskis
b26b7bd8e5
fixed problem with isIntermediate flag being inherited from FQ2BAM
added support for tumor flag in metadata
2012-10-16 12:20:41 -04:00
Mark DePristo
c74d7061fe
Added AFCalcResultUnitTest
-- Ensures that the posteriors remain within reasonable ranges. Fixed a bug where normalizing the posteriors {-1e30, 0.0} produced {-100000, 0.0}, which isn't good. The tests now ensure that the normalization process preserves log10 precision where possible
-- Updated MathUtils to make this possible
2012-10-16 08:11:06 -04:00
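A minimal sketch of log10-space normalization that leaves very small values such as -1e30 intact instead of clamping them, assuming at least one input is finite. It uses the standard max-shift (log-sum-exp) trick; the class `Log10Normalize` is hypothetical and is not GATK's MathUtils.

```java
/** Sketch of log10-space normalization that does not clamp tiny values to a floor. */
final class Log10Normalize {
    private Log10Normalize() {}

    /**
     * Given log10(p_i), returns log10(p_i / sum_j p_j). Subtracting the max
     * before exponentiating keeps 10^x in range; assumes at least one finite input.
     */
    static double[] normalizeLog10(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        for (double v : log10Values) {
            sum += Math.pow(10.0, v - max); // tiny terms underflow to 0.0, harmlessly
        }
        double log10Sum = max + Math.log10(sum);
        double[] out = new double[log10Values.length];
        for (int i = 0; i < log10Values.length; i++) {
            out[i] = log10Values[i] - log10Sum; // subtract in log space: no floor applied
        }
        return out;
    }
}
```

For the input {-1e30, 0.0} this returns {-1e30, 0.0} unchanged, because the shift and the final subtraction both happen in log space.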
Mark DePristo
9b0ab4e941
Cleanup IndependentAllelesDiploidExactAFCalc
-- Remove capability to truncate genotype likelihoods -- this wasn't used and isn't really useful after all
-- Added lots of contracts and docs, still more to come.
-- Created a default makeMaxLikelihoods function in ReferenceDiploidExactAFCalc and DiploidExactAFCalc so that multiple subclasses don't each have to reimplement the default behavior
-- Generalized reference bi-allelic model in IndependentAllelesDiploidExactAFCalc so that in principle any bi-allelic reference model can be used.
2012-10-16 08:11:06 -04:00
Mark DePristo
6bd0ec8de4
Proper likelihoods and posterior probability of the joint allele frequency in IndependentAllelesDiploidExactAFCalc
-- Fixed minor numerical stability issue in AFCalcResult
-- The posterior of joint A/B/C is 1 - (1 - P(D | AF_b == 0)) x (1 - P(D | AF_c == 0)), extended in the obvious way to any number of alleles. The joint posterior is now computed like this, and the likelihoods that generate these posteriors given the priors are then back-calculated. It's not pretty, but it's the best thing to do
2012-10-16 08:11:06 -04:00
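A minimal sketch of combining independent per-allele results, assuming each term P_i in the formula above is read as the posterior probability that alt allele i is present (AF_i > 0). Under that reading and the independence assumption, 1 - prod_i (1 - P_i) is the probability that at least one alt allele is present. The class `JointNonRefPosterior` is hypothetical; a production version would likely work in log10 space.

```java
/** Sketch: combine per-allele posteriors of being non-reference, assuming independence. */
final class JointNonRefPosterior {
    private JointNonRefPosterior() {}

    /**
     * P(at least one alt allele present) = 1 - prod_i (1 - P_i), where P_i is
     * the posterior that alt allele i has AF > 0.
     */
    static double combine(double[] perAllelePosteriors) {
        double probAllAbsent = 1.0;
        for (double p : perAllelePosteriors) {
            if (p < 0.0 || p > 1.0) {
                throw new IllegalArgumentException("posterior out of [0,1]: " + p);
            }
            probAllAbsent *= 1.0 - p; // every allele independently absent
        }
        return 1.0 - probAllAbsent;
    }
}
```

For two alleles with posterior 0.5 each, the combined posterior is 1 - 0.5 x 0.5 = 0.75.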
Mark DePristo
d1511e38ad
Removing ConstrainedAFCalculationModel; AFCalcPerformanceTest
-- Superseded by IndependentAFCalc
-- Added support to read in an ExactModelLog in AFCalcPerformanceTest and run the independent alleles model on it.
-- A few misc. bug fixes discovered during running the performance test
2012-10-16 08:11:06 -04:00
kshakir
9fcf71c031
Updated google reflections due to stale slf4j version conflicting with other projects also trying to use Queue as a component.
Added targets to build.xml to effectively 'mvn install' packaged GATK/Queue from ant.
TODO: Versions during 'mvn install' are hardcoded at 0.0.1 until a better versioning scheme that works with maven dependencies has been identified.
2012-10-16 02:22:30 -04:00
Ryan Poplin
31be807664
Updating missed integration test.
2012-10-15 22:31:52 -04:00
Ryan Poplin
d27ae67bb6
Updating the multi-step UG integration test.
2012-10-15 22:30:01 -04:00
Kristian Cibulskis
6c0e4895f0
added intervals to MuTect in BAM-PP
moved intervals from trait to MuTect class
2012-10-15 22:00:27 -04:00
Kristian Cibulskis
9bb241f06f
Merge branch 'develop' of github.com:broadinstitute/cmi-gatk into develop
2012-10-15 21:59:11 -04:00
David Roazen
cb33f25bfc
Update expected values for HybridSelectionPipelineTest
Mark has confirmed that these differences were to be expected
given his recent changes.
2012-10-15 18:32:15 -04:00
kshakir
c4ee31075c
Fixed package error and a few deprecated Scala warnings.
2012-10-15 15:29:40 -04:00
kshakir
213cc00abe
Refactored argument matching to support other plugins in addition to file lists.
Added plugin support for sending Queue status messages.
Argument parsing can store subclasses of java.io.File, for example RemoteFile.
2012-10-15 15:10:45 -04:00
Mauricio Carneiro
4642e4eb66
Merge branch 'unstable' of https://github.com/broadinstitute/cmi-gatk into unstable
2012-10-15 13:50:03 -04:00
Mauricio Carneiro
69194e5032
Adding IntelliJ example files to the repo
2012-10-15 13:49:09 -04:00
Guillermo del Angel
ff2307031a
Set default parameters for several command line inputs based on refdata content on cloud instances
2012-10-15 13:49:09 -04:00