Ryan Poplin
b4e69239dd
To be considered informative in the PerReadAlleleLikelihoodMap, a read has to be informative compared to all other alleles, not just the worst allele. Also fixes a bug when there is only one allele in the map.
2012-10-18 14:31:15 -04:00
Mauricio Carneiro
3504f71b6b
Fixing a null pointer exception bug for DEV-10
2012-10-18 13:58:38 -04:00
Mark DePristo
d3fc797cfe
SelectVariants is actually *NOT* NanoSchedulable
2012-10-18 10:42:20 -04:00
Mark DePristo
f20fa9d082
SelectVariants is actually NanoSchedulable
2012-10-18 10:27:05 -04:00
Mark DePristo
97abb98c0b
Bugfix for bad nt / nct argument detection in MicroScheduler
2012-10-18 10:27:05 -04:00
Mauricio Carneiro
b57df6cac8
Bringing CMI changes into the main GATK repo.
...
Merge remote-tracking branch 'cmi/master'
2012-10-17 15:23:19 -04:00
Mauricio Carneiro
4dea20b9b5
Release of CMI-0.1.0
2012-10-17 14:59:36 -04:00
Mark DePristo
8288c30e36
Use buffered output for ExactCallLogger
2012-10-17 14:15:11 -04:00
Mark DePristo
fa93681f51
Scalability test for EXACT models
2012-10-17 14:15:11 -04:00
Mark DePristo
c9e7a947c2
Improve interface of ExactCallLogger, use it to have a more informative AFCalcPerformanceTest
2012-10-17 14:15:11 -04:00
David Roazen
d6be657966
BQSR profiling: execute multiple operations per thread so that threading overhead doesn't dominate
2012-10-17 13:25:33 -04:00
kshakir
0196dbeaca
Added more logging to push/pull of RemoteFiles.
2012-10-17 09:52:17 -04:00
kshakir
f93b279151
Moved the class field caching from QScript to a ClassFieldCache utility.
...
Using ClassFieldCache to pull values from QScript for passing to done() method of QStatusMessenger.
2012-10-16 18:49:31 -04:00
Guillermo del Angel
f0e04376ec
Add output file tag so caller can specify output vcf
2012-10-16 16:05:12 -04:00
David Roazen
b30e2a5b7d
BQSR: tool to profile the effects of more-granular locking on scalability by # of threads
2012-10-16 14:43:16 -04:00
Guillermo del Angel
62d9de084f
Changes to specify outputs from input arguments per Khalid's request
2012-10-16 13:57:35 -04:00
Mark DePristo
9bcefadd4e
Refactor ExactCallLogger into a separate class
...
-- Update minor integration tests with NanoSchedule due to qual accuracy update
2012-10-16 13:30:09 -04:00
Kristian Cibulskis
b26b7bd8e5
fixed problem with isIntermediate flag being inherited from FQ2BAM
...
added support for tumor flag in metadata
2012-10-16 12:20:41 -04:00
Mark DePristo
c74d7061fe
Added AFCalcResultUnitTest
...
-- Ensures that the posteriors remain within reasonable ranges. Fixed bug where normalization of posteriors = {-1e30, 0.0} => {-100000, 0.0} which isn't good. Now tests ensure that the normalization process preserves log10 precision where possible
-- Updated MathUtils to make this possible
2012-10-16 08:11:06 -04:00
Mark DePristo
9b0ab4e941
Cleanup IndependentAllelesDiploidExactAFCalc
...
-- Remove capability to truncate genotype likelihoods -- this wasn't used and isn't really useful after all
-- Added lots of contracts and docs, still more to come.
-- Created a default makeMaxLikelihoods function in ReferenceDiploidExactAFCalc and DiploidExactAFCalc so that multiple subclasses don't just do the default thing
-- Generalized reference bi-allelic model in IndependentAllelesDiploidExactAFCalc so that in principle any bi-allelic reference model can be used.
2012-10-16 08:11:06 -04:00
Mark DePristo
6bd0ec8de4
Proper likelihoods and posterior probability of the joint allele frequency in IndependentAllelesDiploidExactAFCalc
...
-- Fixed minor numerical stability issue in AFCalcResult
-- posterior of joint A/B/C is 1 - (1 - P(D | AF_b == 0)) x (1 - P(D | AF_c == 0)), for any number of alleles, obviously. Now computes the joint posterior like this, and then back-calculates likelihoods that generate these posteriors given the priors. It's not pretty but it's the best thing to do
2012-10-16 08:11:06 -04:00
Mark DePristo
d1511e38ad
Removing ConstrainedAFCalculationModel; AFCalcPerformanceTest
...
-- Superseded by IndependentAFCalc
-- Added support to read in an ExactModelLog in AFCalcPerformanceTest and run the independent alleles model on it.
-- A few misc. bug fixes discovered during running the performance test
2012-10-16 08:11:06 -04:00
kshakir
9fcf71c031
Updated google reflections due to stale slf4j version conflicting with other projects also trying to use Queue as a component.
...
Added targets to build.xml to effectively 'mvn install' packaged GATK/Queue from ant.
TODO: Versions during 'mvn install' are hardcoded at 0.0.1 until a better versioning scheme that works with maven dependencies has been identified.
2012-10-16 02:22:30 -04:00
Ryan Poplin
31be807664
Updating missed integration test.
2012-10-15 22:31:52 -04:00
Ryan Poplin
d27ae67bb6
Updating the multi-step UG integration test.
2012-10-15 22:30:01 -04:00
Kristian Cibulskis
6c0e4895f0
added intervals to MuTect in BAM-PP
...
moved intervals from trait to MuTect class
2012-10-15 22:00:27 -04:00
Kristian Cibulskis
9bb241f06f
Merge branch 'develop' of github.com:broadinstitute/cmi-gatk into develop
2012-10-15 21:59:11 -04:00
David Roazen
cb33f25bfc
Update expected values for HybridSelectionPipelineTest
...
Mark has confirmed that these differences were to be expected
given his recent changes.
2012-10-15 18:32:15 -04:00
kshakir
c4ee31075c
Fixed package error and a few Scala deprecation warnings.
2012-10-15 15:29:40 -04:00
kshakir
213cc00abe
Refactored argument matching to support other plugins in addition to file lists.
...
Added plugin support for sending Queue status messages.
Argument parsing can store subclasses of java.io.File, for example RemoteFile.
2012-10-15 15:10:45 -04:00
Mauricio Carneiro
4642e4eb66
Merge branch 'unstable' of https://github.com/broadinstitute/cmi-gatk into unstable
2012-10-15 13:50:03 -04:00
Mauricio Carneiro
69194e5032
Adding IntelliJ example files to the repo
2012-10-15 13:49:09 -04:00
Guillermo del Angel
ff2307031a
Set default parameters for several command line inputs based on refdata content on cloud instances
2012-10-15 13:49:09 -04:00
Guillermo del Angel
b7318f1c96
Bug fixes for temp mutect integration
2012-10-15 13:49:09 -04:00
Guillermo del Angel
c66a4d79ba
Further bug fixes to merge cancer/germline fastq-bam pipelines
2012-10-15 13:49:09 -04:00
Guillermo del Angel
7580548b5f
Temp fixes
2012-10-15 13:49:09 -04:00
Mauricio Carneiro
80d92e0c63
Allowing the GATK to have non-required outputs
...
Modified the SAMFileWriterArgumentTypeDescriptor to accept output bam files that are null if they're not required (in the @Output annotation).
This change enables the nWayOut parameter for the IndelRealigner and ReduceReads to operate optionally while maintaining the original single way out.
[#DEV-10 transition:31 resolution:1]
2012-10-15 13:49:08 -04:00
Mauricio Carneiro
a234bacb02
Making nContigs parameter hidden in ReduceReads
...
For now, the het reduction should only be performed for diploids (n=2). We haven't really tested it for other ploidies, so it should remain hidden until someone braves it out.
2012-10-15 13:49:08 -04:00
Guillermo del Angel
d7308646e9
Fix bugs so that we can pass in 2 simultaneous samples in metadata (no co-cleaning yet but at least we don't need to run pipeline twice) to produce 2 bams. Pasted temp mutect so it's also run at the end of the run
2012-10-15 13:49:08 -04:00
Kristian Cibulskis
dad7ca281e
upgraded mutation caller with VCF output
...
raw indel calls (non-filtered, non-VCF)
2012-10-15 13:49:08 -04:00
Guillermo del Angel
31d6c3538b
Some fixes to QC commands in pipeline, and workaround for critical engine bug in GATK that makes it hang when doing small targeted BAMs with a whole exome interval list
2012-10-15 13:49:08 -04:00
Guillermo del Angel
22b79fb4dd
Resolve [DEV-7]: add single-sample VCF calling at end of FASTQ-BAM pipeline. Initial steps of [DEV-4]: queue extensions for Picard QC metrics
2012-10-15 13:49:08 -04:00
Guillermo del Angel
d07df384e7
a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive
2012-10-15 13:49:07 -04:00
Kristian Cibulskis
658f355171
initial cancer pipeline with mutations and partial indel support
2012-10-15 13:49:07 -04:00
Guillermo del Angel
3e71b238b0
BAM pipeline fixes: a) temp workaround for DEV-9: -nWayOut argument in IndelRealigner is broken, for now things will only really work in single sample mode, b) correct extension of RealignerTargetCreator output, previous extension caused an error
2012-10-15 13:49:07 -04:00
Guillermo del Angel
91ce0243b0
Minor tweaks to CMIProcessing Pipeline: a) don't hard-code job mem limit to 4 G since it's too much for most AWS instances, leave it instead as input argument, b) minor doc cleanups
2012-10-15 13:49:07 -04:00
Mauricio Carneiro
ccd5b22646
Reimplementation of the BAM processing pipeline using the metadata information file.
...
Pipeline runs end-to-end using example metadata and has been tested only for cases where everything is ideal.
Next step is to bring this to the cloud and test all the different scenarios (multiple tumors, single-ended, missing parameters, etc.).
Parallel next step is to add QC metrics.
2012-10-15 13:49:07 -04:00
Mauricio Carneiro
6eedd69248
New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning
2012-10-15 13:49:07 -04:00
Mauricio Carneiro
6174aa801b
Revised implementation of the RAWBAM => BAM pipeline
...
stripped out all the FQ pipeline and tumor/normal information.
2012-10-15 13:49:06 -04:00
Mauricio Carneiro
59bcd0f0d2
First implementation of the CMI data processing pipeline, handling both germline and cancer BAM/FQ => BAM.
...
Not ready for prime time yet, needs more work!
2012-10-15 13:49:06 -04:00