10845 Commits (32ee2c7dffde3210e2c3b183f5f2fefd3a49af23)

Author SHA1 Message Date
Mauricio Carneiro 32ee2c7dff Refactored the compression interface per sample in ReduceReads
The CompressionStash is now responsible for keeping track of all intervals that must be kept uncompressed by all samples. In general this is a list generated by a tumor sample that all normal samples are then required to abide by.
  - Updated ReduceReads integration tests
  - Sliding Window is now using the CompressionStash (single sample).

DEV-104 #resolve #time 3m
2012-10-17 16:40:40 -04:00
Mauricio Carneiro b57df6cac8 Bringing CMI changes into the main GATK repo.
Merge remote-tracking branch 'cmi/master'
2012-10-17 15:23:19 -04:00
Mauricio Carneiro 4dea20b9b5 Release of CMI-0.1.0 2012-10-17 14:59:36 -04:00
Mark DePristo 8288c30e36 Use buffered output for ExactCallLogger 2012-10-17 14:15:11 -04:00
Mark DePristo fa93681f51 Scalability test for EXACT models 2012-10-17 14:15:11 -04:00
Mark DePristo c9e7a947c2 Improve interface of ExactCallLogger, use it to have a more informative AFCalcPerformanceTest 2012-10-17 14:15:11 -04:00
David Roazen d6be657966 BQSR profiling: execute multiple operations per thread so that threading overhead doesn't dominate 2012-10-17 13:25:33 -04:00
kshakir 0196dbeaca Added more logging to push/pull of RemoteFiles. 2012-10-17 09:52:17 -04:00
kshakir f93b279151 Moved the class field caching from QScript to a ClassFieldCache utility.
Using ClassFieldCache to pull values from QScript for passing to done() method of QStatusMessenger.
2012-10-16 18:49:31 -04:00
Guillermo del Angel f0e04376ec Add output file tag so caller can specify output vcf 2012-10-16 16:05:12 -04:00
David Roazen b30e2a5b7d BQSR: tool to profile the effects of more-granular locking on scalability by # of threads 2012-10-16 14:43:16 -04:00
Guillermo del Angel 62d9de084f Changes to specify outputs from inputs arguments per Khalid's request 2012-10-16 13:57:35 -04:00
Mark DePristo 9bcefadd4e Refactor ExactCallLogger into a separate class
-- Update minor integration tests with NanoSchedule due to qual accuracy update
2012-10-16 13:30:09 -04:00
Kristian Cibulskis b26b7bd8e5 fixed problem with isIntermediate flag being inherited from FQ2BAM
added support for tumor flag in metadata
2012-10-16 12:20:41 -04:00
Mark DePristo c74d7061fe Added AFCalcResultUnitTest
-- Ensures that the posteriors remain within reasonable ranges.  Fixed bug where normalization of posteriors = {-1e30, 0.0} => {-100000, 0.0} which isn't good.  Now tests ensure that the normalization process preserves log10 precision where possible
-- Updated MathUtils to make this possible
2012-10-16 08:11:06 -04:00
Mark DePristo 9b0ab4e941 Cleanup IndependentAllelesDiploidExactAFCalc
-- Remove capability to truncate genotype likelihoods -- this wasn't used and isn't really useful after all
-- Added lots of contracts and docs, still more to come.
-- Created a default makeMaxLikelihoods function in ReferenceDiploidExactAFCalc and DiploidExactAFCalc so that multiple subclasses don't just do the default thing
-- Generalized reference bi-allelic model in IndependentAllelesDiploidExactAFCalc so that in principle any bi-allelic reference model can be used.
2012-10-16 08:11:06 -04:00
Mark DePristo 6bd0ec8de4 Proper likelihoods and posterior probability of the joint allele frequency in IndependentAllelesDiploidExactAFCalc
-- Fixed minor numerical stability issue in AFCalcResult
-- posterior of joint A/B/C is 1 - (1 - P(D | AF_b == 0)) x (1 - P(D | AF_c == 0)), for any number of alleles, obviously.  Now computes the joint posterior like this, and then back-calculates likelihoods that generate these posteriors given the priors.  It's not pretty but it's the best thing to do
2012-10-16 08:11:06 -04:00
Mark DePristo d1511e38ad Removing ConstrainedAFCalculationModel; AFCalcPerformanceTest
-- Superseded by IndependentAFCalc
-- Added support to read in an ExactModelLog in AFCalcPerformanceTest and run the independent alleles model on it.
-- A few misc. bug fixes discovered during running the performance test
2012-10-16 08:11:06 -04:00
kshakir 9fcf71c031 Updated google reflections due to stale slf4j version conflicting with other projects also trying to use Queue as a component.
Added targets to build.xml to effectively 'mvn install' packaged GATK/Queue from ant.
TODO: Versions during 'mvn install' are hardcoded at 0.0.1 until a better versioning scheme that works with maven dependencies has been identified.
2012-10-16 02:22:30 -04:00
Ryan Poplin 31be807664 Updating missed integration test. 2012-10-15 22:31:52 -04:00
Ryan Poplin d27ae67bb6 Updating the multi-step UG integration test. 2012-10-15 22:30:01 -04:00
Kristian Cibulskis 6c0e4895f0 added intervals to MuTect in BAM-PP
moved intervals from trait to MuTect class
2012-10-15 22:00:27 -04:00
Kristian Cibulskis 9bb241f06f Merge branch 'develop' of github.com:broadinstitute/cmi-gatk into develop 2012-10-15 21:59:11 -04:00
David Roazen cb33f25bfc Update expected values for HybridSelectionPipelineTest
Mark has confirmed that these differences were to be expected
given his recent changes.
2012-10-15 18:32:15 -04:00
kshakir c4ee31075c Fixed package error and a few deprecated scala warnings. 2012-10-15 15:29:40 -04:00
kshakir 213cc00abe Refactored argument matching to support other plugins in addition to file lists.
Added plugin support for sending Queue status messages.
Argument parsing can store subclasses of java.io.File, for example RemoteFile.
2012-10-15 15:10:45 -04:00
Mauricio Carneiro 4642e4eb66 Merge branch 'unstable' of https://github.com/broadinstitute/cmi-gatk into unstable 2012-10-15 13:50:03 -04:00
Mauricio Carneiro 69194e5032 Adding intellij example files to the repo 2012-10-15 13:49:09 -04:00
Guillermo del Angel ff2307031a Set default parameters for several command line inputs based on refdata content on cloud instances 2012-10-15 13:49:09 -04:00
Guillermo del Angel b7318f1c96 Bug fixes for temp mutect integration 2012-10-15 13:49:09 -04:00
Guillermo del Angel c66a4d79ba Further bug fixes to merge cancer/germline fastq-bam pipelines 2012-10-15 13:49:09 -04:00
Guillermo del Angel 7580548b5f Temp fixes 2012-10-15 13:49:09 -04:00
Mauricio Carneiro 80d92e0c63 Allowing the GATK to have non-required outputs
Modified the SAMFileWriterArgumentTypeDescriptor to accept output bam files that are null if they're not required (in the @Output annotation).

This change enables the nWayOut parameter for the IndelRealigner and ReduceReads to operate optionally while maintaining the original single way out.

[#DEV-10 transition:31 resolution:1]
2012-10-15 13:49:08 -04:00
Mauricio Carneiro a234bacb02 Making nContigs parameter hidden in ReduceReads
For now, the het reduction should only be performed for diploids (n=2). We haven't really tested it for other ploidy so it should remain hidden until someone braves it out.
2012-10-15 13:49:08 -04:00
Guillermo del Angel d7308646e9 Fix bugs so that we can pass in 2 simultaneous samples in metadata (no co-cleaning yet but at least we don't need to run pipeline twice) to produce 2 bams. Pasted temp mutect so it's also run at the end of the run 2012-10-15 13:49:08 -04:00
Kristian Cibulskis dad7ca281e upgraded mutation caller with VCF output
raw indel calls (non filtered,non vcf)
2012-10-15 13:49:08 -04:00
Guillermo del Angel 31d6c3538b Some fixes to QC commands in pipeline, and workaround for critical engine bug in GATK that makes it hang when doing small targeted BAMs with a whole exome interval list 2012-10-15 13:49:08 -04:00
Guillermo del Angel 22b79fb4dd Resolve [DEV-7]: add single-sample VCF calling at end of FASTQ-BAM pipeline. Initial steps of [DEV-4]: queue extensions for Picard QC metrics 2012-10-15 13:49:08 -04:00
Guillermo del Angel d07df384e7 a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive 2012-10-15 13:49:07 -04:00
Kristian Cibulskis 658f355171 initial cancer pipeline with mutations and partial indel support 2012-10-15 13:49:07 -04:00
Guillermo del Angel 3e71b238b0 BAM pipeline fixes: a) temp workaround for DEV-9: -nWayOut argument in IndelRealigner is broken, for now things will only really work in single sample mode, b) correct extension of RealignerTargetCreator output, previous extension caused an error 2012-10-15 13:49:07 -04:00
Guillermo del Angel 91ce0243b0 Minor tweaks to CMIProcessing Pipeline: a) don't hard-code job mem limit to 4 G since it's too much for most AWS instances, leave it instead as input argument, b) minor doc cleanups 2012-10-15 13:49:07 -04:00
Mauricio Carneiro ccd5b22646 Reimplementation of the BAM processing pipeline using the metadata information file.
Pipeline runs end-to-end using example metadata and has been tested only for cases where everything is ideal.
Next step is to bring this to the cloud and test all the different scenarios (multiple tumors, single ended, missing parameters etc).
Parallel next step is to add QC metrics.
2012-10-15 13:49:07 -04:00
Mauricio Carneiro 6eedd69248 New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning 2012-10-15 13:49:07 -04:00
Mauricio Carneiro 6174aa801b Revised implementation of the RAWBAM => BAM pipeline
stripped out all the FQ pipeline and tumor/normal information.
2012-10-15 13:49:06 -04:00
Mauricio Carneiro 59bcd0f0d2 First implementation of the CMI data processing pipeline, handling both germline and cancer BAM/FQ => BAM.
Not ready for prime time yet, needs more work!
2012-10-15 13:49:06 -04:00
Mauricio Carneiro 322ea1262c First implementation of a generic 'bundled' Data Processing Pipeline for germline and cancer.
not ready for prime time yet!
2012-10-15 13:49:06 -04:00
Mauricio Carneiro f1fb51b222 Reverting the DPP to the original version, going to create a new simplified version for CMI in private. 2012-10-15 13:49:06 -04:00
Mauricio Carneiro 429c96e723 Generic input file name recognition (still need to implement support for FastQ, but it can now at least accept it) 2012-10-15 13:49:06 -04:00
Ryan Poplin 25be94fbb8 Increasing the precision of MathUtils.approximateLog10SumLog10 from 1E-3 to 1E-4. Genotyper integration tests change as a result. Expanding the unit tests of MathUtils.log10sumLog10. 2012-10-15 13:24:32 -04:00
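
Several entries above (25be94fbb8, c74d7061fe) concern summing and normalizing probabilities stored as log10 values. As a rough illustration of that operation only, here is a minimal Python sketch of an exact log10-sum-log10 and a normalization that stays in log10 space. It is not the GATK MathUtils code, which is Java and uses an approximation with a configurable precision; the function names `log10_sum_log10` and `normalize_from_log10` are hypothetical and chosen for this example.

```python
import math


def log10_sum_log10(log10_values):
    """Return log10(sum(10**v)) for a list of log10 values.

    Hypothetical helper for illustration; not the GATK implementation.
    Factoring out the maximum keeps the largest term from underflowing.
    """
    finite = [v for v in log10_values if v != float("-inf")]
    if not finite:
        # Every input represents probability zero.
        return float("-inf")
    max_value = max(finite)
    # Terms far below the maximum underflow to 0.0, which is harmless here.
    total = sum(10.0 ** (v - max_value) for v in finite)
    return max_value + math.log10(total)


def normalize_from_log10(log10_values):
    """Normalize so the underlying probabilities sum to 1, returning log10 values.

    Subtracting in log10 space avoids capping very small values.
    """
    norm = log10_sum_log10(log10_values)
    return [v - norm for v in log10_values]
```

With this approach, an input such as `[-1e30, 0.0]` normalizes to itself instead of being clamped to a fixed floor, which is the kind of precision loss the c74d7061fe message describes fixing.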