Commit Graph

10888 Commits (9c63cee9fcdb69a7a8e8d77a771ddb2afa18f7cd)

Author SHA1 Message Date
Mauricio Carneiro f085f5d46a Adding default intellij configuration files 2012-10-10 13:25:38 -04:00
Mauricio Carneiro 88297606f0 Adding intellij example configuration files 2012-10-10 13:20:30 -04:00
Guillermo del Angel c0b7d53170 a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive 2012-10-10 13:19:05 -04:00
Kristian Cibulskis 2311606de4 initial cancer pipeline with mutations and partial indel support 2012-10-10 13:19:04 -04:00
Guillermo del Angel 45aa59a31c BAM pipeline fixes: a) temp workaround for DEV-9: -nWayOut argument in IndelRealigner is broken, for now things will only really work in single sample mode, b) correct extension of RealignerTargetCreator output, previous extension caused an error 2012-10-10 13:19:04 -04:00
Guillermo del Angel b8c721e6ec Minor tweaks to CMIProcessing Pipeline: a) don't hard-code job mem limit to 4 G since it's too much for most AWS instances, leave it instead as input argument, b) minor doc cleanups 2012-10-10 13:19:04 -04:00
Mauricio Carneiro ca055d8804 Reimplementation of the BAM processing pipeline using the metadata information file.
Pipeline runs end-to-end using example metadata and has been tested only for cases where everything is ideal.
Next step is to bring this to the cloud and test all the different scenarios (multiple tumors, single-ended, missing parameters, etc.).
Parallel next step is to add QC metrics.
2012-10-10 13:19:04 -04:00
Mauricio Carneiro 25ff934e5a New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning 2012-10-10 13:19:04 -04:00
Mauricio Carneiro 7d4adea183 Revised implementation of the RAWBAM => BAM pipeline
stripped out all the FQ pipeline and tumor/normal information.
2012-10-10 13:19:03 -04:00
Mauricio Carneiro e413b9fe51 First implementation of the CMI data processing pipeline, handling both germline and cancer BAM/FQ => BAM.
Not ready for prime time yet, needs more work!
2012-10-10 13:19:03 -04:00
Mauricio Carneiro 0c17709223 First implementation of a generic 'bundled' Data Processing Pipeline for germline and cancer.
not ready for prime time yet!
2012-10-10 13:19:03 -04:00
Mauricio Carneiro 08b6d1559c Reverting the DPP to the original version, going to create a new simplified version for CMI in private. 2012-10-10 13:19:03 -04:00
Mauricio Carneiro f9095c7ab7 Generic input file name recognition (still need to implement support for FastQ, but it can now at least accept it) 2012-10-10 13:19:03 -04:00
Ryan Poplin 15b405d458 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-10 10:47:40 -04:00
Ryan Poplin 2a9ee89c19 Turning on allele trimming for the haplotype caller. 2012-10-10 10:47:26 -04:00
Christopher Hartl 7381d5c243 Since this GRM now matches GCTA output for uncorrected intervals, implement and start proofing methods for LD-correction for genome partitioning. Very rudimentary tests just to solidify current position.
Wish I could do this in the GATK, but it has to run on bed files natively. Phooey.
2012-10-10 01:59:13 -04:00
Khalid Shakir f66284658d RetryMemoryLimit now works with Scatter/Gather. 2012-10-09 21:51:03 -04:00
Johan Dahlberg e9b9e2318c Fixed SortSam bug, for .done file
The *.bai.done file for the .bai file was written in the run directory instead of in the specified output directory.
Changing getName() to getAbsolutePath() fixes this.

Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2012-10-09 16:25:18 -04:00
Guillermo del Angel 9d7aa3cda8 a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive 2012-10-09 11:45:24 -04:00
Ryan Poplin b543bddbb7 Fixing merge conflicts related to the comment formatting in the BQSR. 2012-10-08 10:23:08 -04:00
Ryan Poplin b3cc04976f Fixing BQSR bug reported on the forum for reads that begin with insertions. 2012-10-08 10:18:29 -04:00
Eric Banks be9fcba546 Don't allow triggering of polyploid consensus creation in regions where there is more than one het, as it just doesn't work properly. We could probably refactor at some point to make it work, but it's not worth doing that now (especially as it should be rare to have multiple proximal known hets in a single sample exome). 2012-10-07 16:32:48 -04:00
Eric Banks 08ac80c080 RR bug: when the last base in the window around the polyploid consensus is filtered (low quality), the filtered consensus is not flushed and subsequent filtered bases (but importantly not contiguous to this one) are just added to this position. In other words, bases were being added to the wrong genomic positions. Fixed. 2012-10-07 10:52:01 -04:00
Eric Banks 36a26a7da6 md5s failed because I forgot to add --no_cmdline_in_header so it is different depending on where you run from. Fixed. 2012-10-07 08:35:55 -04:00
Eric Banks a5aaa14aaa Fix for GSA-601: Indels dropped during liftover. This was a true bug that was an effect of the switch over to the non-null representation of alleles in the VariantContext. Unfortunately, this tool didn't have integration tests - but it does now. 2012-10-07 01:19:52 -04:00
Eric Banks 82e40340c0 Use StringBuilder over StringBuffer 2012-10-07 00:02:15 -04:00
Eric Banks 5d6aad67e2 Fix for bug reported on forums: VariantsToTable does not handle lists and nested arrays correctly. Added an integration test to cover printing of PLs. 2012-10-07 00:01:27 -04:00
Eric Banks e7798ddd2a Fix for JIRA GSA-598: AD field not handled properly by CombineVariants. It was not handled by SelectVariants either. We now strip the AD field out whenever combining/selecting makes it invalid due to a change in the number of ALT alleles. 2012-10-06 23:02:36 -04:00
Eric Banks bfc551f612 Fix for GSA-589: SelectVariants with -number gives biased results. The implementation was not good and it's not worth keeping this busted code around given that we have a working implementation of a fractional random sampling already in place, so I removed it. 2012-10-06 22:39:49 -04:00
Eric Banks e8a6460a33 After merging with Yossi's fix I can confirm that the AD is fixed when going through the HC too. Added similar fixes to DP and FS annotations too. 2012-10-05 16:37:42 -04:00
Eric Banks b7639d7ceb Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-05 16:21:17 -04:00
Eric Banks 52326942cf Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-05 16:15:07 -04:00
Eric Banks 04853252a0 Possible fix for reduced reads coming from the HaplotypeCaller in the AD 2012-10-05 16:15:04 -04:00
Yossi Farjoun ef90beb827 - forgot to use git rm to delete a file from git. Now that VCF is deleted.
- uncommented a HC test that I missed.
2012-10-05 16:14:51 -04:00
Yossi Farjoun 6874a5ce76 This bam and bai are needed for testing the ADAnnotation tests (both UG and HC)
The vcf file was mistakenly added previously, now removed.
2012-10-05 16:10:41 -04:00
Yossi Farjoun d419a33ed1 * Added an integration test for AD annotation in the Haplotype caller.
* Corrected FS Annotation for UG as for AD.
* HC still does not annotate ReducedReads correctly (for either FS or AD)
2012-10-05 15:23:59 -04:00
Yossi Farjoun dc4dcb4140 fixed AD annotation for a ReducedReads BAM file. Added an integration test for this case with a new reduced BAM in private/testdata 2012-10-05 14:20:07 -04:00
Eric Banks f840d9edbd HC test should continue using 3 alt alleles for indels 2012-10-05 02:03:34 -04:00
Eric Banks c66ef17cd0 Add a separate max alt alleles argument for indels that defaults to 2 instead of 3. PLEASE TAKE NOTE. 2012-10-04 13:52:14 -04:00
Christopher Hartl beaa1ac07e Turns out GCTA replaces a missing variant with the mean dosage (2*frequency), but then normalizes the genetic distance by the number of non-missing genotype pairs. An odd thing to do, but with this the GRMs are confluent (up to a small tolerance) 2012-10-04 13:29:38 -04:00
Christopher Hartl 01dcdf2830 Waypoint: GRM is identical with GCTA if no genotypes are missing. Not sure how GCTA is treating these, but it's definitely not strictly excluding them. 2012-10-04 12:53:03 -04:00
Scott Frazer 3ffba77656 Revert "initial cancer pipeline with mutations and partial indel support"
This reverts commit 4a2e5b1fcc3ad53dbb26d43eed1220b0257e9901.
2012-10-04 11:37:54 -04:00
Kristian Cibulskis 0afde9906a initial cancer pipeline with mutations and partial indel support 2012-10-04 11:37:11 -04:00
Eric Banks e13e61673b Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-04 10:54:23 -04:00
Guillermo del Angel 49db96c8ad BAM pipeline fixes: a) temp workaround for DEV-9: -nWayOut argument in IndelRealigner is broken, for now things will only really work in single sample mode, b) correct extension of RealignerTargetCreator output, previous extension caused an error 2012-10-04 10:53:13 -04:00
Eric Banks dfddc4bb0e Protect against cases where there are counts but no quals 2012-10-04 10:52:30 -04:00
Eric Banks 0c46845c92 Refactored the BaseCounts classes so that they are safer and allow for calculations on the most probable base (which is not necessarily the most common base). 2012-10-04 10:37:11 -04:00
Mark DePristo b6e20e083a Copied DiploidExactAFCalc to placeholder OptimizedDiploidExact
-- Will be removed.  Only committing now to fix public -> private dependency
2012-10-03 20:16:38 -07:00
Mark DePristo 51cafa73e6 Removing public -> private dependency 2012-10-03 20:05:03 -07:00
Mark DePristo f6a2ca6e7f Fixes / TODOs for meaningful results with AFCalculationResult
-- Right now the state of the AFCalculationResult can be corrupt (i.e., log10 likelihoods can be -Infinity).  Forced me to disable reasonable contracts.  Needs to be thought through
-- exactCallsLog should be optional
-- Update UG integration tests as the calculation of the normalized posteriors is done in a marginally different way so the output is rounded slightly differently.
2012-10-03 19:55:12 -07:00