Commit Graph

10845 Commits (32ee2c7dffde3210e2c3b183f5f2fefd3a49af23)

Author SHA1 Message Date
Ryan Poplin 08b8ce6903 Fixing merge conflicts related to the comment formatting in the BQSR. 2012-10-10 16:03:58 -04:00
Ryan Poplin 45717349dc Fixing BQSR bug reported on the forum for reads that begin with insertions. 2012-10-10 16:01:37 -04:00
David Roazen 40a3b5bfe2 Revert "Testing github auto-mirroring attempt #2; please ignore"
This reverts commit aacbe369446af8d7901820bf828ed15d72497005.
2012-10-10 15:28:50 -04:00
David Roazen fba6a084e4 Testing github auto-mirroring attempt #2; please ignore 2012-10-10 15:28:13 -04:00
David Roazen 267d1ff59c Revert "Testing the new github auto-mirroring; please ignore"
This reverts commit bd8b321132167f6f393f234ea0e93edcfd8701ff.
2012-10-10 15:07:48 -04:00
David Roazen 66ee3f230f Testing the new github auto-mirroring; please ignore 2012-10-10 15:06:50 -04:00
Mauricio Carneiro e9eaa33c0b adding some directories to gitignore 2012-10-10 13:26:13 -04:00
Mauricio Carneiro 29195cd3aa Removed the intellij files from the root and made an example package for new users. This allows users to start at the same page and then change it as they see fit without interfering with the repo (thanks guillermo!) 2012-10-10 13:25:38 -04:00
Mauricio Carneiro fdf29503fb removing annoying xml from IDEA configuration 2012-10-10 13:25:38 -04:00
Mauricio Carneiro e29bcab42e Updating Intellij environment and adding Scala 2012-10-10 13:25:38 -04:00
Mauricio Carneiro f085f5d46a Adding default intellij configuration files 2012-10-10 13:25:38 -04:00
Mauricio Carneiro 88297606f0 Adding intellij example configuration files 2012-10-10 13:20:30 -04:00
Guillermo del Angel c0b7d53170 a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive 2012-10-10 13:19:05 -04:00
Kristian Cibulskis 2311606de4 initial cancer pipeline with mutations and partial indel support 2012-10-10 13:19:04 -04:00
Guillermo del Angel 45aa59a31c BAM pipeline fixes: a) temp workaround for DEV-9: -nWayOut argument in IndelRealigner is broken, for now things will only really work in single sample mode, b) correct extension of RealignerTargetCreator output, previous extension caused an error 2012-10-10 13:19:04 -04:00
Guillermo del Angel b8c721e6ec Minor tweaks to CMIProcessing Pipeline: a) don't hard-code job mem limit to 4 G since it's too much for most AWS instances, leave it instead as input argument, b) minor doc cleanups 2012-10-10 13:19:04 -04:00
Mauricio Carneiro ca055d8804 Reimplementation of the BAM processing pipeline using the metadata information file.
Pipeline runs end-to-end using example metadata and has been tested only for cases where everything is ideal.
Next step is to bring this to the cloud, test all different scenarios (multiple tumors, single ended, missing parameters etc).
Parallel next step is to add QC metrics.
2012-10-10 13:19:04 -04:00
Mauricio Carneiro 25ff934e5a New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning 2012-10-10 13:19:04 -04:00
Mauricio Carneiro 7d4adea183 Revised implementation of the RAWBAM => BAM pipeline
stripped out all the FQ pipeline and tumor/normal information.
2012-10-10 13:19:03 -04:00
Mauricio Carneiro e413b9fe51 First implementation of the CMI data processing pipeline, handling both germline and cancer BAM/FQ => BAM.
Not ready for prime time yet, need more work!
2012-10-10 13:19:03 -04:00
Mauricio Carneiro 0c17709223 First implementation of a generic 'bundled' Data Processing Pipeline for germline and cancer.
not ready for prime time yet!
2012-10-10 13:19:03 -04:00
Mauricio Carneiro 08b6d1559c Reverting the DPP to the original version, going to create a new simplified version for CMI in private. 2012-10-10 13:19:03 -04:00
Mauricio Carneiro f9095c7ab7 Generic input file name recognition (still need to implement support for FastQ, but it now can at least accept it) 2012-10-10 13:19:03 -04:00
Ryan Poplin 15b405d458 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-10 10:47:40 -04:00
Ryan Poplin 2a9ee89c19 Turning on allele trimming for the haplotype caller. 2012-10-10 10:47:26 -04:00
Christopher Hartl 7381d5c243 Since this GRM now matches GCTA output for uncorrected intervals, implement and start proofing methods for LD-correction for genome partitioning. Very rudimentary tests just to solidify current position.
Wish I could do this in the GATK, but it has to run on bed files natively. Phooey.
2012-10-10 01:59:13 -04:00
Khalid Shakir f66284658d RetryMemoryLimit now works with Scatter/Gather. 2012-10-09 21:51:03 -04:00
Johan Dahlberg e9b9e2318c Fixed SortSam bug, for .done file
The *.bai.done file for the .bai file was written in the run directory instead of in the specified output directory.
Changing getName() to getAbsolutePath() fixes this.

Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2012-10-09 16:25:18 -04:00
Guillermo del Angel 9d7aa3cda8 a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive 2012-10-09 11:45:24 -04:00
Ryan Poplin b543bddbb7 Fixing merge conflicts related to the comment formatting in the BQSR. 2012-10-08 10:23:08 -04:00
Ryan Poplin b3cc04976f Fixing BQSR bug reported on the forum for reads that begin with insertions. 2012-10-08 10:18:29 -04:00
Eric Banks be9fcba546 Don't allow triggering of polyploid consensus creation in regions where there is more than one het, as it just doesn't work properly. We could probably refactor at some point to make it work, but it's not worth doing that now (especially as it should be rare to have multiple proximal known hets in a single sample exome). 2012-10-07 16:32:48 -04:00
Eric Banks 08ac80c080 RR bug: when the last base in the window around the polyploid consensus is filtered (low quality), the filtered consensus is not flushed and subsequent filtered bases (but importantly not contiguous to this one) are just added to this position. In other words, bases were being added to the wrong genomic positions. Fixed. 2012-10-07 10:52:01 -04:00
Eric Banks 36a26a7da6 md5s failed because I forgot to add --no_cmdline_in_header so it is different depending on where you run from. Fixed. 2012-10-07 08:35:55 -04:00
Eric Banks a5aaa14aaa Fix for GSA-601: Indels dropped during liftover. This was a true bug that was an effect of the switch over to the non-null representation of alleles in the VariantContext. Unfortunately, this tool didn't have integration tests - but it does now. 2012-10-07 01:19:52 -04:00
Eric Banks 82e40340c0 Use StringBuilder over StringBuffer 2012-10-07 00:02:15 -04:00
Eric Banks 5d6aad67e2 Fix for bug reported on forums: VariantsToTable does not handle lists and nested arrays correctly. Added an integration test to cover printing of PLs. 2012-10-07 00:01:27 -04:00
Eric Banks e7798ddd2a Fix for JIRA GSA-598: AD field not handled properly by CombineVariants. It was not handled by SelectVariants either. We now strip the AD field out whenever combining/selecting makes it invalid due to a change in the number of ALT alleles. 2012-10-06 23:02:36 -04:00
Eric Banks bfc551f612 Fix for GSA-589: SelectVariants with -number gives biased results. The implementation was not good and it's not worth keeping this busted code around given that we have a working implementation of a fractional random sampling already in place, so I removed it. 2012-10-06 22:39:49 -04:00
Eric Banks e8a6460a33 After merging with Yossi's fix I can confirm that the AD is fixed when going through the HC too. Added similar fixes to DP and FS annotations too. 2012-10-05 16:37:42 -04:00
Eric Banks b7639d7ceb Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-05 16:21:17 -04:00
Eric Banks 52326942cf Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-05 16:15:07 -04:00
Eric Banks 04853252a0 Possible fix for reduced reads coming from the HaplotypeCaller in the AD 2012-10-05 16:15:04 -04:00
Yossi Farjoun ef90beb827 - forgot to use git rm to delete a file from git. Now that VCF is deleted.
- uncommented a HC test that I missed.
2012-10-05 16:14:51 -04:00
Yossi Farjoun 6874a5ce76 This bam and bai are needed for testing the ADAnnotation tests (both UG and HC)
The vcf file was mistakenly added previously, now removed.
2012-10-05 16:10:41 -04:00
Yossi Farjoun d419a33ed1 * Added an integration test for AD annotation in the Haplotype caller.
* Corrected FS Annotation for UG as for AD.
* HC still does not annotate ReducedReads correctly (for either FS or AD)
2012-10-05 15:23:59 -04:00
Yossi Farjoun dc4dcb4140 fixed AD annotation for a ReducedReads BAM file. Added an integration test for this case with a new reduced BAM in private/testdata 2012-10-05 14:20:07 -04:00
Eric Banks f840d9edbd HC test should continue using 3 alt alleles for indels 2012-10-05 02:03:34 -04:00
Eric Banks c66ef17cd0 Add a separate max alt alleles argument for indels that defaults to 2 instead of 3. PLEASE TAKE NOTE. 2012-10-04 13:52:14 -04:00
Christopher Hartl beaa1ac07e Turns out GCTA replaces a missing variant with the mean dosage (2*frequency), but then normalizes the genetic distance by the number of non-missing genotype pairs. An odd thing to do, but with this the GRMs are confluent (up to a small tolerance) 2012-10-04 13:29:38 -04:00