Commit Graph

10966 Commits (0a56fe5bc33f9dfd40e25005f2e037bc36e9ffdc)

Author SHA1 Message Date
Guillermo del Angel 32e377a0db Fix bugs so that we can pass in 2 simultaneous samples in metadata (no co-cleaning yet but at least we don't need to run pipeline twice) to produce 2 bams. Pasted temp mutect so it's also run at the end of the run 2012-10-12 14:39:28 -04:00
David Roazen da1cffbfca Run performance tests in gsa-engineering queue on gsa4 rather than gsa queue
Running the performance tests on the farm wasn't working out very well --
it's been too long since they've run to completion. Switching back to
running them on gsa4 for now.
2012-10-12 14:21:27 -04:00
Guillermo del Angel dc03a09722 Merge branch 'develop' into unstable 2012-10-12 14:19:42 -04:00
Kristian Cibulskis c1706ef0ef upgraded mutation caller with VCF output
raw indel calls (non-filtered, non-VCF)
2012-10-12 14:18:12 -04:00
Guillermo del Angel 5971006678 Bug fix when running nondiploid mode in UG with EMIT_ALL_SITES: if site was reference-only, QUAL is produced OK but genotypes were being set to no-call because of unnecessary likelihood normalization. May change integration test md5 which I'll fix later today 2012-10-12 12:45:55 -04:00
Eric Banks 81532a0529 Missing files are user errors. 2012-10-12 09:48:12 -04:00
Eric Banks fa77a83783 Update the out of space error to include another permutation 2012-10-12 09:38:12 -04:00
Eric Banks 85525d9e6e Make Geraldine's life easier: from now on we treat problems where a temp file cannot be found when running the GATK with multiple threads as User Errors (since they are 99.9% of the time). This is an extremely large class of errors in Tableau and on the forums. Helpful error message tells users exactly what we tell them on the forums anyways (Geraldine: feel free to edit). 2012-10-12 09:19:50 -04:00
Eric Banks ad60300bee Catch malformed BAM files at the source since this is the largest class of errors in Tableau. 2012-10-12 09:07:57 -04:00
Eric Banks 593c8065d9 Fix docs for BadMateFilter 2012-10-12 08:35:45 -04:00
Christopher Hartl 6b9987cf1b Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2012-10-12 00:48:42 -04:00
Christopher Hartl c1211ad3a1 Full test suite of LD-corrected GRM calculation. The correctness of this code is now largely verified. Matches GCTA when no correction is used (up to 6 decimal places). Bed reading relies on a particular test directory that is still local. The rest is all generated in unit test fashion. 2012-10-12 00:46:02 -04:00
David Roazen 3861212dab Fix inefficiency in FilePointer GenomeLoc validation
Validation of GenomeLocs in the FilePointer class was extremely inefficient
when the GenomeLocs were added one at a time rather than all at once.

Appears to mostly fix GSA-604
2012-10-11 19:55:14 -04:00
Guillermo del Angel 47e9d967fe Merging in from cmi-develop branch - staying in this branch for now 2012-10-11 15:35:43 -04:00
Guillermo del Angel 77949ec740 Some fixes to QC commands in pipeline, and workaround for critical engine bug in GATK that makes it hang when processing small targeted BAMs with a whole exome interval list 2012-10-11 15:08:30 -04:00
Ami Levy Moonshine ef3882f439 PhaseByTransmission: small typo.
variantCallQC_summaryTablesOnly.R: small changes (more to come).
GeneralCallingPipeline.scala: the new pipeline script. It is not as clean as I want it to be, but it works, and I am still going to work on it a little more. It does not yet include: (1) the RR step, (2) a better eval step, (3) other targets (it currently works on the CEU Trio).
2012-10-11 14:51:41 -04:00
Guillermo del Angel af5a6fdace Resolve [DEV-7]: add single-sample VCF calling at end of FASTQ-BAM pipeline. Initial steps of [DEV-4]: queue extensions for Picard QC metrics 2012-10-11 11:09:49 -04:00
Mark DePristo 9b19f5ce99 No longer include stack traces for user exceptions in GATK logs
-- Was taking a shockingly large amount of space on the server, and slowing down Tableau so much that all stack traces had to be disabled
2012-10-10 20:41:03 -04:00
Ryan Poplin 08b8ce6903 Fixing merge conflicts related to the comment formatting in the BQSR. 2012-10-10 16:03:58 -04:00
Ryan Poplin 45717349dc Fixing BQSR bug reported on the forum for reads that begin with insertions. 2012-10-10 16:01:37 -04:00
David Roazen 40a3b5bfe2 Revert "Testing github auto-mirroring attempt #2; please ignore"
This reverts commit aacbe369446af8d7901820bf828ed15d72497005.
2012-10-10 15:28:50 -04:00
David Roazen fba6a084e4 Testing github auto-mirroring attempt #2; please ignore 2012-10-10 15:28:13 -04:00
David Roazen 267d1ff59c Revert "Testing the new github auto-mirroring; please ignore"
This reverts commit bd8b321132167f6f393f234ea0e93edcfd8701ff.
2012-10-10 15:07:48 -04:00
David Roazen 66ee3f230f Testing the new github auto-mirroring; please ignore 2012-10-10 15:06:50 -04:00
Mauricio Carneiro e9eaa33c0b adding some directories to gitignore 2012-10-10 13:26:13 -04:00
Mauricio Carneiro 29195cd3aa Removed the intellij files from the root and made an example package for new users. This allows users to start at the same page and then change it as they see fit without interfering with the repo (thanks guillermo!) 2012-10-10 13:25:38 -04:00
Mauricio Carneiro fdf29503fb removing annoying xml from IDEA configuration 2012-10-10 13:25:38 -04:00
Mauricio Carneiro e29bcab42e Updating Intellij environment and adding Scala 2012-10-10 13:25:38 -04:00
Mauricio Carneiro f085f5d46a Adding default intellij configuration files 2012-10-10 13:25:38 -04:00
Mauricio Carneiro 88297606f0 Adding intellij example configuration files 2012-10-10 13:20:30 -04:00
Guillermo del Angel c0b7d53170 a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive 2012-10-10 13:19:05 -04:00
Kristian Cibulskis 2311606de4 initial cancer pipeline with mutations and partial indel support 2012-10-10 13:19:04 -04:00
Guillermo del Angel 45aa59a31c BAM pipeline fixes: a) temp workaround for DEV-9: -nWayOut argument in IndelRealigner is broken, for now things will only really work in single sample mode, b) correct extension of RealignerTargetCreator output, previous extension caused an error 2012-10-10 13:19:04 -04:00
Guillermo del Angel b8c721e6ec Minor tweaks to CMIProcessing Pipeline: a) don't hard-code job mem limit to 4 G since it's too much for most AWS instances, leave it instead as input argument, b) minor doc cleanups 2012-10-10 13:19:04 -04:00
Mauricio Carneiro ca055d8804 Reimplementation of the BAM processing pipeline using the metadata information file.
Pipeline runs end-to-end using example metadata and has been tested only for cases where everything is ideal.
Next step is to bring this to the cloud and test all the different scenarios (multiple tumors, single ended, missing parameters etc).
Parallel next step is to add QC metrics.
2012-10-10 13:19:04 -04:00
Mauricio Carneiro 25ff934e5a New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning 2012-10-10 13:19:04 -04:00
Mauricio Carneiro 7d4adea183 Revised implementation of the RAWBAM => BAM pipeline
stripped out all the FQ pipeline and tumor/normal information.
2012-10-10 13:19:03 -04:00
Mauricio Carneiro e413b9fe51 First implementation of the CMI data processing pipeline, handling both germline and cancer BAM/FQ => BAM.
Not ready for prime time yet, need more work!
2012-10-10 13:19:03 -04:00
Mauricio Carneiro 0c17709223 First implementation of a generic 'bundled' Data Processing Pipeline for germline and cancer.
not ready for prime time yet!
2012-10-10 13:19:03 -04:00
Mauricio Carneiro 08b6d1559c Reverting the DPP to the original version, going to create a new simplified version for CMI in private. 2012-10-10 13:19:03 -04:00
Mauricio Carneiro f9095c7ab7 Generic input file name recognition (still need to implement support for FastQ, but it can now at least accept it) 2012-10-10 13:19:03 -04:00
Ryan Poplin 15b405d458 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-10 10:47:40 -04:00
Ryan Poplin 2a9ee89c19 Turning on allele trimming for the haplotype caller. 2012-10-10 10:47:26 -04:00
Christopher Hartl 7381d5c243 Since this GRM now matches GCTA output for uncorrected intervals, implement and start proofing methods for LD-correction for genome partitioning. Very rudimentary tests just to solidify current position.
Wish I could do this in the GATK, but it has to run on bed files natively. Phooey.
2012-10-10 01:59:13 -04:00
Khalid Shakir f66284658d RetryMemoryLimit now works with Scatter/Gather. 2012-10-09 21:51:03 -04:00
Johan Dahlberg e9b9e2318c Fixed SortSam bug, for .done file
The *.bai.done file for the .bai file was written in the run directory instead of in the specified output directory.
Changing getName() to getAbsolutePath() fixes this.

Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2012-10-09 16:25:18 -04:00
Guillermo del Angel 9d7aa3cda8 a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive 2012-10-09 11:45:24 -04:00
Ryan Poplin b543bddbb7 Fixing merge conflicts related to the comment formatting in the BQSR. 2012-10-08 10:23:08 -04:00
Ryan Poplin b3cc04976f Fixing BQSR bug reported on the forum for reads that begin with insertions. 2012-10-08 10:18:29 -04:00
Eric Banks be9fcba546 Don't allow triggering of polyploid consensus creation in regions where there is more than one het, as it just doesn't work properly. We could probably refactor at some point to make it work, but it's not worth doing that now (especially as it should be rare to have multiple proximal known hets in a single sample exome). 2012-10-07 16:32:48 -04:00