Mauricio Carneiro
f085f5d46a
Adding default intellij configuration files
2012-10-10 13:25:38 -04:00
Mauricio Carneiro
88297606f0
Adding intellij example configuration files
2012-10-10 13:20:30 -04:00
Guillermo del Angel
c0b7d53170
a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive
2012-10-10 13:19:05 -04:00
Kristian Cibulskis
2311606de4
initial cancer pipeline with mutations and partial indel support
2012-10-10 13:19:04 -04:00
Guillermo del Angel
45aa59a31c
BAM pipeline fixes: a) temp workaround for DEV-9: -nWayOut argument in IndelRealigner is broken, for now things will only really work in single sample mode, b) correct extension of RealignerTargetCreator output, previous extension caused an error
2012-10-10 13:19:04 -04:00
Guillermo del Angel
b8c721e6ec
Minor tweaks to CMIProcessing Pipeline: a) don't hard-code job mem limit to 4 GB since it's too much for most AWS instances, leave it instead as input argument, b) minor doc cleanups
2012-10-10 13:19:04 -04:00
Mauricio Carneiro
ca055d8804
Reimplementation of the BAM processing pipeline using the metadata information file.
...
Pipeline runs end-to-end using example metadata and has been tested only for cases where everything is ideal.
Next step is to bring this to the cloud and test all the different scenarios (multiple tumors, single-ended reads, missing parameters, etc.).
Parallel next step is to add QC metrics.
2012-10-10 13:19:04 -04:00
Mauricio Carneiro
25ff934e5a
New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning
2012-10-10 13:19:04 -04:00
Mauricio Carneiro
7d4adea183
Revised implementation of the RAWBAM => BAM pipeline
...
stripped out all the FQ pipeline and tumor/normal information.
2012-10-10 13:19:03 -04:00
Mauricio Carneiro
e413b9fe51
First implementation of the CMI data processing pipeline, handling both germline and cancer BAM/FQ => BAM.
...
Not ready for prime time yet, need more work!
2012-10-10 13:19:03 -04:00
Mauricio Carneiro
0c17709223
First implementation of a generic 'bundled' Data Processing Pipeline for germline and cancer.
...
not ready for prime time yet!
2012-10-10 13:19:03 -04:00
Mauricio Carneiro
08b6d1559c
Reverting the DPP to the original version, going to create a new simplified version for CMI in private.
2012-10-10 13:19:03 -04:00
Mauricio Carneiro
f9095c7ab7
Generic input file name recognition (still need to implement support for FastQ, but it can now at least accept it)
2012-10-10 13:19:03 -04:00
Ryan Poplin
15b405d458
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-10 10:47:40 -04:00
Ryan Poplin
2a9ee89c19
Turning on allele trimming for the haplotype caller.
2012-10-10 10:47:26 -04:00
Christopher Hartl
7381d5c243
Since this GRM now matches GCTA output for uncorrected intervals, implement and start proofing methods for LD-correction for genome partitioning. Very rudimentary tests just to solidify current position.
...
Wish I could do this in the GATK, but it has to run on bed files natively. Phooey.
2012-10-10 01:59:13 -04:00
Khalid Shakir
f66284658d
RetryMemoryLimit now works with Scatter/Gather.
2012-10-09 21:51:03 -04:00
Johan Dahlberg
e9b9e2318c
Fixed SortSam bug, for .done file
...
The *.bai.done file for the .bai file was written in the run directory instead of in the specified output directory.
Changing getName() to getAbsolutePath() fixes this.
Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2012-10-09 16:25:18 -04:00
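The SortSam fix above comes down to the difference between a bare file name and an absolute path: a path built from the name alone is relative, so it resolves against the current working directory. A minimal Python analogy using `pathlib` (the actual code is JVM-based; the helper names and the `.done` suffix placement here are hypothetical):

```python
from pathlib import Path


def done_file_from_name(output: Path) -> Path:
    # Mirrors building the marker from File.getName(): only the bare file name
    # survives, so the path is relative and resolves against the run directory.
    return Path(output.name + ".done")


def done_file_from_absolute_path(output: Path) -> Path:
    # Mirrors File.getAbsolutePath(): the output directory is kept, so the
    # marker lands next to the file it describes.
    return Path(str(output.absolute()) + ".done")


bai = Path("/data/run42/sorted.bai")
print(done_file_from_name(bai))           # sorted.bai.done
print(done_file_from_absolute_path(bai))  # /data/run42/sorted.bai.done
```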
Guillermo del Angel
9d7aa3cda8
a) Initial raw version of CMI BAM->VCF pipeline (most likely not working yet, but at least compiles and produces reasonable command lines), b) rename FASTQ->BAM script so name is more descriptive
2012-10-09 11:45:24 -04:00
Ryan Poplin
b543bddbb7
Fixing merge conflicts related to the comment formatting in the BQSR.
2012-10-08 10:23:08 -04:00
Ryan Poplin
b3cc04976f
Fixing BQSR bug reported on the forum for reads that begin with insertions.
2012-10-08 10:18:29 -04:00
Eric Banks
be9fcba546
Don't allow triggering of polyploid consensus creation in regions where there is more than one het, as it just doesn't work properly. We could probably refactor at some point to make it work, but it's not worth doing that now (especially as it should be rare to have multiple proximal known hets in a single sample exome).
2012-10-07 16:32:48 -04:00
Eric Banks
08ac80c080
RR bug: when the last base in the window around the polyploid consensus is filtered (low quality), the filtered consensus is not flushed and subsequent filtered bases (but importantly not contiguous to this one) are just added to this position. In other words, bases were being added to the wrong genomic positions. Fixed.
2012-10-07 10:52:01 -04:00
Eric Banks
36a26a7da6
md5s failed because I forgot to add --no_cmdline_in_header so it is different depending on where you run from. Fixed.
2012-10-07 08:35:55 -04:00
Eric Banks
a5aaa14aaa
Fix for GSA-601: Indels dropped during liftover. This was a true bug that was an effect of the switch over to the non-null representation of alleles in the VariantContext. Unfortunately, this tool didn't have integration tests - but it does now.
2012-10-07 01:19:52 -04:00
Eric Banks
82e40340c0
Use StringBuilder over StringBuffer
2012-10-07 00:02:15 -04:00
Eric Banks
5d6aad67e2
Fix for bug reported on forums: VariantsToTable does not handle lists and nested arrays correctly. Added an integration test to cover printing of PLs.
2012-10-07 00:01:27 -04:00
Eric Banks
e7798ddd2a
Fix for JIRA GSA-598: AD field not handled properly by CombineVariants. It was also not handled by SelectVariants either. We now strip the AD field out whenever combining/selecting makes it invalid due to a changing of the number of ALT alleles.
2012-10-06 23:02:36 -04:00
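AD carries one depth per allele (REF first, then each ALT), so once the ALT count changes the old values no longer line up with the record's alleles. A minimal sketch of the stripping rule described above, assuming a genotype's FORMAT fields are held in a plain dict; the function name is hypothetical:

```python
from typing import Any, Dict


def strip_ad_if_alts_changed(genotype: Dict[str, Any],
                             original_n_alts: int,
                             new_n_alts: int) -> Dict[str, Any]:
    """Return the genotype's FORMAT fields, dropping AD if the ALT count changed."""
    if "AD" in genotype and original_n_alts != new_n_alts:
        # Build a copy rather than mutating the caller's dict.
        return {key: value for key, value in genotype.items() if key != "AD"}
    return genotype


g = {"GT": "0/1", "AD": [10, 5, 3], "DP": 18}
print(strip_ad_if_alts_changed(g, original_n_alts=2, new_n_alts=1))
# {'GT': '0/1', 'DP': 18}
```

Dropping the field rather than trying to remap it keeps the output valid without guessing how depths should be redistributed across the new allele set.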
Eric Banks
bfc551f612
Fix for GSA-589: SelectVariants with -number gives biased results. The implementation was not good and it's not worth keeping this busted code around given that we have a working implementation of a fractional random sampling already in place, so I removed it.
2012-10-06 22:39:49 -04:00
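Fractional random sampling gives every record the same inclusion probability regardless of its position in the input, which avoids the kind of bias the removed `-number` code had. A minimal sketch of per-record Bernoulli sampling; whether SelectVariants implements its fractional option exactly this way is an assumption, and `sample_fraction` is a hypothetical name:

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_fraction(records: Iterable[T], fraction: float,
                    rng: random.Random) -> Iterator[T]:
    """Keep each record independently with probability `fraction`.

    The output size is random (expected len(records) * fraction),
    not a fixed count.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    for record in records:
        # rng.random() is in [0, 1), so fraction=1.0 keeps everything
        # and fraction=0.0 keeps nothing.
        if rng.random() < fraction:
            yield record


kept = list(sample_fraction(range(1000), 0.1, random.Random(42)))
```

Passing an explicit `random.Random` makes runs reproducible for a given seed.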
Eric Banks
e8a6460a33
After merging with Yossi's fix I can confirm that the AD is fixed when going through the HC too. Added similar fixes to DP and FS annotations too.
2012-10-05 16:37:42 -04:00
Eric Banks
b7639d7ceb
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-05 16:21:17 -04:00
Eric Banks
52326942cf
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-05 16:15:07 -04:00
Eric Banks
04853252a0
Possible fix for reduced reads coming from the HaplotypeCaller in the AD
2012-10-05 16:15:04 -04:00
Yossi Farjoun
ef90beb827
- forgot to use git rm to delete a file from git. Now that VCF is deleted.
...
- uncommented a HC test that I missed.
2012-10-05 16:14:51 -04:00
Yossi Farjoun
6874a5ce76
This bam and bai are needed for testing the ADAnnotation tests (both UG and HC)
...
The vcf file was mistakenly added previously, now removed.
2012-10-05 16:10:41 -04:00
Yossi Farjoun
d419a33ed1
* Added an integration test for AD annotation in the Haplotype caller.
...
* Corrected FS Annotation for UG as for AD.
* HC still does not annotate ReducedReads correctly (for either FS or AD)
2012-10-05 15:23:59 -04:00
Yossi Farjoun
dc4dcb4140
fixed AD annotation for a ReducedReads BAM file. Added an integration test for this case with a new reduced BAM in private/testdata
2012-10-05 14:20:07 -04:00
Eric Banks
f840d9edbd
HC test should continue using 3 alt alleles for indels
2012-10-05 02:03:34 -04:00
Eric Banks
c66ef17cd0
Add a separate max alt alleles argument for indels that defaults to 2 instead of 3. PLEASE TAKE NOTE.
2012-10-04 13:52:14 -04:00
Christopher Hartl
beaa1ac07e
Turns out GCTA replaces a missing variant with the mean dosage (2*frequency), but then normalizes the genetic distance by the number of non-missing genotype pairs. An odd thing to do, but with this the GRMs are concordant (up to a small tolerance)
2012-10-04 13:29:38 -04:00
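The missing-data rule above can be written as a small function. This sketch assumes the usual GCTA off-diagonal form, where each locus contributes (x_j - 2p)(x_k - 2p) / (2p(1 - p)), with dosages coded 0/1/2 and `None` for a missing call; the function name and the choice to skip monomorphic loci are illustrative assumptions:

```python
from typing import Optional, Sequence


def grm_entry(gj: Sequence[Optional[int]],
              gk: Sequence[Optional[int]],
              freqs: Sequence[float]) -> float:
    """Off-diagonal genetic relationship between individuals j and k."""
    total = 0.0
    n_pairs = 0
    for xj, xk, p in zip(gj, gk, freqs):
        mean = 2.0 * p               # expected dosage
        var = 2.0 * p * (1.0 - p)    # binomial variance of the dosage
        if var == 0.0:
            continue                 # monomorphic locus: no information
        if xj is None or xk is None:
            # Imputing the missing call with 2p makes its centred value 0,
            # so the product adds nothing to the numerator, and the locus
            # is not counted in the denominator either.
            continue
        total += (xj - mean) * (xk - mean) / var
        n_pairs += 1
    return total / n_pairs if n_pairs else float("nan")


print(grm_entry([0, None], [0, 2], [0.5, 0.5]))  # 2.0
```

With one of the two loci missing, dividing by all loci instead of by non-missing pairs would give 1.0 here, which is the discrepancy the waypoint commit below ran into.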
Christopher Hartl
01dcdf2830
Waypoint: GRM is identical with GCTA if no genotypes are missing. Not sure how GCTA is treating these, but it's definitely not strictly excluding them.
2012-10-04 12:53:03 -04:00
Scott Frazer
3ffba77656
Revert "initial cancer pipeline with mutations and partial indel support"
...
This reverts commit 4a2e5b1fcc3ad53dbb26d43eed1220b0257e9901.
2012-10-04 11:37:54 -04:00
Kristian Cibulskis
0afde9906a
initial cancer pipeline with mutations and partial indel support
2012-10-04 11:37:11 -04:00
Eric Banks
e13e61673b
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-04 10:54:23 -04:00
Guillermo del Angel
49db96c8ad
BAM pipeline fixes: a) temp workaround for DEV-9: -nWayOut argument in IndelRealigner is broken, for now things will only really work in single sample mode, b) correct extension of RealignerTargetCreator output, previous extension caused an error
2012-10-04 10:53:13 -04:00
Eric Banks
dfddc4bb0e
Protect against cases where there are counts but no quals
2012-10-04 10:52:30 -04:00
Eric Banks
0c46845c92
Refactored the BaseCounts classes so that they are safer and allow for calculations on the most probable base (which is not necessarily the most common base).
2012-10-04 10:37:11 -04:00
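The refactor distinguishes the most common base from the most probable one, and the later commit guards against counts with no qualities. A minimal sketch, assuming "most probable" means the base with the highest summed base quality; `BaseCountsSketch` is a hypothetical class, not the GATK `BaseCounts` API:

```python
from collections import Counter
from typing import Optional


class BaseCountsSketch:
    """Per-position tallies of bases and summed base qualities."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.sum_quals: Counter = Counter()

    def add(self, base: str, qual: int) -> None:
        self.counts[base] += 1
        self.sum_quals[base] += qual

    def most_common_base(self) -> Optional[str]:
        # Highest raw count, ignoring quality.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def most_probable_base(self) -> Optional[str]:
        if not self.counts:
            return None
        # Guard: bases were counted but no quality was recorded, so fall
        # back to raw counts instead of picking an arbitrary zero-sum base.
        if sum(self.sum_quals.values()) == 0:
            return self.most_common_base()
        return max(self.sum_quals, key=self.sum_quals.__getitem__)


bc = BaseCountsSketch()
for base, qual in [("A", 10), ("A", 10), ("A", 10), ("C", 40), ("C", 40)]:
    bc.add(base, qual)
print(bc.most_common_base(), bc.most_probable_base())  # A C
```

Three low-quality A's outnumber two high-quality C's, yet C carries more total evidence, which is why the two answers can differ.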
Mark DePristo
b6e20e083a
Copied DiploidExactAFCalc to placeholder OptimizedDiploidExact
...
-- Will be removed. Only committing now to fix public -> private dependency
2012-10-03 20:16:38 -07:00
Mark DePristo
51cafa73e6
Removing public -> private dependency
2012-10-03 20:05:03 -07:00
Mark DePristo
f6a2ca6e7f
Fixes / TODOs for meaningful results with AFCalculationResult
...
-- Right now the state of the AFCalculationResult can be corrupt (i.e., log10 likelihoods can be -Infinity). This forced me to disable reasonable contracts. Needs to be thought through
-- exactCallsLog should be optional
-- Update UG integration tests as the calculation of the normalized posteriors is done in a marginally different way so the output is rounded slightly differently.
2012-10-03 19:55:12 -07:00