Guillermo del Angel
6f8e7692d4
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-03 12:24:37 -04:00
Guillermo del Angel
9e25b209e0
First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes:
a) Active region: scale PLs by representative count to determine whether region is active.
b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in the read; TBD whether this is good enough to avoid biases in GLs.
c) DeBruijn assembler inserts kmers N times in the graph, where N is the min representative count of the read over the kmer span; TBD again whether this is the best approach.
d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is a discrepancy of overlaps between unclipped/soft-clipped bases. Didn't affect things before, but RR makes hard-clipped bases in CIGARs more prevalent, so this was exposed.
e) Cache read representative counts along with read likelihoods associated with a Haplotype.
Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller.
2012-08-03 12:24:23 -04:00
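Note on commit 9e25b209e0: the two representative-count rules in items (b) and (c) can be illustrated with a minimal sketch. This is not the GATK code (which is Java); the function names and the plain-list representation of per-base counts are assumptions made for illustration only.

```python
from statistics import mean


def read_representative_count(base_counts):
    """Per-read count: the average of the per-base representative counts (item b)."""
    return mean(base_counts)


def kmer_insert_counts(base_counts, k):
    """For each kmer start position, the number of times the kmer would be
    inserted into the graph: the minimum per-base count over the kmer span (item c).
    Returns an empty list when the read is shorter than k."""
    return [min(base_counts[i:i + k]) for i in range(len(base_counts) - k + 1)]


# Hypothetical per-base representative counts for a 4-base reduced read.
counts = [3, 3, 1, 5]
per_read = read_representative_count(counts)
per_kmer = kmer_insert_counts(counts, 2)
```

Taking the minimum over the kmer span is conservative: a single low-count base caps the weight of every kmer that covers it, whereas the per-read average in (b) smooths such bases out.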
Ryan Poplin
8817fc70d1
Merged bug fix from Stable into Unstable
2012-08-03 10:45:01 -04:00
Ryan Poplin
f40d0a0a28
Updating VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller. Integration tests change because of the MNPs in dbSNP.
2012-08-03 10:44:36 -04:00
Joel Thibault
51bd03cc36
Add RemoveProgramRecords annotation to ActiveRegionWalker
2012-08-03 09:54:16 -04:00
Joel Thibault
addbfd6437
Add a RemoveProgramRecords annotation
...
* Add the RemoveProgramRecords annotation to LocusWalker
2012-08-03 09:54:16 -04:00
Joel Thibault
524d7ea306
Choose whether to keep program records based on Walker
...
* Add keepProgramRecords argument
* Make removeProgramRecords / keepProgramRecords override default
2012-08-03 09:54:16 -04:00
Mark DePristo
e04989f76d
Bugfix for new PASS position in dictionary in BCF2
2012-08-03 09:42:21 -04:00
Mark DePristo
d22b8cf86b
VisualizeContextTree now loops over M, I, and D states, generating trees and analyses
2012-08-02 17:30:30 -04:00
Mark DePristo
fb5dabce18
Update BCF2 to include a minor version number so we can rev (and report errors) with BCF2
...
-- We are now likely to fail with an error when reading old BCF files, rather than just giving bad results
-- Added new class BCFVersion that consolidates all of the version management of BCF
2012-08-02 17:30:30 -04:00
Eric Banks
b4f4d86c77
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-02 11:49:03 -04:00
Ryan Poplin
3ece4c4993
Merged bug fix from Stable into Unstable
2012-08-02 11:41:36 -04:00
Ryan Poplin
6f7a236cfc
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2012-08-02 11:41:20 -04:00
Ryan Poplin
cb8bc18aeb
Fix for error in HaplotypeCaller. HC has a UG argument collection for the UG engine but some of those arguments aren't appropriate to set.
2012-08-02 11:41:06 -04:00
Eric Banks
e3f89fb054
Missing/malformed GATK report files are user errors
2012-08-02 11:33:21 -04:00
Eric Banks
cc01f844d4
Merged bug fix from Stable into Unstable
2012-08-02 11:25:28 -04:00
Eric Banks
0381fd7c83
Hmm, I thought I used the right md5s last time. Let's try again.
2012-08-02 11:25:10 -04:00
Mark DePristo
2f585b91be
Update ex2.vcf and .bcf test files to new spec
2012-08-01 17:10:35 -04:00
Mark DePristo
c3c3d18611
Update BCF2 to put PASS as offset 0 not at the end
...
-- Unfortunately this commit breaks backward compatibility with all existing BCF2 files...
2012-08-01 17:09:22 -04:00
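Note on commit c3c3d18611: a minimal sketch of the ordering change, assuming the dictionary is simply an ordered list of header IDs whose list index is the offset written to the file. The function name and input shape are hypothetical and do not come from the GATK source.

```python
def build_string_dictionary(header_ids):
    """Return the header IDs with PASS pinned at offset 0, followed by the
    remaining IDs in their original header order (duplicates dropped)."""
    ordered = ["PASS"]
    for key in header_ids:
        if key not in ordered:
            ordered.append(key)
    return ordered


# Hypothetical header IDs in the order they appear in a VCF header.
dictionary = build_string_dictionary(["q10", "PASS", "DP"])
```

Because every offset after PASS shifts, files written with the old ordering decode to the wrong strings under the new one, which is why the commit breaks backward compatibility.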
Mark DePristo
25c773ef33
Adding VE integration test file to private/testdata
2012-08-01 15:45:12 -04:00
Mark DePristo
ccac77d888
Bugfix for incorrect allele counting in IndelSummary
...
-- Previous version would count all alt alleles as present in a sample, even if only 1 were present, because of the way VariantEval subsetted VCs
-- Updated code for subsetting VCs by sample to be clearer about how it handles rederiving alleles
-- Update a few pieces of code to get previous correct behavior
-- Updated a few MD5s as now ref calls at sites in dbSNP are counted as having a comp site, and therefore show up in known sites when Novelty strat is on (which I think is correct)
-- Walkers that used old subsetting function with true are now using clearer version that does rederive alleles by default
2012-08-01 15:45:12 -04:00
Joel Thibault
2b25df3d53
Add removeProgramRecords argument
...
* Add unit test for the removeProgramRecords
2012-08-01 15:33:05 -04:00
Ryan Poplin
d53105668b
Merged bug fix from Stable into Unstable
2012-08-01 14:53:06 -04:00
Ryan Poplin
fabca66d09
Another fix to VQSR docs
2012-08-01 14:52:49 -04:00
Ryan Poplin
2be29ebd22
Merged bug fix from Stable into Unstable
2012-08-01 14:35:30 -04:00
Ryan Poplin
4093909a56
Updating VQSR docs. Removing references to old best practices pages.
2012-08-01 14:30:24 -04:00
Eric Banks
52b93cab62
Merged bug fix from Stable into Unstable
2012-08-01 13:17:36 -04:00
Eric Banks
22bf052828
Fixing BQSR GATK docs
2012-08-01 13:17:16 -04:00
Guillermo del Angel
9ac72dbd4d
Merged bug fix from Stable into Unstable
2012-08-01 10:56:45 -04:00
Guillermo del Angel
84cd23f891
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2012-08-01 10:55:37 -04:00
Guillermo del Angel
01265f78e6
Add sanity check and possible bug fix for forum user: if haplotypes cannot be created from given alleles when genotyping indels in pool mode (e.g. too close to contig boundary), return an empty allele list, signifying the site can't be genotyped
2012-08-01 10:50:00 -04:00
Eric Banks
459832ee16
Fixed bug in FastaAlternateReferenceMaker when input VCF has overlapping deletions as reported a while back on GS
2012-08-01 10:45:04 -04:00
Eric Banks
a4a41458ef
Update docs of FastaAlternateReferenceMaker as promised in older GS thread
2012-08-01 10:33:41 -04:00
Eric Banks
687df2341d
Merged bug fix from Stable into Unstable
2012-08-01 10:27:15 -04:00
Eric Banks
05bf6e3726
Updating md5s in pipeline tests so that they finally pass
2012-08-01 10:27:00 -04:00
Eric Banks
38e5419b11
Merged bug fix from Stable into Unstable
2012-08-01 09:50:31 -04:00
Eric Banks
56f8afab97
Requested by Geraldine: adding a utility to register deprecated walkers (and the major version of the first release since they were removed) so that the User Error printed out for e.g. CountCovariates now states: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0.
2012-08-01 09:50:00 -04:00
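Note on commit 56f8afab97: the registry idea can be sketched as a lookup from walker name to the major version of the first release without it. The dictionary and function names below are hypothetical; only the message text and the CountCovariates / 2.0 example come from the commit message.

```python
# Hypothetical registry: walker name -> major version of the first release
# since the walker was removed.
DEPRECATED_WALKERS = {"CountCovariates": "2.0"}


def deprecation_message(walker_name):
    """Return the user-error text for a deprecated walker, or None if the
    walker is not registered as deprecated."""
    version = DEPRECATED_WALKERS.get(walker_name)
    if version is None:
        return None
    return (
        f"Walker {walker_name} is no longer available in the GATK; "
        f"it has been deprecated since version {version}."
    )


message = deprecation_message("CountCovariates")
```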
Eric Banks
7cf4b63d76
Disabling indel quals in BaseRecalibrator as it should be, not PrintReads.
2012-08-01 09:23:04 -04:00
Guillermo del Angel
0528337467
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-31 18:17:50 -04:00
Guillermo del Angel
4a23f3cd11
Simple cleanup of pool caller code - since usage is much more general than just calling pools, AF calculation models and GL calculation models are renamed from Pool -> GeneralPloidy. Also, don't have users specify special arguments for -glm and -pnrm. Instead, when running UG with sample ploidy != 2, the correct general ploidy modules are automatically detected and loaded. -glm now reverts to old [SNP|INDEL|BOTH] usage
2012-07-31 16:34:20 -04:00
Eric Banks
6cb10cef96
Fixed older GS reported bug. Actually, the problem really lies in Picard (can't set max records in RAM without it throwing an exception, reported on their JIRA) so I just masked out the problem by removing this never-used argument from this rarely-used tool.
2012-07-31 16:00:36 -04:00
Eric Banks
ab53d73459
Quick fix to user error catching
2012-07-31 15:50:32 -04:00
Eric Banks
10111450aa
Fixed AlignmentUtils bug for handling Ns in the CIGAR string. Added a UG integration test that calls a BAM with such reads (provided by a user on GetSatisfaction).
2012-07-31 15:37:22 -04:00
Eric Banks
fff78ab462
Archiving VQSRv3
2012-07-31 14:34:42 -04:00
Ryan Poplin
4f10386bd4
Del/Ins ratio should really be Ins/Del ratio on the summary page of the variant QC report.
2012-07-31 14:23:36 -04:00
Mark DePristo
f7133ffc31
Cleanup syntax errors from BQSR reorganization
2012-07-31 08:11:05 -04:00
Mark DePristo
762a3d9b50
Move BQSR.R to utils/recalibration in R
2012-07-31 08:11:04 -04:00
Mark DePristo
dad9bb1192
Changes order of writing BaseRecalibrator results so that if R blows up you still get a meaningful tree
2012-07-31 08:11:04 -04:00
Mark DePristo
0c4e729e13
Working version of adaptive context calculations
...
-- Uses a chi2 test for independence to determine if a subcontext is worth representing. Gives excellent visual results
-- Writes out analysis output file producing excellent results in R
-- Trivial reformatting of MathUtils
2012-07-31 08:11:04 -04:00
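Note on commit 0c4e729e13: a minimal sketch of a chi2 test for independence on a contingency table. How the commit actually builds its table is not stated; the layout assumed here (rows: subcontext vs. the rest of the parent context, columns: error vs. non-error observations), the function names, and the 0.05 significance level are all illustrative assumptions.

```python
# Critical value of the chi2 distribution with 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_1DF_05 = 3.841


def chi2_statistic(table):
    """Pearson chi2 statistic for a contingency table given as a list of rows.
    Assumes every row total and column total is nonzero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def subcontext_worth_representing(table, critical=CHI2_CRITICAL_1DF_05):
    """True when a 2x2 table departs from independence beyond the critical value."""
    return chi2_statistic(table) > critical


# Hypothetical counts: [errors, non-errors] for the subcontext and for the rest.
same_rate = [[10, 10], [10, 10]]
different_rate = [[20, 0], [0, 20]]
```

A subcontext whose error rate matches the rest of its parent yields a statistic near zero and would be pruned under this rule; one whose rate differs sharply exceeds the critical value and would be kept as its own node.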
Mark DePristo
93640b382e
Preliminary version of adaptive context covariate algorithm
...
-- Works according to visual inspection of output tree
2012-07-31 08:11:04 -04:00