Ryan Poplin
ac033ce41a
Intermediate commit of new bubble assembly graph traversal algorithm for the HaplotypeCaller. Adding functionality for a path from an assembly graph to calculate its own cigar string from each of the bubbles instead of doing a massive Smith-Waterman alignment between the path's full base composition and the reference.
2013-01-31 11:32:19 -05:00
Ryan Poplin
85dabd321f
Adding unit tests for hierarchicalBayesianQualityEstimate function
2013-01-30 13:26:07 -05:00
Ryan Poplin
07fe3dd1ef
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-30 13:19:24 -05:00
David Roazen
9985f82a7a
Move BaseUtils back to the GATK by request, along with associated utility methods
2013-01-30 13:09:44 -05:00
Ryan Poplin
2967776458
The Empirical quality column in the recalibration report can't be compared in the BQSRGatherer because the value is calculated using the Bayesian estimate with different priors. This value should never be used from a recalibration report anyway except during plotting.
2013-01-30 12:28:14 -05:00
Ryan Poplin
59311aeea2
Getting back null values from the tables is perfectly reasonable if those covariates don't appear in your table. Need to handle them gracefully.
2013-01-30 10:06:14 -05:00
Ryan Poplin
e7d7d70247
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-30 10:01:06 -05:00
Mauricio Carneiro
29fd536c28
Updating licenses manually
...
Please check that your commit hook is properly pointing at ../../private/shell/pre-commit
Conflicts:
public/java/test/org/broadinstitute/variant/VariantBaseTest.java
2013-01-29 17:27:53 -05:00
David Roazen
a536e1da84
Move some VCF/VariantContext methods back to the GATK based on feedback
...
-Moved some of the more specialized / complex VariantContext and VCF utility
methods back to the GATK.
-Due to this re-shuffling, was able to return things like the Pair class back
to the GATK as well.
2013-01-29 16:56:55 -05:00
Ryan Poplin
cba89e98ad
Refactoring the Bayesian empirical quality estimates to be in a single unit-testable function.
2013-01-29 15:50:46 -05:00
Guillermo del Angel
1d5b29e764
Unit tests for repeat covariates: generate 100 random reads consisting of tandem repeat units of random content and size, and check that covariates match expected values at all positions in reads.
...
Fixed corner case where value of covariate at border between 2 tandem repeats of different length/content wasn't consistent
2013-01-29 15:23:02 -05:00
Guillermo del Angel
c11197e361
Refactored repeat covariates to eliminate duplicated code - now all inherit from basic RepeatCovariate abstract class. Comprehensive unit tests coming...
2013-01-29 10:10:24 -05:00
Ryan Poplin
35543b9cba
updating BQSR integration test values for the PR half of BQSR.
2013-01-29 09:47:57 -05:00
Ryan Poplin
bf25196a0b
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-28 22:33:13 -05:00
Ryan Poplin
1f254d29df
Don't set the empirical quality when reading in the recal table because then we won't be using the new quality estimates for the prior since the value is cached.
2013-01-28 22:16:43 -05:00
Guillermo del Angel
5995f01a01
Big intermediate commit (mostly so that I don't have to go again through merge/rebase hell) in expanding BQSR capabilities. Far from done yet:
...
a) Add option to stratify CalibrateGenotypeLikelihoods by repeat - will add integration test in next push.
b) Simulator to produce BAM files with a given error profile - for now only a SNP/indel error rate can be specified. A bad context can be specified, and if such a context is present then the error rate is increased to the given value.
c) Rewrote RepeatLength covariate to do the right thing - not fully working yet, work in progress.
d) Additional experimental covariates to log repeat unit and combined repeat unit+length. Needs code refactoring/testing
2013-01-28 19:55:46 -05:00
Ryan Poplin
d665a8ba0c
The Bayesian calculation of Qemp in the BQSR is now hierarchical. This fixes issues in which the covariate bins were very sparse and the prior estimate being used was the original quality score. This resulted in large correction factors for each covariate, which breaks the equation. There is also now a new option, globalQScorePrior, which can be used to ignore the given (very high) quality scores and instead use this value as the prior.
2013-01-28 15:56:33 -05:00
Ryan Poplin
aab160372a
No need to sort the BQSR tables by default.
2013-01-28 11:26:01 -05:00
David Roazen
f63f27aa13
org.broadinstitute.variant refactor, part 2
...
-removed sting dependencies from test classes
-removed org.apache.log4j dependency
-misc cleanup
2013-01-28 09:03:46 -05:00
Mauricio Carneiro
2a4ccfe6fd
Updated all JAVA file licenses accordingly
...
GSATDG-5
2013-01-10 17:06:41 -05:00
Eric Banks
264cc9e78d
Resolve protected->public dependencies for BQSR by wrapping the BQSR-specific arguments in a new class.
...
Instead of the GATK Engine creating a new BaseRecalibrator (not clean), it just keeps track of the arguments (clean).
There are still some dependency issues, but it looks like they are related to Ami's code. Need to look into it further.
2013-01-08 16:23:29 -05:00
Eric Banks
f0bd1b5ae5
Okay, all public->protected dependencies are gone except for the BQSR arguments. I'll need to think through this but should be able to make that work too.
2013-01-08 15:46:32 -05:00
Eric Banks
47d030a52d
Oops, move the covariates over too
2013-01-07 15:47:25 -05:00
Eric Banks
35699a8376
Move bqsr utils to protected
2013-01-07 15:41:21 -05:00