Mark DePristo
5a4e2a5fa4
Test code to ensure that pNonRef is being computed correctly for at least 1 genotype, bi and tri allelic
2012-10-15 07:53:55 -04:00
Mark DePristo
ee2f12e2ac
Simpler naming convention for AlleleFrequencyCalculation => AFCalc
2012-10-15 07:53:55 -04:00
Mark DePristo
cf3f9d6ee8
Reorganize and cleanup AFCalculations
...
-- Now contained in a package called afcalc
-- Extracted standalone classes from private static classes in ExactAF
-- Most fields are now private, with accessors
-- Overall cleaner organization now
2012-10-15 07:53:55 -04:00
Mark DePristo
13211231c7
Restructure and cleanup ExactAFCalculations
...
-- Now there's no duplication between exact old and constrained models. The behavior is controlled by an overloaded abstract function
-- No more static function to access the linear exact model -- you have to create the surrounding class. Updated code in the system
-- Everything passes unit tests
2012-10-15 07:53:54 -04:00
Mark DePristo
f800f3fb88
Optimized diploid exact AF calculation uses maxACs to stop the calculation at the max AC for each allele
...
-- Added unit tests to ensure the approximation isn't so far from our reference implementation (DiploidExactAFCalculation)
2012-10-15 07:53:54 -04:00
Mark DePristo
efad215edb
Greedy version of function to compute the max achievable AC for each alt allele
...
-- walks over the genotypes in VC, and computes for each alt allele the maximum AC we need to consider in that alt allele dimension. Does the calculation based on the PLs in each genotype g, choosing to update the max AC for the alt alleles corresponding to that PL. Only takes the first lowest PL, if there are multiple genotype configurations with the same PL value. It takes values in the order of the alt alleles.
2012-10-15 07:53:54 -04:00
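The greedy pass described in efad215edb can be sketched roughly as follows. This is a minimal Python illustration, not the GATK code: the names `greedy_max_acs` and `diploid_genotype_pairs` are hypothetical, samples are assumed diploid, and PLs are assumed to follow the standard VCF genotype ordering (0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...).

```python
from typing import List, Tuple


def diploid_genotype_pairs(n_alleles: int) -> List[Tuple[int, int]]:
    """Allele pairs (j, k), j <= k, in VCF PL order: index = k*(k+1)/2 + j."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def greedy_max_acs(pls_per_sample: List[List[int]], n_alt: int) -> List[int]:
    """Hypothetical sketch: greedy upper bound on the AC of each alt allele.

    For every sample, take the genotype with the lowest PL (the first one
    wins ties) and credit its alt alleles to the running per-allele totals.
    """
    pairs = diploid_genotype_pairs(n_alt + 1)
    max_acs = [0] * n_alt
    for pls in pls_per_sample:
        best_index = pls.index(min(pls))  # list.index returns the first minimum
        for allele in pairs[best_index]:
            if allele > 0:  # allele 0 is the reference
                max_acs[allele - 1] += 1
    return max_acs
```

Because ties go to the first lowest PL, a sample with equal PLs for 0/1 and 1/1 contributes one alt copy rather than two, which matches the "only takes the first lowest PL" behavior in the commit message.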
Mark DePristo
7666a58773
Function to compute the max achievable AC for each alt allele
...
-- Additional minor cleanup of ExactAFCalculation
2012-10-15 07:53:53 -04:00
Ryan Poplin
2a9ee89c19
Turning on allele trimming for the haplotype caller.
2012-10-10 10:47:26 -04:00
Eric Banks
e8a6460a33
After merging with Yossi's fix I can confirm that the AD is fixed when going through the HC too. Added similar fixes to the DP and FS annotations.
2012-10-05 16:37:42 -04:00
Yossi Farjoun
ef90beb827
- forgot to use git rm to delete a file from git. Now that VCF is deleted.
...
- uncommented a HC test that I missed.
2012-10-05 16:14:51 -04:00
Yossi Farjoun
d419a33ed1
* Added an integration test for AD annotation in the Haplotype caller.
...
* Corrected FS annotation for UG, as was done for AD.
* HC still does not annotate ReducedReads correctly (neither FS nor AD)
2012-10-05 15:23:59 -04:00
Eric Banks
f840d9edbd
HC test should continue using 3 alt alleles for indels
2012-10-05 02:03:34 -04:00
Eric Banks
e13e61673b
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-04 10:54:23 -04:00
Eric Banks
0c46845c92
Refactored the BaseCounts classes so that they are safer and allow for calculations on the most probable base (which is not necessarily the most common base).
2012-10-04 10:37:11 -04:00
Mark DePristo
b6e20e083a
Copied DiploidExactAFCalc to placeholder OptimizedDiploidExact
...
-- Will be removed. Only committing now to fix public -> private dependency
2012-10-03 20:16:38 -07:00
Mark DePristo
3e01a76590
Clean up AlleleFrequencyCalculation classes
...
-- Added a true base class that only does truly common tasks (like manage call logging)
-- This base class provides the only public method (getLog10PNonRef) and calls into a protected compute function that's abstract
-- Split ExactAF into superclass ExactAF with common data structures and two subclasses: DiploidExact and GeneralPloidyExact
-- Added an abstract reduceScope function that manages the simplification of the input VariantContext in the case where there are too many alleles or other constraints require us to only attempt a smaller computation
-- All unit tests pass
2012-10-03 19:55:11 -07:00
Eric Banks
dcd31e654d
Turn off RR tests while I debug
2012-09-21 17:26:00 -04:00
Mauricio Carneiro
2c3dc291c0
Added positive/negative strand to the synthetic reads
2012-09-21 10:00:48 -04:00
Mauricio Carneiro
ee31a54a03
Merged bug fix from Stable into Unstable
2012-09-19 16:09:45 -04:00
Mauricio Carneiro
7cf9911924
Fixed ReduceReads bug where variant regions were missing.
...
This affected variant regions with more than 100 reads and less than 250 reads. Only bams reduced with GATK v2 and 2.1 were affected.
2012-09-19 16:09:08 -04:00
Ryan Poplin
26e35e5ee2
updating BQSR integration tests
2012-09-19 14:10:34 -04:00
Ryan Poplin
b99099f05c
The BaseRecalibrator and DelocalizedBaseRecalibrator have gotten out of sync. Fixing.
2012-09-19 12:30:26 -04:00
Guillermo del Angel
bebd5c14b8
Update general ploidy md5's: the previous commit contained a bad merge of md5's, and the new shortened interval definition for EMIT_ALL_CONFIDENT_SITES was buggy
2012-09-18 20:12:15 -04:00
Guillermo del Angel
ca010160a9
Merge fix
2012-09-14 14:05:21 -04:00
Guillermo del Angel
6b37350bc0
Two hairy bugs in pool caller
...
a) Site error model wasn't counting errors in insertions correctly - Alleles passed in had padded ref byte, but event base in PileupElement doesn't have it. As a result, mismatch rate was grossly overestimated with insertions and we missed several calls we should have made. Integration test reflects changes.
b) Adding a ref GL to the exact model is correct mathematically but AFResult wasn't filled properly. As a result, QUAL was junk in pure ref sites, and in all other sites the last ref GL introduced wasn't properly updating Pr(AF>0).
c) Added integration test that covers -out_mode EMIT_ALL_CONFIDENT_SITES. Not fully sure if the math is 100% correct (for both diploid and generalized case) but at least now diploid and non-diploid cases behave similarly. md5 of this new test will fail since it's taking me a long time to run so I'll update from Bamboo output shortly
2012-09-14 13:13:22 -04:00
Eric Banks
0206e09a6a
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 15:18:27 -04:00
Eric Banks
d94d0d15c2
Complete overhaul of previous commits to make it all work with scatter-gather. Now tracks output files correctly and can print to stdout.
2012-09-12 15:15:40 -04:00
Ryan Poplin
849a2b8839
Adding HC integration test for _structural_ insertions and deletions.
2012-09-12 12:23:00 -04:00
Eric Banks
994a4ff387
Track all outputs from BQSR (.table, .csv, and .pdf) as @Output arguments. Updated integration tests because we no longer have command-line options not to generate plots (now just don't provide a pdf) or to keep the intermediate csv (now, just provide a filename on the command-line). This is currently busted because we can't access the original filenames from the Engine's storage/stub system and therefore cannot call out to the Rscript with the executor (which requires filename strings).
2012-09-12 11:24:53 -04:00
Mark DePristo
bfbf1686cd
Fixed nasty bug with defaulting to diploid no-call genotypes
...
-- For the pooled caller we were writing diploid no-calls even when other samples were haploid. Changed maxPloidy function to return a defaultPloidy, rather than 0, in the case where all samples are missing.
-- VCF/BCF Writers now create missing genotypes with the ploidy of other samples, or 2 if none are available at all.
-- Updating integration tests for general ploidy, as previously we wrote ./. even when other calls were 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1/1/1/1/1, but now we write ./././././././././././././././././././././././. (ugly but correct)
2012-09-12 07:08:03 -04:00
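The no-call behavior described in bfbf1686cd can be illustrated with a small Python sketch. The function names below are hypothetical stand-ins, not the actual maxPloidy or writer code; a missing genotype is represented here as `None`.

```python
from typing import List, Optional


def max_ploidy(ploidies: List[Optional[int]], default_ploidy: int = 2) -> int:
    """Largest ploidy among called samples; default_ploidy if all are missing."""
    known = [p for p in ploidies if p is not None]
    return max(known) if known else default_ploidy


def no_call_genotype(ploidy: int) -> str:
    """Render a missing genotype with one '.' per chromosome copy."""
    return "/".join(["."] * ploidy)
```

For example, `no_call_genotype(max_ploidy([None, 24]))` yields a 24-copy no-call, while `no_call_genotype(max_ploidy([None, None]))` falls back to the diploid `./.`.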
Guillermo del Angel
13831106d5
Fix GSA-535: storing likelihoods in allele map was busted when running HaplotypeCaller, only the last likelihood of a haplotype was being stored, as opposed to the max likelihood of all haplotypes mapping to an allele
2012-09-11 11:01:26 -04:00
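The difference between the buggy and fixed behavior in 13831106d5 (GSA-535) can be sketched as below. This is an assumed illustration in Python with hypothetical names and plain dictionaries, not the HaplotypeCaller data structures.

```python
from typing import Dict


def max_likelihood_per_allele(
    haplotype_to_allele: Dict[str, str],
    haplotype_likelihoods: Dict[str, float],
) -> Dict[str, float]:
    """Keep, for each allele, the max likelihood over all haplotypes mapping to it.

    A plain `result[allele] = likelihood` assignment here would instead keep
    only the last haplotype seen, which is the bug the commit describes.
    """
    result: Dict[str, float] = {}
    for haplotype, likelihood in haplotype_likelihoods.items():
        allele = haplotype_to_allele[haplotype]
        if allele not in result or likelihood > result[allele]:
            result[allele] = likelihood
    return result
```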
Mark DePristo
1b0ce511a6
Updating BQSR tests due to my change to reset BQSR calibration data
2012-08-31 19:51:09 -04:00
Ryan Poplin
57d997f06f
Fixing bug from when FragmentUtils merging function moved over to the soft clipped start instead of the unclipped start
2012-08-30 10:10:43 -04:00
Ryan Poplin
35baf0b155
This along with Mauricio's previous commit (thanks!) fixes GSA-522. There are no longer any modifications to reads in the map calls of ActiveRegion walkers. Added the bam which identified this error as a new integration test.
2012-08-30 09:07:36 -04:00
Mark DePristo
0f4acaae1b
Update MD5s with new FS score
2012-08-28 08:06:47 -04:00
Ryan Poplin
fe3069b278
Merged bug fix from Stable into Unstable
2012-08-22 14:40:34 -04:00
Ryan Poplin
e5cfdb4811
Bug fix for popular _Duplicate allele added to VariantContext_ error reported on the forum. It seems to be due to lower case bases in the reference being treated as reference mismatches. We would try to turn these mismatches into SNP events, for example c/C. We now uppercase the result from IndexedFastaSequenceFile.getSubsequenceAt()
2012-08-22 14:39:35 -04:00
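The effect of the uppercasing fix in e5cfdb4811 can be shown with a short Python sketch. The helper name is hypothetical, and the reference string stands in for the result of IndexedFastaSequenceFile.getSubsequenceAt().

```python
from typing import List


def mismatch_positions(read_bases: str, ref_bases: str) -> List[int]:
    """Positions where the read differs from the reference.

    The reference is uppercased first so that soft-masked (lowercase) bases
    such as 'c' are not reported as c/C mismatches against the read.
    """
    ref_upper = ref_bases.upper()
    return [i for i, (r, f) in enumerate(zip(read_bases, ref_upper)) if r != f]
```

Without the `upper()` call, a read `ACGT` against a lowercase reference `acgt` would report four spurious mismatches, each of which could be turned into a SNP event whose alt allele duplicates the reference.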
Ryan Poplin
63213e8eb5
Expanding the HaplotypeCaller integration tests to cover a wider range of data
2012-08-22 14:18:44 -04:00
Guillermo del Angel
901f47d8af
Final step (for now) in VA refactoring: update MD5's because, a) since it's not guaranteed that we'll iterate through reads/pileups in the same order, the rank sum dithering will change annotations, b) FS uses new generic threshold to distinguish uninformative reads (it used to use ad-hoc thresholds), c) AD definition changed and throws away uninformative reads, d) shortened general ploidy integration tests for quicker debugging. May have missed some MD5's in the update so there may be lingering test failures still
2012-08-22 11:38:51 -04:00
Eric Banks
286b658fab
Re-enabling parallelism in the BaseRecalibrator now that the release is out.
2012-08-20 21:25:14 -04:00
Eric Banks
154f65e0de
Temporarily disabling multi-threaded usage of BaseRecalibrator for performance reasons.
2012-08-20 12:43:17 -04:00
Eric Banks
2df04dc48a
Fix for performance problem in GGA mode related to previous --regenotype commit. Instead of trying to hack around the determination of the calculation model when it's not needed, just simply overload the calculateGenotypes() method to add one that does simple genotyping. Re-enabling the Pool Caller integration tests.
2012-08-16 13:05:17 -04:00
Eric Banks
9035b554fb
Adding tests for the --solid_nocall_strategy argument
2012-08-15 23:13:24 -04:00
Mark DePristo
3556c36668
Disable general ploidy integration tests because they are running forever
2012-08-15 21:13:16 -04:00
Mark DePristo
243af0adb1
Expanded the BQSR reporting script
...
-- Includes header page
-- Table of arguments (Arguments)
-- Summary of counts (RecalData0)
-- Summary of counts by qual (RecalData1)
-- Fixed bug in output that resulted in covariates list always being null (updated md5s accordingly)
-- BQSR.R loads all relevant libraries now, including gplots, grid, and gsalib, to run correctly
2012-08-12 13:45:14 -04:00
Ryan Poplin
2a113977a9
Resolving merge conflicts with the new MD5s
2012-08-10 11:47:00 -04:00
Ryan Poplin
5f82ffd5d8
Adding LowQual filter to the output of the HaplotypeCaller.
2012-08-10 11:25:14 -04:00
Mauricio Carneiro
58420098ac
Merged bug fix from Stable into Unstable
2012-08-09 13:02:23 -04:00
Mauricio Carneiro
c6132ebe26
Fixed divide by zero bug when downsampler goes over regions where reads are all filtered out. Added Guillermo's bug report as an integration test
2012-08-09 13:02:11 -04:00
Guillermo del Angel
5be7e0621d
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-09 09:58:34 -04:00