Eric Banks
08ac80c080
RR bug: when the last base in the window around the polyploid consensus is filtered (low quality), the filtered consensus is not flushed and subsequent filtered bases (but importantly not contiguous to this one) are just added to this position. In other words, bases were being added to the wrong genomic positions. Fixed.
2012-10-07 10:52:01 -04:00
Eric Banks
e8a6460a33
After merging with Yossi's fix I can confirm that the AD is fixed when going through the HC too. Added similar fixes to DP and FS annotations too.
2012-10-05 16:37:42 -04:00
Yossi Farjoun
ef90beb827
- forgot to use git rm to delete a file from git. Now that VCF is deleted.
...
- uncommented a HC test that I missed.
2012-10-05 16:14:51 -04:00
Yossi Farjoun
d419a33ed1
* Added an integration test for AD annotation in the Haplotype caller.
...
* Corrected FS Annotation for UG as for AD.
* HC still does not annotate ReducedReads correctly (for either FS or AD)
2012-10-05 15:23:59 -04:00
Eric Banks
f840d9edbd
HC test should continue using 3 alt alleles for indels
2012-10-05 02:03:34 -04:00
Eric Banks
c66ef17cd0
Add a separate max alt alleles argument for indels that defaults to 2 instead of 3. PLEASE TAKE NOTE.
2012-10-04 13:52:14 -04:00
Eric Banks
e13e61673b
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-04 10:54:23 -04:00
Eric Banks
dfddc4bb0e
Protect against cases where there are counts but no quals
2012-10-04 10:52:30 -04:00
Eric Banks
0c46845c92
Refactored the BaseCounts classes so that they are safer and allow for calculations on the most probable base (which is not necessarily the most common base).
2012-10-04 10:37:11 -04:00
Mark DePristo
b6e20e083a
Copied DiploidExactAFCalc to placeholder OptimizedDiploidExact
...
-- Will be removed. Only committing now to fix public -> private dependency
2012-10-03 20:16:38 -07:00
Mark DePristo
51cafa73e6
Removing public -> private dependency
2012-10-03 20:05:03 -07:00
Mark DePristo
f8ef4332de
Count the number of evaluations in AFResult; expand unit tests
...
-- AFResult now tracks the number of evaluations (turns through the model calculation) so we can now compute the scaling of exact model itself as a function of n samples
-- Added unittests for priors (flat and human)
-- Discovered nasty general ploidy bug (enabled with Guillermo_FIXME)
2012-10-03 19:55:11 -07:00
Mark DePristo
de941ddbbe
Cleanup Exact model, better unit tests
...
-- Added combinatorial unit tests for both Diploid and General (in diploid-case) for 2 and 3 alleles in all combinations of sample types (i.e., AA, AB, BB and equiv. for tri-allelic). More assert statements to ensure quality of the result.
-- Added docs (DOCUMENT YOUR CODE!) to AlleleFrequencyCalculationResult, with proper input error handling and contracts. Made mutation functions all protected
-- No longer need to call reset on your AlleleFrequencyCalculationResult -- it's done for you in the calculation function. reset is a protected method now, so it's all cleaner and nicer this way
-- TODO still -- need to add edge-case tests for non-informative samples (0,0,0), for the impact of priors, and I need to add some way to test the result of the pNonRef
2012-10-03 19:55:11 -07:00
Mark DePristo
3e01a76590
Clean up AlleleFrequencyCalculation classes
...
-- Added a true base class that only does truly common tasks (like manage call logging)
-- This base class provides the only public method (getLog10PNonRef) and calls into a protected compute function that's abstract
-- Split ExactAF into superclass ExactAF with common data structures and two subclasses: DiploidExact and GeneralPloidyExact
-- Added an abstract reduceScope function that manages the simplification of the input VariantContext in the case where there are too many alleles or other constraints require us to only attempt a smaller computation
-- All unit tests pass
2012-10-03 19:55:11 -07:00
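The class layout described in 3e01a76590 (one public entry point that calls into protected abstract hooks) can be sketched roughly as follows. This is a minimal Python illustration, not the GATK Java code: the names `AFCalcBase`, `ToyDiploidCalc`, `call_log`, and `max_alleles` are hypothetical, placing the scope-reduction hook on the base class is an assumption, and the toy subclass returns a placeholder value rather than running the exact model.

```python
from abc import ABC, abstractmethod


class AFCalcBase(ABC):
    """Hypothetical base class: owns the public entry point and call logging."""

    def __init__(self, max_alleles=3):
        self.max_alleles = max_alleles
        self.call_log = []

    def get_log10_p_non_ref(self, alleles):
        # The only public method: simplify the input, delegate, then log the call.
        reduced = self._reduce_scope(alleles)
        result = self._compute(reduced)
        self.call_log.append((tuple(reduced), result))
        return result

    @abstractmethod
    def _reduce_scope(self, alleles):
        """Shrink the problem when there are too many alleles."""

    @abstractmethod
    def _compute(self, alleles):
        """Model-specific calculation supplied by each subclass."""


class ToyDiploidCalc(AFCalcBase):
    def _reduce_scope(self, alleles):
        # Assumption for illustration: simply keep the first max_alleles alleles.
        return list(alleles[: self.max_alleles])

    def _compute(self, alleles):
        # Placeholder value; the real exact-model math is not reproduced here.
        return 0.0


calc = ToyDiploidCalc(max_alleles=3)
value = calc.get_log10_p_non_ref(["A", "C", "G", "T"])
```

Because the hooks are abstract, the base class cannot be instantiated on its own, so every caller goes through the logged public method.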
Mark DePristo
1c52db4cdd
Add exactCallsLog output file to ExactModel and StandardCallerArgumentCollection
...
-- This allows us to log all of the information about the exact model call (alleles, priors, PLs, result, and runtime) to a file for later debugging / optimization
2012-10-03 19:55:11 -07:00
Eric Banks
2df5be702c
Added an argument to RR to allow polyploid consensus creation (by default it is turned off). This will eventually be replaced by the known SNPs track trigger.
2012-09-28 11:44:25 -04:00
Eric Banks
11a71e0390
RR bug: when determining the most common base at a position, break ties by which base has the highest sum of base qualities. Otherwise, sites with 1 Q2 N and 1 Q30 C are ending up as Ns in the consensus. I think perhaps we don't even care about which base has the most observations - it should just be determined by which has the highest sum of base qualities - but I'm not sure that's what users would expect.
2012-09-24 21:46:14 -04:00
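The tie-break described in 11a71e0390 can be illustrated with a short sketch. It is written in Python rather than the tool's Java, and the function name `most_common_base` and the `(base, quality)` input shape are assumptions made for the example, not the ReduceReads API.

```python
from collections import defaultdict


def most_common_base(observations):
    """Pick the consensus base from (base, quality) pairs at one position.

    Bases are ranked by observation count; ties are broken by the
    sum of base qualities. Raises ValueError on an empty input.
    """
    counts = defaultdict(int)
    qual_sums = defaultdict(int)
    for base, qual in observations:
        counts[base] += 1
        qual_sums[base] += qual
    # Tuples compare element by element: count first, then summed quality.
    return max(counts, key=lambda b: (counts[b], qual_sums[b]))


# The case from the commit message: one Q2 N and one Q30 C.
consensus = most_common_base([("N", 2), ("C", 30)])
```

With one observation each, the count is tied and the Q30 C wins over the Q2 N. Ranking purely by summed quality, the alternative the message raises, would amount to dropping `counts[b]` from the key.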
Eric Banks
6a73265a06
RR bug: we were adding synthetic reads from the header only before the variant region, which meant that reads that overlap the variant region but that weren't used for the consensus (because e.g. of low base quality for the spanning base) were never being used at all. Instead, add synthetic reads from before and spanning the variant region.
2012-09-24 13:29:37 -04:00
Eric Banks
ef680e1e13
RR fix: push the header removal all the way into the inner loops so that we literally remove a read from the general header only if it was added to the polyploid header. Add comments.
2012-09-24 11:14:18 -04:00
Eric Banks
0187f04a90
Proper fix for a previous RR bug fix: only remove reads from the header if they were actually used in the creation of the polyploid consensus.
2012-09-23 00:39:19 -04:00
Eric Banks
344083051b
Reverting the fix to the generalized ploidy exact model since it cannot handle it computationally. Will file this in the JIRA.
2012-09-22 23:07:28 -04:00
Eric Banks
ced652b3dd
RR bug: we need to call removeFromHeader() for reads that were used in creating a polyploid consensus or else they are reused later in creating synthetic reads. In the worst case, this bug caused the tool to create 2 copies of the reduced read.
2012-09-22 21:50:10 -04:00
Eric Banks
60b93acf7d
RR bug: we need to test that the mapping and base quals are >= the MIN values and not just >. This was causing us to drop Q20 bases.
2012-09-22 21:32:29 -04:00
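The comparison fix in 60b93acf7d comes down to an inclusive threshold. A minimal Python sketch, assuming minimum values of 20 (the message only implies this for base quality by mentioning dropped Q20 bases; the mapping-quality default and all names here are hypothetical):

```python
MIN_BASE_QUAL = 20     # assumed value for illustration
MIN_MAPPING_QUAL = 20  # assumed value for illustration


def passes_filters(base_qual, mapping_qual,
                   min_base_qual=MIN_BASE_QUAL,
                   min_mapping_qual=MIN_MAPPING_QUAL):
    # Inclusive comparison: a value equal to the minimum is kept.
    # A strict '>' here would discard bases sitting exactly at the threshold.
    return base_qual >= min_base_qual and mapping_qual >= min_mapping_qual


kept_at_threshold = passes_filters(20, 20)
dropped_below = passes_filters(19, 20)
```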
Eric Banks
dcd31e654d
Turn off RR tests while I debug
2012-09-21 17:26:00 -04:00
Eric Banks
21251c29c2
Off-by-one error in sliding window manifests itself at end of a coverage region dropping the last covered base.
2012-09-21 17:22:30 -04:00
Mauricio Carneiro
2c3dc291c0
Added positive/negative strand to the synthetic reads
2012-09-21 10:00:48 -04:00
Mauricio Carneiro
51cb5098e4
Fixed the alignment issues with reads that started with empty consensus headers
2012-09-21 10:00:47 -04:00
Mauricio Carneiro
aa1d2f3a5b
Not every consensus is well aligned. Need to check more, but starting position has been fixed.
2012-09-21 10:00:45 -04:00
Mauricio Carneiro
97874b92d1
Program runs, but the consensus reads are all out of place and need more tags
2012-09-21 10:00:44 -04:00
Mauricio Carneiro
3494a52ddc
another intermediate commit to update changes from stable
2012-09-21 10:00:43 -04:00
Mauricio Carneiro
a89ff7b5dd
Intermediate commit to resolve conflicts coming from stable
2012-09-21 10:00:41 -04:00
Eric Banks
1316b579f0
Bad news folks: BQSR scatter-gather was totally busted; you absolutely cannot trust any BQSR table that was a product of SG (for any version of BQSR). I fixed BQSR-gathering, rewrote (and enabled) the unit test, and confirmed that outputs are now identical whether or not SG is used to create the table.
2012-09-20 14:14:34 -04:00
Eric Banks
4b7edc72d1
Fixing edge case bug in the Exact model (both standard and generalized) where we could abort prematurely in the special case of multiple polymorphic alleles and samples with widely different depths of coverage (e.g. exome and low-pass). In these cases it was possible to call the site bi-allelic when in fact it was multi-allelic (but it wouldn't cause it to create a monomorphic call).
2012-09-20 10:59:42 -04:00
Mauricio Carneiro
ee31a54a03
Merged bug fix from Stable into Unstable
2012-09-19 16:09:45 -04:00
Mauricio Carneiro
7cf9911924
Fixed ReduceReads bug where variant regions were missing.
...
This affected variant regions with more than 100 reads and fewer than 250 reads. Only bams reduced with GATK v2 and 2.1 were affected.
2012-09-19 16:09:08 -04:00
Ryan Poplin
26e35e5ee2
updating BQSR integration tests
2012-09-19 14:10:34 -04:00
Ryan Poplin
b99099f05c
The BaseRecalibrator and DelocalizedBaseRecalibrator have gotten out of sync. Fixing.
2012-09-19 12:30:26 -04:00
Ryan Poplin
7a7103a757
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-19 10:39:18 -04:00
Guillermo del Angel
bebd5c14b8
Update general ploidy md5's due to a bad merge of md5's in the previous commit; also, the new shortened interval definition for EMIT_ALL_CONFIDENT_SITES was buggy
2012-09-18 20:12:15 -04:00
Guillermo del Angel
ca010160a9
Merge fix
2012-09-14 14:05:21 -04:00
Guillermo del Angel
6b37350bc0
Two hairy bugs in pool caller: a) Site error model wasn't counting errors in insertions correctly - Alleles passed in had padded ref byte, but event base in PileupElement doesn't have it. As a result, mismatch rate was grossly overestimated with insertions and we missed several calls we should have made. Integration test reflects changes. b) Adding a ref GL to the exact model is correct mathematically but AFResult wasn't filled properly. As a result, QUAL was junk in pure ref sites, and in all other sites the last ref GL introduced wasn't properly updating Pr(AF>0). c) Added integration test that covers -out_mode EMIT_ALL_CONFIDENT_SITES. Not fully sure if the math is 100% correct (for both diploid and generalized case) but at least now diploid and non-diploid cases behave similarly. md5 of this new test will fail since it's taking me a long time to run so I'll update from Bamboo output shortly
2012-09-14 13:13:22 -04:00
Eric Banks
0206e09a6a
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 15:18:27 -04:00
Eric Banks
d94d0d15c2
Complete overhaul of previous commits to make it all work with scatter-gather. Now tracks output files correctly and can print to stdout.
2012-09-12 15:15:40 -04:00
Ryan Poplin
c9111bb23e
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 14:46:50 -04:00
Ryan Poplin
849a2b8839
Adding HC integration test for _structural_ insertions and deletions.
2012-09-12 12:23:00 -04:00
Eric Banks
994a4ff387
Track all outputs from BQSR (.table, .csv, and .pdf) as @Output arguments. Updated integration tests because we no longer have command-line options not to generate plots (now just don't provide a pdf) or to keep the intermediate csv (now, just provide a filename on the command-line). This is currently busted because we can't access the original filenames from the Engine's storage/stub system and therefore cannot call out to the Rscript with the executor (which requires filename strings).
2012-09-12 11:24:53 -04:00
Mark DePristo
bfbf1686cd
Fixed nasty bug with defaulting to diploid no-call genotypes
...
-- For the pooled caller we were writing diploid no-calls even when other samples were haploid. Changed maxPloidy function to return a defaultPloidy, rather than 0, in the case where all samples are missing.
-- VCF/BCF Writers now create missing genotypes with the ploidy of other samples, or 2 if none are available at all.
-- Updating integration tests for general ploidy, as previously we wrote ./. even when other calls were 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1/1/1/1/1, but now we write ./././././././././././././././././././././././. (ugly but correct)
2012-09-12 07:08:03 -04:00
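The ploidy defaulting described in bfbf1686cd can be sketched as below. This is a Python illustration under stated assumptions, not the VCF/BCF writer code: genotypes are modeled as lists of allele strings with `None` for a missing sample, the function names are hypothetical, and the writer is assumed to use the maximum ploidy seen among the other samples.

```python
def max_ploidy(genotypes, default_ploidy=2):
    """Largest ploidy among called genotypes, or default_ploidy if none are called."""
    ploidies = [len(g) for g in genotypes if g is not None]
    return max(ploidies) if ploidies else default_ploidy


def missing_genotype_string(genotypes, default_ploidy=2):
    """Render a no-call whose ploidy matches the other samples at the site."""
    return "/".join(["."] * max_ploidy(genotypes, default_ploidy))


# One tetraploid call plus one missing sample: the no-call is written as tetraploid.
tetraploid_site = missing_genotype_string([["0", "0", "0", "1"], None])
# Every sample missing: fall back to the diploid default.
all_missing_site = missing_genotype_string([None, None])
```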
Ryan Poplin
35d15278af
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-11 14:34:17 -04:00
Guillermo del Angel
13831106d5
Fix GSA-535: storing likelihoods in allele map was busted when running HaplotypeCaller: only the last likelihood of a haplotype was being stored, as opposed to the max likelihood of all haplotypes mapping to an allele
2012-09-11 11:01:26 -04:00
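The behavior fixed under GSA-535 (keep the maximum likelihood over all haplotypes that map to an allele, rather than whichever was stored last) can be sketched as follows. The dictionary shapes and the name `allele_likelihoods` are assumptions made for this Python illustration; they do not mirror the HaplotypeCaller data structures.

```python
def allele_likelihoods(haplotype_likelihoods, haplotype_to_allele):
    """Collapse per-haplotype likelihoods to the best likelihood per allele."""
    best = {}
    for haplotype, likelihood in haplotype_likelihoods.items():
        allele = haplotype_to_allele[haplotype]
        # Overwriting unconditionally would keep only the last haplotype seen;
        # keep the maximum instead.
        if allele not in best or likelihood > best[allele]:
            best[allele] = likelihood
    return best


per_allele = allele_likelihoods(
    {"hap1": -1.0, "hap2": -5.0, "hap3": -2.0},
    {"hap1": "A", "hap2": "A", "hap3": "B"},
)
```

Here `hap1` and `hap2` both map to allele `A`; an unconditional assignment would leave `A` at -5.0, while the max keeps -1.0.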
Ryan Poplin
aa9829b55c
fixing typo
2012-09-10 13:36:37 -04:00