Commit Graph

187 Commits (08ac80c0804dceee76502de7de630095031f39b2)

Author SHA1 Message Date
Eric Banks 08ac80c080 RR bug: when the last base in the window around the polyploid consensus is filtered (low quality), the filtered consensus is not flushed and subsequent filtered bases (but importantly not contiguous to this one) are just added to this position. In other words, bases were being added to the wrong genomic positions. Fixed. 2012-10-07 10:52:01 -04:00
Eric Banks e8a6460a33 After merging with Yossi's fix I can confirm that the AD is fixed when going through the HC too. Added similar fixes to DP and FS annotations too. 2012-10-05 16:37:42 -04:00
Yossi Farjoun ef90beb827 - forgot to use git rm to delete a file from git. Now that VCF is deleted.
- uncommented a HC test that I missed.
2012-10-05 16:14:51 -04:00
Yossi Farjoun d419a33ed1 * Added an integration test for AD annotation in the Haplotype caller.
* Corrected FS Anotation for UG as for AD.
* HC still does not annotate ReducedReads correctly (for FS nor AD)
2012-10-05 15:23:59 -04:00
Eric Banks f840d9edbd HC test should continue using 3 alt alleles for indels 2012-10-05 02:03:34 -04:00
Eric Banks c66ef17cd0 Add a separate max alt alleles argument for indels that defaults to 2 instead of 3. PLEASE TAKE NOTE. 2012-10-04 13:52:14 -04:00
Eric Banks e13e61673b Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-04 10:54:23 -04:00
Eric Banks dfddc4bb0e Protect against cases where there are counts but no quals 2012-10-04 10:52:30 -04:00
Eric Banks 0c46845c92 Refactored the BaseCounts classes so that they are safer and allow for calculations on the most probable base (which is not necessarily the most common base). 2012-10-04 10:37:11 -04:00
Mark DePristo b6e20e083a Copied DiploidExactAFCalc to placeholder OptimizedDiploidExact
-- Will be removed.  Only commiting now to fix public -> private dependency
2012-10-03 20:16:38 -07:00
Mark DePristo 51cafa73e6 Removing public -> private dependency 2012-10-03 20:05:03 -07:00
Mark DePristo f8ef4332de Count the number of evaluations in AFResult; expand unit tests
-- AFResult now tracks the number of evaluations (turns through the model calculation) so we can now compute the scaling of exact model itself as a function of n samples
-- Added unittests for priors (flat and human)
-- Discovered nasty general ploidy bug (enabled with Guillermo_FIXME)
2012-10-03 19:55:11 -07:00
Mark DePristo de941ddbbe Cleanup Exact model, better unit tests
-- Added combinatorial unit tests for both Diploid and General (in diploid-case) for 2 and 3 alleles in all combinations of sample types (i.e., AA, AB, BB and equiv. for tri-allelic).  More assert statements to ensure quality of the result.
-- Added docs (DOCUMENT YOUR CODE!) to AlleleFrequencyCalculationResult, with proper input error handling and contracts.  Made mutation functions all protected
-- No longer need to call reset on your AlleleFrequencyCalculationResult -- it'd done for you in the calculation function.  reset is a protected method now, so it's all cleaner and nicer this way
-- TODO still -- need to add edge-case tests for non-informative samples (0,0,0), for the impact of priors, and I need to add some way to test the result of the pNonRef
2012-10-03 19:55:11 -07:00
Mark DePristo 3e01a76590 Clean up AlleleFrequencyCalculation classes
-- Added a true base class that only does truly common tasks (like manage call logging)
   -- This base class provides the only public method (getLog10PNonRef) and calls into a protected compute function that's abstract
   -- Split ExactAF into superclass ExactAF with common data structures and two subclasses: DiploidExact and GeneralPloidyExact
   -- Added an abstract reduceScope function that manages the simplification of the input VariantContext in the case where there are too many alleles or other constraints require us to only attempt a smaller computation
   -- All unit tests pass
2012-10-03 19:55:11 -07:00
Mark DePristo 1c52db4cdd Add exactCallsLog output file to ExactModel and StandardCallerArgumentCollection
-- This allows us to log all of the information about the exact model call (alleles, priors, PLs, result, and runtime) to a file for later debugging / optimization
2012-10-03 19:55:11 -07:00
Eric Banks 2df5be702c Added an argument to RR to allow polyploid consensus creation (by default it is turned off). This will eventually be replaced by the known SNPs track trigger. 2012-09-28 11:44:25 -04:00
Eric Banks 11a71e0390 RR bug: when determining the most common base at a position, break ties by which base has the highest sum of base qualities. Otherwise, sites with 1 Q2 N and 1 Q30 C are ending up as Ns in the consensus. I think perhaps we don't even care about which base has the most observations - it should just be determined by which has the highest sum of base qualities - but I'm not sure that's what users would expect. 2012-09-24 21:46:14 -04:00
Eric Banks 6a73265a06 RR bug: we were adding synthetic reads from the header only before the variant region, which meant that reads that overlap the variant region but that weren't used for the consensus (because e.g. of low base quality for the spanning base) were never being used at all. Instead, add synthetic reads from before and spanning the variant region. 2012-09-24 13:29:37 -04:00
Eric Banks ef680e1e13 RR fix: push the header removal all the way into the inner loops so that we literally remove a read from the general header only if it was added to the polyploid header. Add comments. 2012-09-24 11:14:18 -04:00
Eric Banks 0187f04a90 Proper fix for a previous RR bug fix: only remove reads from the header if they were actually used in the creation of the polyploid consensus. 2012-09-23 00:39:19 -04:00
Eric Banks 344083051b Reverting the fix to the generalized ploidy exact model since it cannot handle it computationally. Will file this in the JIRA. 2012-09-22 23:07:28 -04:00
Eric Banks ced652b3dd RR bug: we need to call removeFromHeader() for reads that were used in creating a polyploid consensus or else they are reused later in creating synthetic reads. In the worst case, this bug caused the tool to create 2 copies of the reduced read. 2012-09-22 21:50:10 -04:00
Eric Banks 60b93acf7d RR bug: we need to test that the mapping and base quals are >= the MIN values and not just >. This was causing us to drop Q20 bases. 2012-09-22 21:32:29 -04:00
Eric Banks dcd31e654d Turn off RR tests while I debug 2012-09-21 17:26:00 -04:00
Eric Banks 21251c29c2 Off-by-one error in sliding window manifests itself at end of a coverage region dropping the last covered base. 2012-09-21 17:22:30 -04:00
Mauricio Carneiro 2c3dc291c0 Added positive/negative strand to the synthetic reads 2012-09-21 10:00:48 -04:00
Mauricio Carneiro 51cb5098e4 Fixed the alignment issues with reads that started with empty consensus headers 2012-09-21 10:00:47 -04:00
Mauricio Carneiro aa1d2f3a5b Not every consensus is well aligned. Need to check more, but starting position has been fixed. 2012-09-21 10:00:45 -04:00
Mauricio Carneiro 97874b92d1 Program runs, but the consensus reads are all out of place and need more tags 2012-09-21 10:00:44 -04:00
Mauricio Carneiro 3494a52ddc another intermediate commit to update changes from stable 2012-09-21 10:00:43 -04:00
Mauricio Carneiro a89ff7b5dd Intermediate commit to resolve conflicts coming from stable 2012-09-21 10:00:41 -04:00
Eric Banks 1316b579f0 Bad news folks: BQSR scatter-gather was totally busted; you absolutely cannot trust any BQSR table that was a product of SG (for any version of BQSR). I fixed BQSR-gathering, rewrote (and enabled) the unit test, and confirmed that outputs are now identical whether or not SG is used to create the table. 2012-09-20 14:14:34 -04:00
Eric Banks 4b7edc72d1 Fixing edge case bug in the Exact model (both standard and generalized) where we could abort prematurely in the special case of multiple polymorphic alleles and samples with widely different depths of coverage (e.g. exome and low-pass). In these cases it was possible to call the site bi-allelic when in fact it was multi-allelic (but it wouldn't cause it to create a monomorphic call). 2012-09-20 10:59:42 -04:00
Mauricio Carneiro ee31a54a03 Merged bug fix from Stable into Unstable 2012-09-19 16:09:45 -04:00
Mauricio Carneiro 7cf9911924 Fixed ReduceReads bug where variant regions were missing.
This affected variant regions with more than 100 reads and less than 250 reads. Only bams reduced with GATK v2 and 2.1 were affected.
2012-09-19 16:09:08 -04:00
Ryan Poplin 26e35e5ee2 updating BQSR integration tests 2012-09-19 14:10:34 -04:00
Ryan Poplin b99099f05c The BaseRecalibrator and DelocalizedBaseRecalibrator have gotten out of sync. Fixing. 2012-09-19 12:30:26 -04:00
Ryan Poplin 7a7103a757 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-09-19 10:39:18 -04:00
Guillermo del Angel bebd5c14b8 Update general ploidy md5's due to bad merge of md5's in previous commit, and new shortened interval definition for EMIT_ALL_CONFIDENT_SITES was buggy 2012-09-18 20:12:15 -04:00
Guillermo del Angel ca010160a9 Merge fix 2012-09-14 14:05:21 -04:00
Guillermo del Angel 6b37350bc0 Two hairy bugs in pool caller: a) Site error model wasn't counting errors in insertions correctly - Alleles passed in had padded ref byte, but event base in PileupElement doesn't have it. As a result, mismatch rate was grossly overestimated with insertions and we missed several calls we should have made. Integration test reflects changes. b) Adding a ref GL to the exact model is correct mathematically but AFResult wasn't filled properly. As a result, QUAL was junk in pure ref sites, and in all other sites the last ref GL introduced wasn't properly updating Pr(AF>0). c) Added integration test that covers -out_mode EMIT_ALL_CONFIDENT_SITES. Not fully sure if the math is 100% correct (for both diploid and generalized case) but at least now diploid and non-diploid cases behave similarly. md5 of this new test will fail since it's taking me a long time to run so I'll update from Bamboo output shortly 2012-09-14 13:13:22 -04:00
Eric Banks 0206e09a6a Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-09-12 15:18:27 -04:00
Eric Banks d94d0d15c2 Complete overhaul of previous commits to make it all work with scatter-gather. Now tracks output files correctly and can print to stdout. 2012-09-12 15:15:40 -04:00
Ryan Poplin c9111bb23e Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-09-12 14:46:50 -04:00
Ryan Poplin 849a2b8839 Adding HC integration test for _structural_ insertions and deletions. 2012-09-12 12:23:00 -04:00
Eric Banks 994a4ff387 Track all outputs from BQSR (.table, .csv., and .pdf) as @Output arguments. Updated integration tests because we no longer have command-line options not to generate plots (now just don't provide a pdf) or to keep the intermediate csv (now, just provide a filename on the command-line). This is currently busted because we can't access the original filenames from the Engine's storage/stub system and therefore cannot call out to the Rscript with the executor (which requires filename strings). 2012-09-12 11:24:53 -04:00
Mark DePristo bfbf1686cd Fixed nasty bug with defaulting to diploid no-call genotypes
-- For the pooled caller we were writing diploid no-calls even when other samples were haploid.  Changed maxPloidy function to return a defaultPloidy, rather than 0, in the case where all samples are missing.
-- VCF/BCF Writers now create missing genotypes with the ploidy of other samples, or 2 if none are available at all.
-- Updating integration tests for general ploidy, as previously we wrote ./. even when other calls were 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1/1/1/1/1, but now we write ./././././././././././././././././././././././. (ugly but correct)
2012-09-12 07:08:03 -04:00
Ryan Poplin 35d15278af Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-09-11 14:34:17 -04:00
Guillermo del Angel 13831106d5 Fix GSA-535: storing likelihoods in allele map was busted when running HaplotypeCaller, only the last likelihood of a haplotype was being stored, as opposed to the max likelihood of all haplotypes mapping to an allele 2012-09-11 11:01:26 -04:00
Ryan Poplin aa9829b55c fixing typo 2012-09-10 13:36:37 -04:00