Commit Graph

8999 Commits (8507cd7440f3aa58b9f47bbc8a2d434eeb5de9d3)

Author SHA1 Message Date
Mark DePristo 8507cd7440 Throw UserException for bad dict / chain files 2012-04-10 07:22:43 -04:00
Mark DePristo 979a84a252 Bugfix for thread unsafe PL cache
-- See https://getsatisfaction.com/gsa/topics/unifiedgenotyper_error_indel?utm_content=topic_link&utm_medium=email&utm_source=new_topic
-- Solution is to use a fixed cache that's never updated on the fly.  My changes limit us to having no more than 500 alleles at a site, which I hope is ok but easy enough to up to a ridiculously large number.
2012-03-27 22:42:30 -04:00
Eric Banks d3f2bc4361 Pre-allocate 10 alt alleles worth of PLs in the cache for efficiency. This effectively means that we never need to re-allocate the cache in the future because we can't ever really handle that many alt alleles. 2012-03-23 13:51:00 -04:00
Mark DePristo ff26f2bf68 HierarchicalMicroScheduler no longer attempts to wrap exceptions
-- This behavior, which isn't obviously valuable at all, continued to grab and rethrow exceptions in the HMS that, if run without NT, would show up as more meaningful errors.  Now HMS simply checks whether the throwable it received on error was a RuntimeException.  If so, it is stored and rethrow without wrapping later.  If it isn't, only in this case is the exception wrapped in a ReviewedStingException.
-- Added a QC walker ErrorThrowingWalker that will throw a UserException, ReviewedStingException, and NullPointerException from map as specified on the command line
-- Added IT that ensures that all three types are thrown properly (i.e., you catch a NullPointerException when you ask for one to be thrown) with and without threading enabled.
-- I believe this will finally put to rest all of these annoying HMS captures.
2012-03-23 11:27:21 -04:00
Ryan Poplin ab288354e9 Better error message for malformed input recal file. 2012-03-23 10:47:01 -04:00
Guillermo del Angel b8cd959461 Potential corner condition bug fix: protect against null pointer exceptions when computing consensus indel bases when UG is discovering alt alleles. If an alt allele has non-standard bases, skip allele gracefully instead of adding null object into list 2012-03-22 10:06:22 -04:00
Eric Banks 58245bfa2f Bug fix: check to see whether there's a BasePileup before asking for one. 2012-03-21 12:44:09 -04:00
Eric Banks 07c3bd32b3 Bug fix: merge NO_VARIATION records with those of another type. The sad part is that this WAS covered by integration tests but someone updated the MD5s without actually paying attention... 2012-03-21 12:42:13 -04:00
Eric Banks dcf2fa361d Minor cleanup 2012-03-21 12:14:31 -04:00
Eric Banks ab1c48745b Need to catch RuntimeExceptions coming out of Picard too so that they show up as UserErrors (some BAM errors are thrown as REs). 2012-03-21 12:13:52 -04:00
Khalid Shakir d0056d6c71 Updated HSP dbsnp from 132 to 135 along with other minor patches. 2012-03-19 14:36:38 -04:00
Eric Banks 5c5d8e7cd3 Minor: cleaner way of turning off index-on-the-fly checking in case we want to turn it back on. 2012-03-18 00:53:29 -04:00
Eric Banks 344a938a70 When checking to make sure that we have cached enough data in the PL array, use the converted index value since that's what will be used as an index into the array. 2012-03-18 00:36:30 -04:00
Guillermo del Angel a05a7f287d TMP: disable checking of whether on the fly index is equal to index after run completed 2012-03-16 21:14:45 -04:00
Eric Banks a7578e85e8 Rewriting a few of the indel integration tests for multi-allelics. The old tests were running b37 calls against a b36 reference, so the calls were all ref. The new tests are run against the pilot1 data and then those calls are fed back into the the same bam to test genotype given alleles, with a sprinkling of bi- and tri-allelics. 2012-03-16 14:21:27 -04:00
Eric Banks 7424041a17 Updating integration tests to deal with the new GL framework. Now multi-allelic indel calls are correct. 2012-03-16 12:50:39 -04:00
Eric Banks dce6b91f7d Add a conversion from the deprecated PL ordering to the new one. We need this for the DiploidSNPGenotypeLikelihoods which still use the old ordering. My intention is for this to be a temporary patch, but changing the ordering in DiploidSNPGenotypeLikelihoods is not appriopriate for committing to stable as it will break all of the external tools (e.g. MuTec) that are built on top of the class. We will have to talk to e.g. Kristian to see how disruptive this will be. Added unit tests to the GL conversions and indexing. 2012-03-16 11:14:37 -04:00
Eric Banks 41068b6985 The commit constitutes a major refactoring of the UG as far as the genotype likelihoods are concerned. I hate to do this in stable, but the VCFs currently being produced by the UG are totally busted. I am trying to make just the necessary changes in stable, doing everything else in unstable later. Now all GL calculations are unified into the GenotypeLikelihoods class - please try and use this functionality from now on instead of duplicating the code. 2012-03-15 16:08:58 -04:00
Eric Banks f7c2c818fe Exact model memory optimization: instead of having a later matrix column pull in data from earlier ones (requiring us to keep them around until all dependencies are hit), the earlier columns push data into their dependents immediately and then are removed. This does trade off speed a little bit (because we need to call approximateLog10Sum each time we add to a dependent instead of once in an array at the end). Note that this commit would normally not get pushed into stable, but I'm about to make a very disruptive push into stable that would make merging this from unstable a nightmare. 2012-03-14 14:02:36 -04:00
Mark DePristo bb2c10b785 Capture the class of the exception in GATKRunReport
-- As suggested by David.
2012-03-14 12:16:22 -04:00
Eric Banks 5200f7f919 When creating a synthetic VC based on the passed in alleles, set the reference base for indel. 2012-03-13 10:59:58 -04:00
Eric Banks 1675bd4dd7 When creating a synthetic VC based on the passed in alleles, set the length correctly. 2012-03-13 10:55:52 -04:00
Eric Banks 04cafffaa7 Merge remote-tracking branch 'unstable/master' 2012-03-12 08:43:43 -04:00
Eric Banks b4749757f8 Fixes for SLOD: 1) didn't work properly for multi-allelics (randomly chose an allele, possibly one that wasn't genotyped in the full context); 2) in cases when there were more alt alleles than the max allowed and the user is calculating SB, we would recompute the best alt alleles(s); 3) for some reason, we were recomputing the LOD for the full context when we'd already done that. Given that this passes integration tests on my end, this should be the last commit before the release. 2012-03-12 01:07:07 -04:00
Mark DePristo 1ee46e5c06 Collect only the bare essentials in the GATKRunReport
Now looks like:
<GATK-run-report>
   <id>D7D31ULwTSxlAwnEOSmW6Z4PawXwMxEz</id>
   <start-time>2012/03/10 20.21.19</start-time>
   <end-time>2012/03/10 20.21.19</end-time>
   <run-time>0</run-time>
   <walker-name>CountReads</walker-name>
   <svn-version>1.4-483-g63ecdb2</svn-version>
   <total-memory>85000192</total-memory>
   <max-memory>129957888</max-memory>
   <user-name>depristo</user-name>
   <host-name>10.0.1.10</host-name>
   <java>Apple Inc.-1.6.0_26</java>
   <machine>Mac OS X-x86_64</machine>
   <iterations>105</iterations>
</GATK-run-report>

No longer capturing command line or directory information, to minimize people's concerns with phone home and privacy
2012-03-10 20:27:14 -05:00
Mark DePristo bd883031a4 Final version of QualQuantizer
-- Docs everywhere
-- Contracts everywhere
-- More unit tests
-- Better error checking
-- Marginally nicer interface to QuantizeQualsWalker
2012-03-09 16:00:08 -05:00
Mark DePristo 4b404cae48 Final evaluation script for quantizing quality scores 2012-03-09 16:00:08 -05:00
Mark DePristo e2c62572f9 Further upgrades to CalibrateGenotypeLikelihoods.R
- Uses modified yates correction of e + 1 / n + 2 to estimate error rates
- Now shows ALL and per read group information
- Better limits on diff plots so we can see more information
2012-03-09 16:00:08 -05:00
Mark DePristo fceb2bf25b Updating CalibrateGenotypeLikelihoods.R to display Q93 not filter them out 2012-03-09 16:00:07 -05:00
Mark DePristo 3ba2e5667c CalibrateGenotypesLikelihoods include pOfDGivenD now 2012-03-09 16:00:07 -05:00
Mark DePristo 1011f3862b CalibrateGenotypeLikelihoods now emits the position of the variant for debugging
-- Refactored some duplicated code (FYI, code duplication = root of all evil) into shared functions
-- Added long-missing integrationtests
-- CHRIS/RYAN -- it would be very good to add an integration test covering external VCF files as I believe we rely on this functionality and it's not tested at all
2012-03-09 16:00:07 -05:00
Mark DePristo 8158348e01 Prints xlim = 30 and xlim = 99 in CalibrateGenotypeLikelihoods.R 2012-03-09 16:00:07 -05:00
David Roazen 91d10431d3 BAMScheduler: detect contigs from the interval list that are not in the merged BAM header's sequence dictionary
This is a quick-and-dirty patch for the null pointer error Mauricio reported earlier.

Later on we might want to address in a more general way the fact that we validate user intervals
against the reference but not against the merged BAM header produced by the engine at runtime.
2012-03-09 15:20:16 -05:00
David Roazen bc65f6326f Detect incomplete reads from BAM schedule file in BAMSchedule before they become buffer underflows
This fix is similar, but distinct from the earlier fix to GATKBAMIndex. If we fail to read in
a complete 3-integer bin header from the BAM schedule file that the engine has written, throw a
ReviewedStingException (since this is our problem, not the user's) rather than allowing a
cryptic buffer underflow error to occur.

Note that this change does not fix the underlying problem in the engine, if there is one
(there may be an as-yet-undetected bug in the code that writes the bam schedule). It will
just make it easier for us to identify what's going wrong in the future.
2012-03-09 12:33:48 -05:00
David Roazen 32dee7ed9b Avoid buffer underflow in GATKBAMIndex by detecting premature EOF in BAM indices
GATKBAMIndex would allow an extremely confusing BufferUnderflowException to be
thrown when a BAM index file was truncated or corrupt. Now, a UserException is
thrown in this situation instructing the user to re-index the BAM.

Added a unit test for this case as well.
2012-03-08 15:30:44 -05:00
Guillermo del Angel c04853eae6 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-08 12:30:04 -05:00
Guillermo del Angel 858acf8616 Hidden mode in ValidationAmplicons to support ILMN output format (same as Sequenom, with just shuffled columns) 2012-03-08 12:29:44 -05:00
Andrey Sivachenko 56f074b520 docs updated 2012-03-07 18:47:15 -05:00
Andrey Sivachenko 117ea605ac Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-07 18:35:07 -05:00
Andrey Sivachenko 497a1b059e transition to JEXL completed, old parameters setting individual cutoffs now deprecated 2012-03-07 18:34:11 -05:00
Andrey Sivachenko fbd2f04a04 JEXL support added; intermediate commit, not yet functional 2012-03-07 17:29:42 -05:00
Mark DePristo 20d10dfa35 EvalQuantizedQuals now tests the impact on reduced reads as well 2012-03-07 13:10:08 -05:00
Mark DePristo 0376d73ece Improved, public version of ErrorRateByCycle
-- A cleaner table output (molten).  For those interested in seeing how this can be done with GATKReports look here for a nice clean example
-- Integration tests
-- Minor improvements to GATKReportTable with methods to getPrimaryKeys
2012-03-07 13:10:08 -05:00
Christopher Hartl a6a8fc0521 Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2012-03-07 10:05:43 -05:00
Eric Banks c4824a77f5 Some to-do items for the reduced reads calling script 2012-03-07 10:03:10 -05:00
Christopher Hartl 155839e901 Commit of VQSRV3 with Random Forest Bridge and Decision Tree engines. Lots of code duplication with the variant recalibrator in public, but also some subtle changes (i.e. to the engines and data manager). Code worked when it overwrote the stuff in public, but couldn't commit that. Will push if it works for private as well. 2012-03-07 09:46:43 -05:00
Mark DePristo 26dcec08d5 Bugfix for QualQuantizerUnitTest
-- Enabled failing provider
-- Fixed incorrect expectation in unit test
2012-03-07 09:30:03 -05:00
Mark DePristo 8ef654aa77 Minor improvements to QuantizeQuals
-- Commenting out excessive debugging in the walker
-- Scala script to quantize BAM, run calibrate genotype likelihoods, call snps, and compare them to the full bam call set for 1, 2, 4, 8, 16, 32, and 64 quantization levels
2012-03-06 16:56:59 -05:00
Mark DePristo 569be953b9 Bugfix for VariantEval
-- We weren't properly handling the case where a site had both a SNP and indel in both eval and comp.  These would naturally pair off as SNP x SNP and INDEL x INDEL in eval, but we'd still invoke update2 with (null, SNP) and (null, INDEL) resulting most conspicously as incorrect false negatives in the validation report.
-- Updating misc. integrationtests, as the counting of comps (in particular for dbSNP) was inflated because of this effect.
2012-03-06 16:56:59 -05:00
Mark DePristo 5f35f5d338 QualQuantizer scales the penalty by the log of the two error rates
-- Old equation was |E1 - E*| * N1.  New equation is |log10(E1) - log10(E2)| * N1 which is equivalent to E1 * N1/E2
2012-03-06 16:56:58 -05:00