gatk-3.8


Author	SHA1	Message	Date
Eric Banks	66cbaaee31	Fixed nasty bug in BQSR csv file creation: numbers larger than 999 in the Errors column were printed with commas as thousands separators (which look like extra columns). This wasn't caught earlier because there are no integration tests covering the csv. I'll add one into unstable in a sec.	2012-11-09 08:33:55 -05:00
David Roazen	422e16c62e	BaseRecalibration: don't cache instances of ReadCovariates across reads. Caching and reusing ReadCovariates instances across reads sounds good in theory, but: (1) it doesn't work unless you zero out the internal arrays before each read; (2) the internal arrays must be sized proportionally to the maximum POSSIBLE recalibrated read length (5000!!!) instead of the ACTUAL read lengths. By contrast, creating a new instance per read is basically equivalent to doing an efficient low-level memset-style clear on a much smaller array (since we use the actual rather than the maximum read length to create it). So this should be faster than caching instances and calling clear(), but slower than caching instances and not calling clear(). Credit to Ryan for proposing this approach.	2012-10-25 17:02:55 -04:00
Mark DePristo	cc8c12b954	Committing a broken version of BaseRecalibration -- I'm committing because there's some kind of fundamental problem with the ReadCovariates cache, in that historical data isn't being cleared / computed properly, and I'd rather it fail for a while than leave it in JIRA. -- The integration tests run PrintReads with -nct 1, 2, and 4, and the -nct 4 case fails, but that's because of this incorrect calculation. -- Updating GATKPerformanceOverTime with the new @ClassType annotation	2012-10-25 14:46:35 -04:00
David Roazen	32a6d7000a	Thread-safe ReadGroupCovariate The ReadGroupCovariate class was not thread-safe. This led to horrible race conditions in multithreaded runs of the BQSR where (for example) the same read group could get inserted into the reverse lookup table twice with different IDs. Should fix the intermittent crash reported in GSA-492.	2012-10-24 15:22:50 -04:00
David Roazen	ac87ed47bb	BQSR: allow logging recal table updates to a file For testing/debugging purposes only	2012-10-01 14:18:34 -04:00
Eric Banks	1316b579f0	Bad news folks: BQSR scatter-gather was totally busted; you absolutely cannot trust any BQSR table that was a product of SG (for any version of BQSR). I fixed BQSR-gathering, rewrote (and enabled) the unit test, and confirmed that outputs are now identical whether or not SG is used to create the table.	2012-09-20 14:14:34 -04:00
Mark DePristo	087247f1f0	Allow longs and doubles in recalibration report to allow some backward compatibility	2012-09-19 19:23:44 -04:00
Ryan Poplin	7a7103a757	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-09-19 10:39:18 -04:00
Ryan Poplin	0ea543e1fd	Removing testing scaffolding from delocalized BQSR. The output recal table reports the data as doubles instead of integers. This changes the mapping-based BQSR integration tests. Final intermediate push before delocalized BQSR replaces previous BQSR.	2012-09-19 10:39:06 -04:00
Eric Banks	d94d0d15c2	Complete overhaul of previous commits to make it all work with scatter-gather. Now tracks output files correctly and can print to stdout.	2012-09-12 15:15:40 -04:00
Eric Banks	994a4ff387	Track all outputs from BQSR (.table, .csv, and .pdf) as @Output arguments. Updated integration tests because we no longer have command-line options not to generate plots (now just don't provide a pdf) or to keep the intermediate csv (now just provide a filename on the command line). This is currently busted because we can't access the original filenames from the Engine's storage/stub system and therefore cannot call out to the Rscript with the executor (which requires filename strings).	2012-09-12 11:24:53 -04:00
David Roazen	d2f3d6d22f	Revert "Separated out the DoC calculations from the XHMM pipeline, so that CalcDepthOfCoverage can be used for calculating joint coverage on a per-base accounting over multiple samples (e.g., family samples)" This reverts commit 075c56060e0ffcce39631693ef39cf5f8c3a4d5a.	2012-09-10 15:52:39 -04:00
Menachem Fromer	0b717e2e2e	Separated out the DoC calculations from the XHMM pipeline, so that CalcDepthOfCoverage can be used for calculating joint coverage on a per-base accounting over multiple samples (e.g., family samples)	2012-09-10 15:32:41 -04:00
Mark DePristo	c9ea213c9b	Make BaseRecalibration thread-safe -- In the process uncovered two strange things 1 -- qualityScoreByFullCovariateKey was created but never used. Seems like a cache? 2 -- Discovered nasty bug in BaseRecalibrator: https://jira.broadinstitute.org/browse/GSA-534	2012-08-31 13:42:42 -04:00
Mark DePristo	817ece37a2	General infrastructure for ReadTransformers -- These are like read filters but can be applied either on input, on output, or handled by the walker -- Previous example of BAQ now uses the general framework -- Resulted in massive conceptual cleanup of SAMDataSource and ReadProperties! Yeah! -- BQSR now uses this framework. We can now do BQSR on input, on output, or within a walker -- PrintReads now handles all read transformers in the walker in map, enabling us to parallelize PrintReads with BAQ and BQSR -- Currently BQSR is throwing exceptions in parallel, which a subsequent commit will fix -- Removed global variable setting in GenomeAnalysisEngine for BAQ, as command line parameters are cleanly handled by ReadTransformer infrastructure -- In principle ReadFilters are just a special kind of ReadTransformer, but this refactoring is larger than I can do. It's a JIRA entry -- Many files touched simply due to the refactoring and renaming of classes	2012-08-31 13:42:41 -04:00
Ryan Poplin	e12ae65d33	Changing the commenting style in the BQSR	2012-08-29 11:27:45 -04:00
Ryan Poplin	18eca3544e	Initial commit of the delocalized BQSR written as a read walker.	2012-08-28 15:24:20 -04:00
Mark DePristo	dcc972a557	Usability cleanup for BQSR -- I'm seeing a lot of people trying to use BinaryTagCovariate in the community. They really shouldn't do this, so I moved it to private. -- Throw an exception if its required bintag argument is missing -- Check explicitly if user is requesting DinucCovariate and tell them that it's been retired in favor of ContextCovariate -- Show the type (Required, Experimental, Standard) of the covariates when running --list	2012-08-25 14:53:00 -04:00
Eric Banks	ded0e11b45	Killing off some FindBugs 'Reliability' issues	2012-08-16 14:00:48 -04:00
Eric Banks	dac3958461	Killing off some FindBugs 'Usability' issues	2012-08-16 13:32:44 -04:00
Eric Banks	f368e568db	Implementing support in BaseRecalibrator for SOLiD no call strategies other than throwing an exception. For some reason we never transferred these capabilities into BQSRv2 earlier.	2012-08-15 22:52:56 -04:00
Mark DePristo	f032e0aba4	A bit better output for ContextCovariate context size logging	2012-08-12 13:45:52 -04:00
Mark DePristo	243af0adb1	Expanded the BQSR reporting script -- Includes header page -- Table of arguments (Arguments) -- Summary of counts (RecalData0) -- Summary of counts by qual (RecalData1) -- Fixed bug in output that resulted in covariates list always being null (updated md5s accordingly) -- BQSR.R now loads all relevant libraries, including gplots, grid, and gsalib, to run correctly	2012-08-12 13:45:14 -04:00
Mark DePristo	458bbdee8f	Add useful logger.info telling us the mismatch and indel context sizes	2012-08-12 10:27:05 -04:00
Eric Banks	0a2a646a52	Other random FindBugs fixes	2012-08-08 14:56:27 -04:00
Eric Banks	a0196c9f5b	Quick pass of FindBugs 'method invokes inefficient Number constructor' fixes.	2012-08-08 14:34:16 -04:00
Mark DePristo	80b94a4f9a	AdaptiveContexts implement pruning to a given chi2 p value -- Added Bonferroni-corrected p-value pruning, so you tell it how significant a difference you are willing to collapse in the tree, and it prunes the tree down to this maximum threshold -- Penalty is now a phred-scaled p-value, not the raw chi2 value -- Split command line arguments in VisualizeContextTree into separate arguments for each type of pruning	2012-08-07 17:22:39 -04:00
Mark DePristo	982c735c76	VisualizeAdaptiveTree now considers only leaf nodes when computing max/min penalty	2012-08-07 17:22:39 -04:00
Mark DePristo	2f004665fb	Fixing public -> private dep	2012-08-06 11:42:55 -04:00
Mark DePristo	7bf5ca51ee	Major bugfix for adaptive contexts -- Basically I was treating the context history in the wrong direction, effectively predicting the further bases in the context based on the closer one. Totally backward. Updated the code to build the tree in the right direction. -- Added a few more useful outputs for analysis (minPenalty and maxPenalty) -- Misc. cleanup of the code -- Overall I'm not 100% certain this is even the right way to think about the problem. Clearly this is producing a reasonable output, but the sum of chi2 values over the entire tree is just enormous. Perhaps an MCMC convergence / sampling criterion would be a better way to think about this problem?	2012-08-06 11:42:55 -04:00
Mark DePristo	b4841548f1	Bug fixes and misc. improvements to running the adaptive context tools -- Better output file name defaults -- Fixed nasty bug where I included non-existent quals in the contexts to process because they showed up in the Cycle covariate -- Data is processed in qual order now, so it's easier to see progress -- Logger messages explaining where we are in the process -- When in UPDATE mode we still write out the information for an equivalent prune by depth for post analysis	2012-08-06 11:42:55 -04:00
Mark DePristo	e1bba91836	Ready for full-scale evaluation of adaptive BQSR contexts -- VisualizeContextTree now can write out an equivalent BQSR table determined after adaptive context merging of all RG x QUAL x CONTEXT trees -- Docs, algorithm descriptions, etc. so that it makes sense what's going on -- VisualizeContextTree should really be simplified into a single tool that just visualizes the trees when / if we decide to make adaptive contexts a standard part of BQSR -- Misc. cleaning, organization of the code (recalibration tests were in private but corresponding actual files were public)	2012-08-03 16:02:53 -04:00
Mark DePristo	0c4e729e13	Working version of adaptive context calculations -- Uses chi2 test for independence to determine if subcontext is worth representing. Gives excellent visual results -- Writes out analysis output file producing excellent results in R -- Trivial reformatting of MathUtils	2012-07-31 08:11:04 -04:00
Mark DePristo	93640b382e	Preliminary version of adaptive context covariate algorithm -- Works according to visual inspection of output tree	2012-07-31 08:11:04 -04:00
Mark DePristo	315d25409f	Improvement to RecalDatum and VisualizeContextTree -- Reorganize functions in RecalDatum so that error rate can be computed independently. Added unit tests. Removed equals() method, which is buggy without its associated hashCode() implementation -- New class RecalDatumTree based on QualIntervals that inherits from RecalDatum but includes the concept of sub data -- VisualizeContextTree now uses RecalDatumTree and can trivially compute the penalty function for merging nodes, which it displays in the graph	2012-07-31 08:11:04 -04:00
Mark DePristo	57b45bfb1e	Extensive unit tests, contracts, and documentation for RecalDatum	2012-07-31 08:11:03 -04:00
Mark DePristo	e00ed8bc5e	Cleanup BQSR classes -- Moved most of BQSR classes (which are used throughout the codebase) to utils.recalibration. It's better in my opinion to keep commonly used code in utils, and only specialized code in walkers. As code becomes embedded throughout GATK it should be refactored to live in utils -- Removed unnecessary imports of BQSR in VQSR v3 -- Now ready to refactor QualQuantizer and unit test into a subclass of RecalDatum, refactor unit tests into RecalDatum unit tests, and generalize into hierarchical recal datum that can be used in QualQuantizer and the analysis of adaptive context covariate -- Update PluginManager to sort the plugins and interfaces. This allows us to have a deterministic order in which the plugin classes come back, which caused BQSR integration tests to temporarily change because I moved my classes around a bit.	2012-07-31 08:11:03 -04:00
Eric Banks	8dbc9cb29c	Add the ability to emit the original quals in the OQ tag	2012-07-17 15:52:56 -04:00
Eric Banks	305db8c0d1	Total rewrite of the isGATKLite() functionality with help of Khalid/David. PluginManager was not working for us.	2012-07-17 15:11:03 -04:00
Eric Banks	62c5228048	1) Revert previous change - indel recalibration is turned on by default and users of the Lite version will need to turn it off to avoid a User Error. 2) Implemented the engine.isGATKLite() method.	2012-07-17 12:23:40 -04:00
Eric Banks	40618ac471	A bunch of BQSR changes: 1) by default we do not emit indel quals, but they can be turned on with --enable_indel_quals. 2) We check whether or not we are running in Lite mode (not done yet) and if so and the user is trying to recalibrate indels, we throw a User Error (not supported). 3) Like v1 we now allow the user to set the qual value below which we don't recalibrate (this was the remaining source of differences in the v1 vs. v2 plots).	2012-07-17 10:52:43 -04:00
Eric Banks	dd571d9aa0	Added a --no_indel_quals argument that when used with -BQSR inhibits the writing of base insertion and base deletion quality tags.	2012-07-04 01:22:20 -04:00
Eric Banks	a4670113bd	Refactored/renamed the nested integer array; cleaned up code a bit.	2012-07-03 00:12:33 -04:00
Eric Banks	cac72bce91	Initial version of int indexed mapping for BQSR. Will be cleaned up in a bit.	2012-07-02 14:33:33 -04:00
Eric Banks	96ea334bf2	Disable caching in BQSR for now since it significantly slows down computation; will look into this in a bit.	2012-06-28 15:27:44 -04:00
Eric Banks	1fafd9f6c8	NestedHashMap-based implementation of BQSRv2 along with a few minor optimizations. Not a huge runtime upgrade over the long bitset approach, but it allows us to implement further optimizations going forward. Integration test change because the original version had a bug in the quantized qual table creation.	2012-06-27 16:55:49 -04:00
Eric Banks	783b7f6899	Misc cleanup	2012-06-15 10:39:19 -04:00
Eric Banks	0c218e4822	Refactoring mostly for readability (and small performance improvement)	2012-06-15 10:36:41 -04:00
Eric Banks	4895fe2289	No more extraneous array creation in BQSR covariate classes; now covariates push their data directly to the ReadCovariates class as it's calculated (no more going through CovariateValues.java)	2012-06-15 02:32:00 -04:00
Eric Banks	5c3c6cbc40	Long -> long conversions in BQSR	2012-06-14 09:07:02 -04:00
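
Commit 66cbaaee31 above describes a csv bug in which numbers larger than 999 in the Errors column were written with commas. A minimal Python sketch shows why that breaks the file: a thousands-separator comma is indistinguishable from the field delimiter, so a csv parser sees one extra column. The four-column layout and the helper names (`row_with_grouping`, `row_plain`) are hypothetical illustrations, not GATK's actual csv writer; in Java the analogous grouping comes from the `,` flag in `String.format("%,d", n)`, but the commit does not say which call produced the commas.

```python
import csv
import io

# Hypothetical column layout, loosely modeled on a recalibration csv.
HEADER = ["ReadGroup", "QualityScore", "Observations", "Errors"]

def row_with_grouping(rg, qual, obs, errors):
    # Buggy: the ',' format spec inserts thousands separators (1234 -> "1,234").
    return f"{rg},{qual},{obs:,},{errors:,}"

def row_plain(rg, qual, obs, errors):
    # Fixed: plain integer formatting, no grouping separators.
    return f"{rg},{qual},{obs},{errors}"

def parse(line):
    # Parse one csv line into its list of fields.
    return next(csv.reader(io.StringIO(line)))

buggy = parse(row_with_grouping("rg1", 30, 500, 1234))
fixed = parse(row_plain("rg1", 30, 500, 1234))
print(len(buggy), buggy)  # 5 fields: the Errors value is split into "1" and "234"
print(len(fixed), fixed)  # 4 fields, matching HEADER
```

Only the Errors value crosses 999 here, so only it is split; the same failure would hit any numeric column once its values reach four digits. Quoting fields (for example with `csv.writer`) would also avoid the split, but plain number formatting keeps the values machine-readable as numbers.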


75 Commits (c0261f75ce67b35dfb6c6308785633bf95a7be24)