gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mauricio Carneiro	ca11ab39e7	BitSets keys to lower BQSR's memory footprint Infrastructure: * Generic BitSet implementation with any precision (up to long) * Two's complement implementation of the bit set handles negative numbers (cycle covariate) * Memoized implementation of the BitSet utils for better performance. * All exponents are now calculated with bit shifts, fixing numerical precision issues with the double Math.pow. * Replace log/sqrt with bitwise logic to get rid of numerical issues BQSR: * All covariates output BitSets and have the functionality to decode them back into Object values. * Covariates are responsible for determining the size of the key they will use (number of bits). * Generalized KeyManager implementation combines any arbitrary number of covariates into one bitset key with event type * No more NestedHashMaps. Single key system now fits in one hash to reduce hash table objects overhead Tests: * Unit tests added to every method of BitSetUtils * Unit tests added to the generalized key system infrastructure of BQSRv2 (KeyManager) * Unit tests added to the cycle and context covariates (will add unit tests to all covariates)	2012-03-16 13:01:48 -04:00
Eric Banks	2314787767	Generalizing to avoid JDK 1.7 incompatibilities	2012-03-12 22:50:59 -04:00
Ryan Poplin	14a77b1e71	Getting rid of redundant methods in MathUtils. Adding unit tests for approximateLog10SumLog10 and normalizeFromLog10. Increasing the precision of the Jacobian approximation used by approximateLog10SumLog10, which changes the UG+HC integration tests ever so slightly.	2012-03-05 12:28:32 -05:00
Mauricio Carneiro	d379c3763a	DNA Sequence to BitSet and vice-versa conversion tools * Turns DNA sequences (for context covariates) into bit sets for maximum compression * Allows variable context size representation guaranteeing uniqueness. * Works with long precision, so it is limited to a context size of 31 bases (can be extended with BigNumber precision if necessary). * Unit Tests added	2012-02-29 19:25:20 -05:00
Mauricio Carneiro	75783af6fc	int <-> BitSet conversion utils for MathUtils * added unit tests.	2012-02-21 14:10:36 -05:00
Mauricio Carneiro	4a57add6d0	First implementation of DiagnoseTargets * calculates and interprets the coverage of a given interval track * allows to expand intervals by specified number of bases * classifies targets as CALLABLE, LOW_COVERAGE, EXCESSIVE_COVERAGE and POOR_QUALITY. * outputs text file for now (testing purposes only), soon to be VCF. * filters are overly aggressive for now.	2012-02-03 17:12:43 -05:00
Mauricio Carneiro	3dd6a1f962	Adding some generic sum and average functions to MathUtils	2012-02-03 17:12:43 -05:00
Guillermo del Angel	966387ca0b	Next intermediate commit in the pool caller. Lots of bug fixes and now we can emit true VCFs with calls in discovery mode (still of unknown quality) - old validation mode is temporarily broken, will be fixed in next refactoring.	2012-01-23 09:22:31 -05:00
Guillermo del Angel	b123416c4c	Resolve stale merge changes	2012-01-18 20:56:36 -05:00
Guillermo del Angel	2eb45340e1	Initial, raw, mostly untested version of new pool caller that also does allele discovery. Still needs debugging/refining. Main modification is that there is a new operation mode, set by argument -ALLELE_DISCOVERY_MODE, which if true will determine optimal alt allele at each computable site and will compute AC distribution on it. Current implementation is not working yet if there's more than one pool and it will only output biallelic sites, no functionality for true multi-allelics yet	2012-01-18 20:54:10 -05:00
Eric Banks	e7fe9910f7	Create the temp storage for calculating cell values just once as per Mark's TODO	2012-01-12 10:27:10 -05:00
Eric Banks	25d0d53d88	Moving the approximate summing of log10 vals to MathUtils; keeping the more efficient implementation of fast rounding.	2012-01-10 12:38:47 -05:00
Mauricio Carneiro	4a208c7c06	Refactor of the downsampling machinery to accept different strategies * Implemented Adaptive downsampler * Added integration test * Added option to RRead scala script to choose downsampling strategy	2012-01-03 09:29:47 -05:00
Mauricio Carneiro	cd68cc239b	Added knuth-shuffle (KS) and randomSubset using KS to MathUtils * Knuth-shuffle is a simple, yet effective array permutator (hope this is good english). * added a simple randomSubset that returns a random subset without repeats of any given array with the same probability for every permutation. * added unit tests to both functions	2012-01-03 09:29:46 -05:00
Eric Banks	079932ba2a	The log10cache needs to be larger if we want to handle 10K samples in the UG.	2011-12-13 23:36:10 -05:00
Mauricio Carneiro	5ad3dfcd62	BugFix: byte overflow in SyntheticRead compressed base counts * fixed and added unit test	2011-11-21 17:11:50 -05:00
Mauricio Carneiro	36600fd8e9	added MQ of low MQ/BQ to consensus RMS Bases that were excluded for MQ and BQ filters are now contributing to the MQ RMS (but not to consensus base counts and variant/not variant region triggers).	2011-11-01 17:46:12 -04:00
Guillermo del Angel	9afccd11b1	Minor refactoring: add ability to MathUtils.normalizeFromLog10 to not go to linear domain but just subtract max value from log values and return. Use this function in SNP and indel GL computation.	2011-09-25 21:18:56 -04:00
Guillermo del Angel	a807205fc3	a) Minor optimization to softMax() computation to avoid redundant operations, results in about 5-10% increase in speed in indel calling. b) Added (but left commented out since it may affect integration tests and to isolate commits) fix to per-sample DP reporting, so that deletions are included in count. c) Bug fix to avoid having non-reference genotypes assigned to samples with PL=0,0,0. Correct behavior should be to no-call these samples, and to ignore these samples when computing AC distribution since their likelihoods are not informative.	2011-09-09 18:00:23 -04:00
Mauricio Carneiro	fd540592ab	Added RMS calculation for consensus MQ Consensus MQ is now the average of the RMS of the mapping qualities of the reads making each site.	2011-08-30 02:45:20 -04:00
Mauricio Carneiro	bb557266ca	Merge branches to get new RodBinding framework Conflicts: private/java/src/org/broadinstitute/sting/gatk/walkers/replication_validation/ReplicationValidationWalker.java	2011-08-10 18:23:01 -04:00
Eric Banks	197169e47b	Submitting patch from Larry Singh to make MathUtils compatible with java 1.7	2011-08-08 13:34:04 -04:00
Mauricio Carneiro	b22a3d6508	Functional VCF output. It is outputting a VCF with the 'second best guess' for the alternate allele correctly. Annotations are added at the pool level, but may get overwritten at the lane and site level. Still need to implement the merging of the annotations at higher levels.	2011-08-04 17:49:08 -04:00
Mauricio Carneiro	a58ddab93b	minQual and minPower filters added. VCF output added. Calls are now made based on the likelihood AC model. Two filters are applied: minQual and minPower. Output is now a VCF file with the variant context. It's now called the gatk's PoolCaller, no longer Replication Validation framework. Lots of testing to ensue.	2011-07-28 18:58:36 -04:00
Mauricio Carneiro	8d7ef1bb51	Complete refactor of the ReplicationValidation framework, plus the following new functionality: * merges all pools in a lane. * merges all lanes in a site.	2011-07-21 21:39:00 -04:00
Mark DePristo	9992c373be	Optimize imports run on the whole project, public and private. I just got too tired of all of the unused imports floating around. Confirmed that the system builds after the changes.	2011-07-17 20:29:58 -04:00
David Roazen	3c9497788e	Reorganized the codebase beneath top-level public and private directories, removing the playground and oneoffprojects directories in the process. Updated build.xml accordingly.	2011-06-28 06:55:19 -04:00
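
Commit d379c3763a describes packing a DNA context into a long so that contexts of different lengths never collide, with a limit of 31 bases. The sketch below is one way such an encoding could work, not the code from that commit: it assumes 2 bits per base behind a leading marker bit, which uses 63 bits at 31 bases and so matches the stated limit. The class name DnaContextKey and its method names are hypothetical.

```java
public class DnaContextKey {
    /** 2 bits per base plus one marker bit must fit in the 63 non-sign bits of a long. */
    public static final int MAX_BASES = 31;

    /** Encodes a DNA string (A, C, G, T only) into a positive long key. */
    public static long encode(String dna) {
        if (dna.length() > MAX_BASES) {
            throw new IllegalArgumentException("Context longer than " + MAX_BASES + " bases");
        }
        long key = 1L; // leading marker bit, so "A" and "AA" get different keys
        for (int i = 0; i < dna.length(); i++) {
            key = (key << 2) | baseCode(dna.charAt(i));
        }
        return key;
    }

    /** Decodes a key produced by encode() back into its DNA string. */
    public static String decode(long key) {
        if (key < 1L) {
            throw new IllegalArgumentException("Not a valid key: " + key);
        }
        StringBuilder sb = new StringBuilder();
        while (key > 1L) {            // stop when only the marker bit is left
            sb.append("ACGT".charAt((int) (key & 3L)));
            key >>>= 2;
        }
        return sb.reverse().toString(); // bases were read last-to-first
    }

    private static int baseCode(char base) {
        switch (base) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Unsupported base: " + base);
        }
    }

    public static void main(String[] args) {
        long key = encode("ACGT");
        System.out.println(key + " -> " + decode(key));
    }
}
```

The marker bit is what makes variable-length contexts unique: without it, leading A bases (code 0) would vanish and "A", "AA", and the empty context would all map to the same key.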

27 Commits (9e10779fa77e34564e6050c768636a31f196e05b)
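
Commit cd68cc239b in the log above adds a Knuth shuffle and a randomSubset built on it. As a rough illustration of that idea, and not the MathUtils code itself, the sketch below shuffles a copy of an int array and takes its first k elements as the subset. The class name ShuffleSketch, the method signatures, and the choice of int[] are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

public class ShuffleSketch {
    /** Returns a uniformly shuffled copy of the input (Knuth / Fisher-Yates shuffle). */
    public static int[] knuthShuffle(int[] input, Random rng) {
        int[] a = Arrays.copyOf(input, input.length);
        for (int i = a.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform index in [0, i]
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
        return a;
    }

    /** Returns k positions of the input chosen without repeats, in random order. */
    public static int[] randomSubset(int[] input, int k, Random rng) {
        if (k < 0 || k > input.length) {
            throw new IllegalArgumentException("k must be between 0 and " + input.length);
        }
        // The first k slots of a uniform shuffle form a uniform random subset.
        return Arrays.copyOf(knuthShuffle(input, rng), k);
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6};
        Random rng = new Random();
        System.out.println(Arrays.toString(knuthShuffle(data, rng)));
        System.out.println(Arrays.toString(randomSubset(data, 3, rng)));
    }
}
```

Shuffling a copy keeps the caller's array untouched, at the cost of O(n) extra memory even when k is small. A partial shuffle that stops after k swaps would avoid that, but is left out to keep the sketch short.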