gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	07c3bd32b3	Bug fix: merge NO_VARIATION records with those of another type. The sad part is that this WAS covered by integration tests but someone updated the MD5s without actually paying attention...	2012-03-21 12:42:13 -04:00
Eric Banks	dcf2fa361d	Minor cleanup	2012-03-21 12:14:31 -04:00
Eric Banks	ab1c48745b	Need to catch RuntimeExceptions coming out of Picard too so that they show up as UserErrors (some BAM errors are thrown as REs).	2012-03-21 12:13:52 -04:00
Ryan Poplin	9e10779fa7	Caching log calculations cut the non-Map runtime of HaplotypeCaller in half. Moved the qual log cache used in HC and PairHMM into a common place and added unit tests.	2012-03-21 08:45:42 -04:00
Mauricio Carneiro	0e93cf5297	Taking care of bad cigars in the GATK * fixed BadCigarFilter to filter out reads starting/ending in deletion and that have adjacent I/D events. * added Unit tests for BadCigarFilter * updated all exceptions in LocusIteratorByState to tell the user that he can instead run with -rf BadCigar * added the BadCigar filter to ReduceReads and RealignTargetCreator (if your walker blows up with these malformed reads, you may want to add it too)	2012-03-20 14:32:57 -04:00
Eric Banks	b290152542	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-03-20 08:56:04 -04:00
Eric Banks	5e79046c98	Minor change but I realized from Mark's commit that the code I stole it from was flawed	2012-03-20 08:55:56 -04:00
Mark DePristo	5ecfc49f74	Minor cleanup of MergeIntervalLists (example, please look) -- Note that isDone() is overridden to return true. This causes the GATK to cleanly stop processing early.	2012-03-20 07:49:27 -04:00
Mark DePristo	36636eb323	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-03-20 07:47:24 -04:00
Eric Banks	ade1971581	Since we allow any generic header types, there's no longer any reason to check for supported types	2012-03-20 00:12:17 -04:00
Eric Banks	4910ef86d9	Added a to-do for Khalid	2012-03-19 23:12:58 -04:00
Eric Banks	5a3afd768d	Walker to merge multiple bed/interval files into a single consensus. 'Walker' is used loosely here; there must be a better way to do this, but I don't know how within the GATK framework.	2012-03-19 22:42:48 -04:00
Eric Banks	2324c5a74f	Simplified the interface for simple VCF header lines by making the VCFSimpleHeaderLine not abstract anymore - now any arbitrary header line with an ID (e.g. the contig and ALT lines) can be part of this class without having to define new classes. Also, renamed the 'named' header line to 'id' since that's more accurate.	2012-03-19 21:29:24 -04:00
Ryan Poplin	069ccdfdd4	Fixing broken HC integration tests while changes to exact model are being formulated.	2012-03-19 16:56:51 -04:00
Mauricio Carneiro	633b5c687d	Fixing MD5's (new GATKReport header was missing from old md5's)	2012-03-19 15:28:45 -04:00
Mauricio Carneiro	9cf4df15e5	BQSR recal script (just so we can scatter-gather)	2012-03-19 15:28:45 -04:00
Khalid Shakir	875dc5ef95	Re-added non-verbose MultiallelicSummary to HSP eval.	2012-03-19 14:40:31 -04:00
Khalid Shakir	e8b083ac20	Merged bug fix from Stable into Unstable	2012-03-19 14:37:36 -04:00
Khalid Shakir	d0056d6c71	Updated HSP dbsnp from 132 to 135 along with other minor patches.	2012-03-19 14:36:38 -04:00
Roger Zurawicki	7afb333811	GATK Report code cleanup - Updated the documentation on the code - Made the table.write() method private and updated necessary files. - Added a constructor to GATKReport that takes GATKReportTables - Optimized my code Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2012-03-19 11:53:57 -04:00
Mauricio Carneiro	0d4ea30d6d	Updating the BQSR Gatherer to the new file format. This is important for quick turnaround in the analysis cycle of the new covariates. Also added a dummy unit test that doesn't really test anything (disabled), but helps in debugging.	2012-03-19 09:02:27 -04:00
Mark DePristo	37d979d98d	GATK performance over time includes GATK 1.5	2012-03-18 19:49:26 -04:00
Ryan Poplin	1c67a62fc0	Updating LikelihoodCalculationEngineUnitTest	2012-03-18 16:39:58 -04:00
Ryan Poplin	943b1d34f8	intermediate commit to aid in debugging HC / exact model changes. HC integration tests will still fail	2012-03-18 15:50:27 -04:00
Ryan Poplin	c4f4d16490	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-03-18 14:27:42 -04:00
Eric Banks	9223e451a3	Merged bug fix from Stable into Unstable	2012-03-18 00:54:19 -04:00
Eric Banks	5c5d8e7cd3	Minor: cleaner way of turning off index-on-the-fly checking in case we want to turn it back on.	2012-03-18 00:53:29 -04:00
Eric Banks	344a938a70	When checking to make sure that we have cached enough data in the PL array, use the converted index value since that's what will be used as an index into the array.	2012-03-18 00:36:30 -04:00
Ryan Poplin	4f2f1cbca9	misc optimizations to the HMM code related to allocating and initializing the big state space arrays	2012-03-17 14:07:11 -04:00
Guillermo del Angel	a27a9ccba2	Merged bug fix from Stable into Unstable	2012-03-16 21:15:30 -04:00
Guillermo del Angel	a05a7f287d	TMP: disable checking of whether on the fly index is equal to index after run completed	2012-03-16 21:14:45 -04:00
Eric Banks	539d51f324	Resolving conflicts	2012-03-16 14:36:07 -04:00
Eric Banks	be9e48ba29	Merged bug fix from Stable into Unstable	2012-03-16 14:33:53 -04:00
Eric Banks	a7578e85e8	Rewriting a few of the indel integration tests for multi-allelics. The old tests were running b37 calls against a b36 reference, so the calls were all ref. The new tests are run against the pilot1 data and then those calls are fed back into the same bam to test genotype given alleles, with a sprinkling of bi- and tri-allelics.	2012-03-16 14:21:27 -04:00
Mauricio Carneiro	ec4a870a0f	Added @PG tag to ReduceReads. Pulled out the functionality from Indel Realigner and Table Recalibrator into Utils.setupWriter to make everyone else's lives easier if they want to include the PG tag in their walkers.	2012-03-16 14:09:07 -04:00
Mauricio Carneiro	e4cbeddf2d	adding on-the-fly recalibration test data	2012-03-16 13:18:16 -04:00
Mauricio Carneiro	3bfca0ccfd	BitSet implementation of the on-the-fly recalibration using the CSV format file. Infrastructure: * Added static interface to all different clipping algorithms of low quality tail clipping * Added reverse direction pileup element event lookup (indels) to the PileupElement and LocusIteratorByState * Complete refactor of the KeyManager. Much cleaner implementation that handles keys with no optional covariates (necessary for on-the-fly recalibration) * EventType is now an independent enum with added capabilities. All functionality is now centralized. BQSR and RecalibrateBases: * On-the-fly recalibration is now generic and uses the same bit set structure as BQSR for a reduced memory footprint * Refactored the object creation to take advantage of the compact key structure * Replaced nested hash maps with single hash maps indexed by bitsets * Eliminated low quality tails from the context covariate (using ReadClipper's write N's algorithm). * Excluded contexts with N's from the output file. * Fixed cycle covariate for discrete platforms (need to check flow cycle platforms now!) * Redefined error for indels to look at the previous base in negative strand reads (using new PE functionality) * Added the covariate ID (for optional covariates) to the output for disambiguation purposes * Refactored CovariateKeySet -- eventType functionality is now handled by the EventType enum. * Reduced memory usage of the BQSR script to 4 Tests: * Refactored BQSRKeyManagerUnitTest to handle the new implementation of the key manager * Added tests for keys without optional covariates * Added tests for on-the-fly recalibration (but more tests are necessary)	2012-03-16 13:02:15 -04:00
Mauricio Carneiro	ca11ab39e7	BitSets keys to lower BQSR's memory footprint Infrastructure: * Generic BitSet implementation with any precision (up to long) * Two's complement implementation of the bit set handles negative numbers (cycle covariate) * Memoized implementation of the BitSet utils for better performance. * All exponents are now calculated with bit shifts, fixing numerical precision issues with the double Math.pow. * Replace log/sqrt with bitwise logic to get rid of numerical issues BQSR: * All covariates output BitSets and have the functionality to decode them back into Object values. * Covariates are responsible for determining the size of the key they will use (number of bits). * Generalized KeyManager implementation combines any arbitrary number of covariates into one bitset key with event type * No more NestedHashMaps. Single key system now fits in one hash to reduce hash table objects overhead Tests: * Unit tests added to every method of BitSetUtils * Unit tests added to the generalized key system infrastructure of BQSRv2 (KeyManager) * Unit tests added to the cycle and context covariates (will add unit tests to all covariates)	2012-03-16 13:01:48 -04:00
Eric Banks	7424041a17	Updating integration tests to deal with the new GL framework. Now multi-allelic indel calls are correct.	2012-03-16 12:50:39 -04:00
Eric Banks	dce6b91f7d	Add a conversion from the deprecated PL ordering to the new one. We need this for the DiploidSNPGenotypeLikelihoods which still use the old ordering. My intention is for this to be a temporary patch, but changing the ordering in DiploidSNPGenotypeLikelihoods is not appropriate for committing to stable as it will break all of the external tools (e.g. MuTec) that are built on top of the class. We will have to talk to e.g. Kristian to see how disruptive this will be. Added unit tests to the GL conversions and indexing.	2012-03-16 11:14:37 -04:00
Eric Banks	41068b6985	This commit constitutes a major refactoring of the UG as far as the genotype likelihoods are concerned. I hate to do this in stable, but the VCFs currently being produced by the UG are totally busted. I am trying to make just the necessary changes in stable, doing everything else in unstable later. Now all GL calculations are unified into the GenotypeLikelihoods class - please try and use this functionality from now on instead of duplicating the code.	2012-03-15 16:08:58 -04:00
Ryan Poplin	e86ce8f3d6	updating HaplotypeCaller integration tests to reflect all the recent changes.	2012-03-15 14:56:35 -04:00
Ryan Poplin	0c6b34e9df	Fixing a bug identified by the ActivityProfile unit tests	2012-03-15 14:24:30 -04:00
Ryan Poplin	252b830aa8	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-03-15 11:56:04 -04:00
Ryan Poplin	0fa5a7af05	Adding contracts and unit tests for HaplotypeCaller GenotypingEngine	2012-03-15 11:55:48 -04:00
Ryan Poplin	c1f454fbe6	cleaning up and expanding LikelihoodCalculationEngine unit tests	2012-03-15 08:53:11 -04:00
Mauricio Carneiro	c865950923	fixing my typo on the md5.	2012-03-14 22:00:03 -04:00
Ryan Poplin	1212a65140	Adding contracts and unit tests for HaplotypeCaller LikelihoodCalculationEngine	2012-03-14 21:26:01 -04:00
Ryan Poplin	1429ddcf55	Adding contracts and unit tests for HaplotypeCaller LikelihoodCalculationEngine	2012-03-14 21:25:43 -04:00
Mauricio Carneiro	c045542442	ReduceReads default downsampling strategy is now NORMAL. Adaptive downsampler had an undesirable behavior in strange regions of the genome. This is a temporary fix, both downsamplers will be made obsolete when engine's positional downsampler gets generalized to read walkers.	2012-03-14 17:29:47 -04:00

9274 Commits (f9f8589692fece0185a7e8e059b75ee4672d1c8d)
