gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	47a0f5859e	Don't run these tests if not GATK-lite	2012-10-31 22:56:38 -04:00
Eric Banks	f8af8a2355	Moving UG integration tests to protected since they use protected-only contamination filtering. Adding a new UGLite integration test to confirm that contamination filtering is ignored in lite.	2012-10-31 21:28:07 -04:00
Eric Banks	eccb76c304	Only run UG in the bundle for chr20	2012-10-30 15:09:46 -04:00
Eric Banks	e1e480a0b9	Bug fix: don't add no-call alleles to the list of ALT alleles being validated.	2012-10-30 14:54:29 -04:00
Eric Banks	2aa28abe0a	Fixing md5s to reflect the new HapMap file	2012-10-30 14:27:10 -04:00
Eric Banks	8a402024c2	Updating bundle script to handle new naming convention of CEU trio best practices callset	2012-10-30 09:11:56 -04:00
Eric Banks	c95e893920	Better error message for unused ALT alleles	2012-10-29 21:51:35 -04:00
Eric Banks	b6a1967f12	Better documentation for ValidateVariants so that people realize it's used for strict validation of the VCF file. Added an option to turn off strict validation and an integration test to cover it.	2012-10-29 21:47:09 -04:00
Ryan Poplin	21fa5f70ca	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-29 18:53:41 -04:00
Ryan Poplin	5ee2feb2a3	updating pipeline test md5s	2012-10-29 18:53:27 -04:00
Eric Banks	be902375ac	'Bug' fix: fix the error message from the vcf validator so people realize that the file fails strict validation but still adheres to the spec.	2012-10-29 16:29:27 -04:00
Ryan Poplin	4e661847b2	DelocalizedBaseRecalibrator becomes the BaseRecalibrator.	2012-10-29 12:53:39 -04:00
Eric Banks	ac99437eec	Bug fixes to hapmap conversion in VariantsToVCF	2012-10-29 01:45:33 -04:00
Eric Banks	43625f652e	Shoot, mixed up the md5s last time.	2012-10-27 19:43:46 -04:00
Andrey Sivachenko	f3ac5d404d	updating vcf header attribute descriptions in order to reflect correctly what's actually being written...	2012-10-26 23:52:21 -04:00
Andrey Sivachenko	b4fbf6280a	fixing missing sample genotype bug, missing AD/DP bug, and putting annotations in more natural order (Ref/Alt)	2012-10-26 23:48:40 -04:00
Mark DePristo	ac5e58a265	Bugfix for GSA-540 / Update metadata maps when adding lines to VCFHeader -- https://jira.broadinstitute.org/browse/GSA-540 -- http://gatkforums.broadinstitute.org/discussion/1433/possible-bug-and-fix-in-java-code-of-vcfheader-org-broadinstitute-sting-utils-codecs-vcf-vcfheader	2012-10-26 16:34:16 -04:00
Mark DePristo	fa9b2a91d0	Bugfix for GSA-552 -- https://jira.broadinstitute.org/browse/GSA-552 -- User reports a null exception while using VariantsToVCF: http://gatkforums.broadinstitute.org/discussion/1461/nullpointerexception-converting-vcf3-to-vcf-using-variantstovcf The problem is that he left out an input VCF file for the --variant argument and the command-line argument parsing code didn't catch this, so we NPE out later on.	2012-10-26 16:34:16 -04:00
Eric Banks	682a72faf7	Hmm, thought I got all the md5s last time. Apparently not.	2012-10-26 16:10:12 -04:00
Eric Banks	f66d812778	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-26 13:20:41 -04:00
Eric Banks	a8704ca73f	Adding TODO notes for Ami	2012-10-26 13:20:27 -04:00
Mark DePristo	251983b8fb	Add GATK-wide command line argument to control the maximum runtime allowed for the GATK -- Providing this optional argument -maxRuntime (in -maxRuntimeUnits units) causes the GATK to exit gracefully when the max. runtime has been exceeded. By cleanly I mean that the engine simply stops at the next available cycle in the walker as though the end of processing had been reached. This means that all output files are closed properly, etc. -- Emits an info message that looks like "INFO 10:36:52,723 MicroScheduler - Aborting execution (cleanly) because the runtime has exceeded the requested maximum 10.0000 s". Otherwise there's currently no way to differentiate a truly completed run from a timelimit exceeded run, which may be a useful thing for a future update -- Resolves GSA-630 / GATK max runtime to deal with bad LSA calling? -- Added new JIRA entry for Ami to restart chr1 macarthur with this argument set to -maxRuntime 1 -maxRuntimeUnits DAYS to see if we can do all of chr1 in one weekend.	2012-10-26 13:18:34 -04:00
Eric Banks	ed11b7dab2	Fix UG parallelization test	2012-10-26 12:10:44 -04:00
Eric Banks	7a706ed345	Fix some of the broken integration tests	2012-10-26 11:23:44 -04:00
Eric Banks	ebebec7fdb	Accidentally left one test disabled	2012-10-26 02:15:32 -04:00
Eric Banks	b06f689d4b	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-26 02:13:26 -04:00
Eric Banks	a53e03d525	Do not let reduced reads get removed in the contamination down-sampling	2012-10-26 02:13:04 -04:00
Eric Banks	bf3d61ce82	The default value for --contamination_fraction_to_filter is now 0.05 (5%) in both UG and HC. Users of GATK-lite get pushed down to 0% by default (since it's not enabled) or get a user error if they try to set it.	2012-10-26 01:04:51 -04:00
Eric Banks	91f2c847a3	Fixing problem reported on forum for VF: DP couldn't be filtered from the FORMAT field, only from the INFO field. Fixed and added integration test.	2012-10-26 00:57:40 -04:00
Mark DePristo	6b8b7df651	Queue now understands -nct and requests the appropriate number of cores from LSF, SGE, etc -- NCT wasn't previously recognized by Queue as needing more processors per machine. This commit fixes this. Also a potential cause of poor GATKPerformanceOverTime, in that runs with -nct could flood a node and cause it to have hundreds of cores in contention.	2012-10-25 17:26:58 -04:00
David Roazen	422e16c62e	BaseRecalibration: don't cache instances of ReadCovariates across reads Caching and reusing ReadCovariates instances across reads sounds good in theory, but: -it doesn't work unless you zero out the internal arrays before each read -the internal arrays must be sized proportionally to the maximum POSSIBLE recalibrated read length (5000!!!), instead of the ACTUAL read lengths By contrast, creating a new instance per read is basically equivalent to doing an efficient low-level memset-style clear on a much smaller array (since we use the actual rather than the maximum read length to create it). So this should be faster than caching instances and calling clear() but slower than caching instances and not calling clear(). Credit to Ryan for proposing this approach.	2012-10-25 17:02:55 -04:00
Ami Levy Moonshine	dde3060bb8	add the CEUtrio best practices results (UG + PBT) to the bundle	2012-10-25 15:36:17 -04:00
Ami Levy Moonshine	90b9971033	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-25 15:32:29 -04:00
David Roazen	884d031e72	NestedIntegerArray: Pre-allocate only the first two dimensions It turns out that pre-allocating the entire tree was too expensive in terms of memory when using large values for the -mcs and -ics parameters. Pre-allocating the first two dimensions prevents us from ever locking the root node during a put(). Contention between threads over lower levels of the tree should be minimal given that puts are rare compared to gets. Also output dimensions and pre-allocation info at startup. If pre-allocation takes longer than usual this gives the user a sense of what is causing the delay.	2012-10-25 15:17:42 -04:00
Mark DePristo	cc8c12b954	Committing a broken version of BaseRecalibration -- I'm committing because there's some kind of fundamental problem with the ReadCovariates cache, in that historical data isn't being cleared / computed properly, and I'd rather it fail for a while than leave it in JIRA. -- The integration tests test the -nct with PrintReads to get 1, 2, 4 and the 4 fails. But that's because of this incorrect calculation -- Updating GATKPerformanceOverTime with the new @ClassType annotation	2012-10-25 14:46:35 -04:00
Eric Banks	e93ff3ea6e	Let's go back to having the SB/SLOD NOT computed by default. If you recall, it was only enabled by default because we thought we were going to use it when we made VQSR use random forests. But since we decided not to change VQSR, there's no reason to triple the computation for every variant site anymore.	2012-10-25 12:45:23 -04:00
Eric Banks	6dc7d872ec	Fix GenotypeAndValidate to handle SNPs and indels as reported on the forum. Recent changes to the UnifiedArgumentCollection made this stop working. Adding in JIRA to create integration tests for this tool.	2012-10-25 10:06:13 -04:00
Eric Banks	c53c55da12	Re-enable tests	2012-10-25 09:37:08 -04:00
Eric Banks	e6652f7777	Added integration test for contamination down-sampling	2012-10-25 09:36:05 -04:00
Eric Banks	df9e0b7045	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-25 02:49:54 -04:00
Eric Banks	72714ee43e	Minor patches to get the contamination down-sampling working for indels. Adding @Hidden logging output for easy debugging.	2012-10-25 02:47:42 -04:00
Eric Banks	c6b57fffda	Added allele biased down-sampling capabilities to the PerReadAlleleLikelihoodMap object, which means that both the UG and HC can use this functionality. Note that it's only available in protected, so GATK-lite users won't be allowed to enable it. Needs more testing.	2012-10-24 22:52:25 -04:00
Ami Levy Moonshine	bcf3582095	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-24 21:50:41 -04:00
Eric Banks	9da7bbf689	Refactoring the PerReadAlleleLikelihoodMap in preparation for adding contamination downsampling into protected only.	2012-10-24 15:49:07 -04:00
David Roazen	d9aa9855f8	Better comments in NestedIntegerArray	2012-10-24 15:29:13 -04:00
David Roazen	02018ca764	Legacy BaseRecalibrator walker is neither TreeReducible nor NanoSchedulable The old BaseRecalibrator walker is not and never will be thread-safe, since it's a LocusWalker that uses read attributes to track state. ONLY the newer DelocalizedBaseRecalibrator is believed likely to be thread-safe at this point. It is safe to run the DelocalizedBaseRecalibrator with -nct > 1 for testing purposes, but wait for further testing to be done before using it for production purposes in multithreaded mode.	2012-10-24 15:22:50 -04:00
David Roazen	32a6d7000a	Thread-safe ReadGroupCovariate The ReadGroupCovariate class was not thread-safe. This led to horrible race conditions in multithreaded runs of the BQSR where (for example) the same read group could get inserted into the reverse lookup table twice with different IDs. Should fix the intermittent crash reported in GSA-492.	2012-10-24 15:22:50 -04:00
David Roazen	991658acf4	BQSR: use more granular locking for concurrency control -With this change, BQSR performance scales properly by thread rather than gaining nothing from additional threads. -Benefits are seen when using either -nt (HierarchicalMicroScheduler) or -nct (NanoScheduler) -Removes high-level locks in the recalibration engines and NestedIntegerArray in favor of maximally-granular locks on and around manipulation of the leaf nodes of the NestedIntegerArray. -NestedIntegerArray now creates all interior nodes upfront rather than on the fly to avoid the need for locking during tree traversals. This uses more memory in the initial part of BQSR runs, but the BQSR would eventually converge to use this memory anyway over the course of a typical run. IMPORTANT NOTE: This does not mean it's safe to run the old BaseRecalibrator walker with multiple threads. The BaseRecalibrator walker is not and never will be thread-safe, as it's a LocusWalker that uses read attributes to track state information. ONLY the newer DelocalizedBaseRecalibrator can be made thread-safe (and will hopefully be made so in my subsequent commits). This commit addresses performance, not correctness.	2012-10-24 15:22:50 -04:00
Eric Banks	5b7b42356b	Fix bug in GenotypeAndValidate where it doesn't check vc.hasAttribute() before using vc.getAttribute().	2012-10-24 14:02:50 -04:00
Mark DePristo	6e421a72d6	Add more exhaustive unit tests for input errors to NanoScheduler -- Resolves issue GSA-515 / Nanoscheduler GSA-605 / Seems that -nct may deadlock as not reproducible -- It seems that it's not an input error problem (or at least cannot be provoked with unit tests) -- I'll keep an eye on this later	2012-10-23 20:11:29 -04:00
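
Commit 32a6d7000a describes a race in which the same read group could be inserted into the reverse lookup table twice with different IDs. The sketch below is one way such a table could be made safe under concurrent access; it is not the actual GATK fix. The class `ReadGroupIdTable` and its methods are hypothetical names invented for illustration, and the code assumes Java 8 or later for `ConcurrentHashMap.computeIfAbsent`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; this is NOT the real ReadGroupCovariate class.
final class ReadGroupIdTable {
    // Forward lookup: read group name -> integer ID.
    private final Map<String, Integer> idByReadGroup = new ConcurrentHashMap<>();
    // Reverse lookup: integer ID -> read group name.
    private final Map<Integer, String> readGroupById = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(0);

    // Returns the ID for a read group, assigning a new one on first sight.
    // computeIfAbsent runs the mapping function atomically and at most once
    // per key, so two threads can never assign different IDs to one name.
    int idFor(final String readGroup) {
        return idByReadGroup.computeIfAbsent(readGroup, rg -> {
            final int id = nextId.getAndIncrement();
            readGroupById.put(id, rg); // reverse entry is written before the ID is published
            return id;
        });
    }

    // Returns the read group name for an ID, or null if the ID is unknown.
    String readGroupFor(final int id) {
        return readGroupById.get(id);
    }

    // Number of distinct read groups seen so far.
    int size() {
        return idByReadGroup.size();
    }
}
```

An unsynchronized check-then-insert on a plain `HashMap` would allow exactly the duplicate-ID behavior the commit message describes; folding the check and the insert into one atomic operation removes that window.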

3291 Commits (eae2d019cf6f9b5b359a3bbc0b0066f490464f44)