Commit Graph

10966 Commits (0a56fe5bc33f9dfd40e25005f2e037bc36e9ffdc)

Author SHA1 Message Date
Eric Banks 0a56fe5bc3 Merge remote-tracking branch 'unstable/master' 2012-10-31 12:17:24 -04:00
Eric Banks eccb76c304 Only run UG in the bundle for chr20 2012-10-30 15:09:46 -04:00
Eric Banks e1e480a0b9 Bug fix: don't add no-call alleles to the list of ALT alleles being validated. 2012-10-30 14:54:29 -04:00
Eric Banks 2aa28abe0a Fixing md5s to reflect the new HapMap file 2012-10-30 14:27:10 -04:00
Eric Banks 8a402024c2 Updating bundle script to handle new naming convention of CEU trio best practices callset 2012-10-30 09:11:56 -04:00
Eric Banks c95e893920 Better error message for unused ALT alleles 2012-10-29 21:51:35 -04:00
Eric Banks b6a1967f12 Better documentation for ValidateVariants so that people realize it's used for strict validation of the VCF file. Added an option to turn off strict validation and an integration test to cover it. 2012-10-29 21:47:09 -04:00
Ryan Poplin 21fa5f70ca Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-29 18:53:41 -04:00
Ryan Poplin 5ee2feb2a3 updating pipeline test md5s 2012-10-29 18:53:27 -04:00
Eric Banks be902375ac 'Bug' fix: fix the error message from the vcf validator so people realize that the file fails strict validation but still adheres to the spec. 2012-10-29 16:29:27 -04:00
Ryan Poplin 4e661847b2 DelocalizedBaseRecalibrator becomes the BaseRecalibrator. 2012-10-29 12:53:39 -04:00
Eric Banks ac99437eec Bug fixes to hapmap conversion in VariantsToVCF 2012-10-29 01:45:33 -04:00
Eric Banks 43625f652e Shoot, mixed up the md5s last time. 2012-10-27 19:43:46 -04:00
Andrey Sivachenko f3ac5d404d updating vcf header attribute descriptions in order to reflect correctly what's actually being written... 2012-10-26 23:52:21 -04:00
Andrey Sivachenko b4fbf6280a fixing missing sample genotype bug, missing AD/DP bug, and putting annotations in more natural order (Ref/Alt) 2012-10-26 23:48:40 -04:00
Mark DePristo ac5e58a265 Bugfix for GSA-540 / Update metadata maps when adding lines to VCFHeader
-- https://jira.broadinstitute.org/browse/GSA-540
-- http://gatkforums.broadinstitute.org/discussion/1433/possible-bug-and-fix-in-java-code-of-vcfheader-org-broadinstitute-sting-utils-codecs-vcf-vcfheader
2012-10-26 16:34:16 -04:00
Mark DePristo fa9b2a91d0 Bugfix for GSA-552
-- https://jira.broadinstitute.org/browse/GSA-552
-- User reports a null exception while using VariantsToVCF:
   http://gatkforums.broadinstitute.org/discussion/1461/nullpointerexception-converting-vcf3-to-vcf-using-variantstovcf
   The problem is that he left out an input VCF file for the --variant argument and the command-line argument parsing code didn't catch this, so we NPE out later on.
2012-10-26 16:34:16 -04:00
Eric Banks 682a72faf7 Hmm, thought I got all the md5s last time. Apparently not. 2012-10-26 16:10:12 -04:00
Ryan Poplin b0dcc2c78e Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 15:14:41 -04:00
Ryan Poplin 610e003b93 Fixing a bug in the BQSR where the reference context gets out of sync after a read is adapter clipped inside the walker. 2012-10-26 15:14:27 -04:00
David Roazen 35483a7eef Update MD5s for PrintReads with BQSR Integration Test
The MD5s for these tests were changed in commit 87435f1074615b2cd016f042980109fd53962c8d
to match the output of a broken version of BaseRecalibration. With the patch in
commit c397102ecc1fd1d2cd8f209a8f358ab4a60b50a7, the output once again matches the
*original* MD5s for these tests, and does not vary as you increase -nct.

Final resolution to GSA-632
2012-10-26 14:25:25 -04:00
Eric Banks f66d812778 Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 13:20:41 -04:00
Eric Banks a8704ca73f Adding TODO notes for Ami 2012-10-26 13:20:27 -04:00
Mark DePristo 251983b8fb Add GATK-wide command line argument to control the maximum runtime allowed for the GATK
-- Providing this optional argument -maxRuntime (in -maxRuntimeUnits units) causes the GATK to exit cleanly when the maximum runtime has been exceeded.  By cleanly I mean that the engine simply stops at the next available cycle in the walker as though the end of processing had been reached.  This means that all output files are closed properly, etc.
-- Emits an info message that looks like "INFO  10:36:52,723 MicroScheduler - Aborting execution (cleanly) because the runtime has exceeded the requested maximum 10.0000 s".  Otherwise there's currently no way to differentiate a truly completed run from a run that exceeded the time limit; making that distinction may be a useful thing for a future update.
-- Resolves GSA-630 / GATK max runtime to deal with bad LSA calling?
-- Added new JIRA entry for Ami to restart chr1 macarthur with this argument set to -maxRuntime 1 -maxRuntimeUnits DAYS to see if we can do all of chr1 in one weekend.
2012-10-26 13:18:34 -04:00
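The clean-exit behaviour described in commit 251983b8fb above can be illustrated with a minimal sketch. It is not the GATK MicroScheduler code: the class name RuntimeLimitedTraversal, its methods, and the injectable clock are all hypothetical, and the only idea taken from the commit message is that the limit is checked at the start of each cycle, so an expired run stops exactly as if the input had ended.

```java
import java.util.Iterator;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a runtime-limited traversal loop; not GATK code. */
final class RuntimeLimitedTraversal {
    private final long maxRuntimeNanos;   // negative means "no limit"
    private final LongSupplier clock;     // nanosecond clock, injectable for testing
    private boolean abortedOnRuntime = false;

    RuntimeLimitedTraversal(long maxRuntime, TimeUnit units, LongSupplier clock) {
        this.maxRuntimeNanos = maxRuntime < 0 ? -1 : units.toNanos(maxRuntime);
        this.clock = clock;
    }

    /**
     * Runs work items until the input is exhausted or the runtime limit has
     * been exceeded. The limit is only checked between items, so an item that
     * has started always finishes and the caller can close outputs normally.
     *
     * @return the number of items processed
     */
    int run(Iterator<Runnable> work) {
        final long start = clock.getAsLong();
        int processed = 0;
        while (work.hasNext()) {
            if (maxRuntimeNanos >= 0 && clock.getAsLong() - start > maxRuntimeNanos) {
                abortedOnRuntime = true;  // lets the caller tell the two outcomes apart
                break;
            }
            work.next().run();
            processed++;
        }
        return processed;
    }

    /** True if the last call to run() stopped because of the runtime limit. */
    boolean abortedOnRuntime() {
        return abortedOnRuntime;
    }
}
```

The abortedOnRuntime flag is an addition of this sketch; it is one possible way to separate a truly completed run from one that hit the limit, which the commit message leaves for a future update.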
Eric Banks 46099af8db Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 12:10:53 -04:00
Eric Banks ed11b7dab2 Fix UG parallelization test 2012-10-26 12:10:44 -04:00
Eric Banks 7a706ed345 Fix some of the broken integration tests 2012-10-26 11:23:44 -04:00
Yossi Farjoun 27a4d6d90e Merge branch 'master' of ssh://gsa4/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 09:33:10 -04:00
Yossi Farjoun a3193a1743 fixed LargeScaleValidationCallingSingle.scala for new version of scala 2012-10-26 09:32:19 -04:00
Eric Banks ebebec7fdb Accidentally left one test disabled 2012-10-26 02:15:32 -04:00
Eric Banks b06f689d4b Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 02:13:26 -04:00
Eric Banks a53e03d525 Do not let reduced reads get removed in the contamination down-sampling 2012-10-26 02:13:04 -04:00
Eric Banks bf3d61ce82 The default value for --contamination_fraction_to_filter is now 0.05 (5%) in both UG and HC. Users of GATK-lite get pushed down to 0% by default (since it's not enabled) or get a user error if they try to set it. 2012-10-26 01:04:51 -04:00
Eric Banks 91f2c847a3 Fixing problem reported on forum for VF: DP couldn't be filtered from the FORMAT field, only from the INFO field. Fixed and added integration test. 2012-10-26 00:57:40 -04:00
Mark DePristo d879c77aca Don't scale up memory requirements by nct for PrintReads tests 2012-10-25 17:43:49 -04:00
Mark DePristo 6b8b7df651 Queue now understands -nct and requests the appropriate number of cores from LSF, SGE, etc
-- NCT wasn't previously recognized by Queue as needing more processors per machine.  This commit fixes this.  This was also a potential cause of poor GATKPerformanceOverTime results, in that runs with -nct could flood a node and cause it to have hundreds of cores in contention.
2012-10-25 17:26:58 -04:00
David Roazen 422e16c62e BaseRecalibration: don't cache instances of ReadCovariates across reads
Caching and reusing ReadCovariates instances across reads sounds good in theory, but:

-it doesn't work unless you zero out the internal arrays before each read
-the internal arrays must be sized proportionally to the maximum POSSIBLE
recalibrated read length (5000!!!), instead of the ACTUAL read lengths

By contrast, creating a new instance per read is basically equivalent to doing an
efficient low-level memset-style clear on a much smaller array (since we use the actual
rather than the maximum read length to create it). So this should be faster than caching
instances and calling clear() but slower than caching instances and not calling clear().

Credit to Ryan for proposing this approach.
2012-10-25 17:02:55 -04:00
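The trade-off described in commit 422e16c62e above can be shown with a small sketch. It does not reproduce the real ReadCovariates class: ReadKeyBuffers, CachedBuffer, and freshKeys are hypothetical names, and a plain int array stands in for the internal covariate arrays. Only the maximum length of 5000 comes from the commit message. The sketch shows the correctness side of the argument (stale values without a clear, a smaller array with per-read allocation), not the timing claim.

```java
import java.util.Arrays;

/** Hypothetical sketch contrasting a cached buffer with per-read allocation. */
final class ReadKeyBuffers {
    /** Maximum possible read length, as quoted in the commit message. */
    static final int MAX_POSSIBLE_READ_LENGTH = 5000;

    /** Cached strategy: one buffer sized for the worst case, reused for every read. */
    static final class CachedBuffer {
        private final int[] keys = new int[MAX_POSSIBLE_READ_LENGTH];

        /**
         * Fills the shared buffer for one read. If clearFirst is false, entries
         * left over from an earlier, longer read remain beyond bases.length.
         */
        int[] fill(byte[] bases, boolean clearFirst) {
            if (clearFirst) {
                Arrays.fill(keys, 0);   // touches all 5000 slots, however short the read
            }
            for (int i = 0; i < bases.length; i++) {
                keys[i] = bases[i];
            }
            return keys;
        }
    }

    /**
     * Per-read strategy: a new array sized to the actual read. Java zero-fills
     * new arrays, so no explicit clear is needed and only bases.length slots exist.
     */
    static int[] freshKeys(byte[] bases) {
        final int[] keys = new int[bases.length];
        for (int i = 0; i < bases.length; i++) {
            keys[i] = bases[i];
        }
        return keys;
    }
}
```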
Ami Levy Moonshine dde3060bb8 add the CEUtrio best practices results (UG + PBT) to the bundle 2012-10-25 15:36:17 -04:00
Ami Levy Moonshine 90b9971033 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-25 15:32:29 -04:00
David Roazen 884d031e72 NestedIntegerArray: Pre-allocate only the first two dimensions
It turns out that pre-allocating the entire tree was too expensive in
terms of memory when using large values for the -mcs and -ics parameters.

Pre-allocating the first two dimensions prevents us from ever locking the
root node during a put(). Contention between threads over lower levels
of the tree should be minimal given that puts are rare compared to gets.

Also output dimensions and pre-allocation info at startup. If pre-allocation
takes longer than usual this gives the user a sense of what is causing the
delay.
2012-10-25 15:17:42 -04:00
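The pre-allocation strategy from commit 884d031e72 above can be sketched as follows. This is an illustration under stated assumptions, not the actual NestedIntegerArray: the class SparseNestedArray, its method signatures, and the use of AtomicReferenceArray for safe unsynchronized reads are choices made for this sketch. It shows the idea from the commit message: the first two levels exist from construction, so a put() never needs a lock on the root, and deeper nodes are created lazily under a lock on their parent only.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical sketch of a nested array with only two levels pre-allocated. */
final class SparseNestedArray {
    private final int[] dims;
    private final AtomicReferenceArray<Object> root;

    SparseNestedArray(int... dims) {
        if (dims.length < 3) {
            throw new IllegalArgumentException("this sketch expects at least 3 dimensions");
        }
        this.dims = dims.clone();
        // Pre-allocate the first two dimensions: the root and all of its children.
        root = new AtomicReferenceArray<>(dims[0]);
        for (int i = 0; i < dims[0]; i++) {
            root.set(i, new AtomicReferenceArray<Object>(dims[1]));
        }
    }

    /** Returns the stored value, or null if nothing was put at these keys. */
    @SuppressWarnings("unchecked")
    Integer get(int... keys) {
        checkKeys(keys);
        AtomicReferenceArray<Object> node = root;
        for (int d = 0; d < keys.length - 1; d++) {
            final Object next = node.get(keys[d]);
            if (next == null) {
                return null;            // that subtree was never allocated
            }
            node = (AtomicReferenceArray<Object>) next;
        }
        return (Integer) node.get(keys[keys.length - 1]);
    }

    /** Stores a value, lazily creating nodes below the second level. */
    @SuppressWarnings("unchecked")
    void put(int value, int... keys) {
        checkKeys(keys);
        // The second-level node always exists, so the root is never locked.
        AtomicReferenceArray<Object> node = (AtomicReferenceArray<Object>) root.get(keys[0]);
        for (int d = 1; d < keys.length - 1; d++) {
            Object child = node.get(keys[d]);
            if (child == null) {
                synchronized (node) {   // lock only the parent of the missing node
                    child = node.get(keys[d]);
                    if (child == null) {
                        child = new AtomicReferenceArray<Object>(dims[d + 1]);
                        node.set(keys[d], child);
                    }
                }
            }
            node = (AtomicReferenceArray<Object>) child;
        }
        node.set(keys[keys.length - 1], value);
    }

    private void checkKeys(int[] keys) {
        if (keys.length != dims.length) {
            throw new IllegalArgumentException("expected " + dims.length + " keys");
        }
    }
}
```

For example, new SparseNestedArray(2, 3, 4) allocates the root and its two children up front, and a third-level node is only created the first time a put() reaches it.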
Mark DePristo cc8c12b954 Committing a broken version of BaseRecalibration
-- I'm committing because there's some kind of fundamental problem with the ReadCovariates cache, in that historical data isn't being cleared / computed properly, and I'd rather it fail for a while than leave it in JIRA.
-- The integration tests run PrintReads with -nct set to 1, 2, and 4, and the -nct 4 case fails.  But that's because of this incorrect calculation.
-- Updating GATKPerformanceOverTime with the new @ClassType annotation
2012-10-25 14:46:35 -04:00
Eric Banks e93ff3ea6e Let's go back to having the SB/SLOD NOT computed by default. If you recall, it was only enabled by default because we thought we were going to use it when we made VQSR use random forests. But since we decided not to change VQSR, there's no reason to triple the computation for every variant site anymore. 2012-10-25 12:45:23 -04:00
Eric Banks 6dc7d872ec Fix GenotypeAndValidate to handle SNPs and indels as reported on the forum. Recent changes to the UnifiedArgumentCollection made this stop working. Adding in JIRA to create integration tests for this tool. 2012-10-25 10:06:13 -04:00
Eric Banks c53c55da12 Re-enable tests 2012-10-25 09:37:08 -04:00
Eric Banks e6652f7777 Added integration test for contamination down-sampling 2012-10-25 09:36:05 -04:00
Eric Banks df9e0b7045 Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-25 02:49:54 -04:00
Eric Banks 72714ee43e Minor patches to get the contamination down-sampling working for indels. Adding @Hidden logging output for easy debugging. 2012-10-25 02:47:42 -04:00
Eric Banks c6b57fffda Added allele biased down-sampling capabilities to the PerReadAlleleLikelihoodMap object, which means that both the UG and HC can use this functionality. Note that it's only available in protected, so GATK-lite users won't be allowed to enable it. Needs more testing. 2012-10-24 22:52:25 -04:00
Ami Levy Moonshine bcf3582095 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-24 21:50:41 -04:00
David Roazen 2d9e2e6b8e Delete ExperimentalNestedIntegerArray
Forgot to delete this in my last push. This class was only used for
profiling purposes to try out different ideas and is no longer needed.
2012-10-24 19:59:46 -04:00