Commit Graph

10982 Commits (bf7c832d25c51e588b9605747f132799452d4e7e)

Author SHA1 Message Date
David Roazen bf7c832d25 Merged bug fix from Stable into Unstable 2012-11-02 19:23:11 -04:00
David Roazen eae2d019cf Refuse to package the GATK from a non-clean working directory
Packaging from a non-clean working directory can result in an incorrect
jar. Now that we have external collaborators packaging and distributing
the GATK, not enforcing the clean requirement has become far too dangerous.
At the same time, invoking "clean" automatically through a direct
dependency would also be dangerous -- instead, it's better to error out
if a packaging target is invoked from a non-clean working dir.
2012-11-02 19:11:24 -04:00
Mark DePristo 0ab4022f23 Final r119 tribble jar 2012-11-02 14:30:33 -04:00
David Roazen 73157ae3d3 Allow each pipeline test the max of 10 hours to run
The runtime of these tests is extremely variable -- sometimes they will complete almost instantly,
other times they will wait in an LSF queue for 5-10+ hours. Minimize timeout errors by setting the
timeout for these tests to the maximum of 10 hours.
2012-11-02 12:40:56 -04:00
Mark DePristo f8a0a947e3 Critical bugfix for GSA-652 / Multi-threaded VCF -> BCF writing produces invalid intermediate file that fails on merging
-- New tribble library now uses 64 bit sizes.  The 26K VCF has so much data that low-level tribble block indices where overflowing their int size values.  This includes a to-be-committed tribble jar that fixes this problem
-- See https://jira.broadinstitute.org/browse/GSA-652
-- Minor cleanup of error messages that were useful on the way to solving this monster problem
2012-11-02 09:09:59 -04:00
David Roazen 6185e8c432 Allow large-scale tests 5 hours each to run 2012-11-01 17:48:58 -04:00
Ryan Poplin 386b45e94d This VE eval module isn't useful anymore. 2012-11-01 15:44:41 -04:00
Mark DePristo 872abddfce Add custom TestNGTestTransformer that adds a maximum test runtime of 10 minutes to all testng tests
-- Closes GSA-494 / Add maximum runtime for integration tests, running them in timeout thread
-- Needed to debug locking issues
-- Needed to debug excessively long running integrationtests
-- Added build.xml maximum runtime for all testng tests of 10 hours.  We will ultimately fail the build if it goes on for more than 10 hours
2012-11-01 15:34:12 -04:00
Mark DePristo 7dd7afe739 Use BQSR in SingleExomeCalling evaluation 2012-11-01 15:34:12 -04:00
Mark DePristo 1444cd753b Bugfix for GSA-647 HaplotypeCaller misses good variant because the active region doesn't trigger for an exome
-- The logic for determining active regions was a bit broken in the HC when intervals were used in the system
-- TraverseActiveRegions now uses the AllLocus view, since we always want to see all reference sites, not just those covered.  Simplifies logic of TAR
-- Non-overlapping intervals are always treated as separate objects for determing active / inactive state.  This means that each exon will stand on its own when deciding if it should be active or inactive
-- Misc. cleanup, docs of some TAR infrastructure to make it safer and easier to debug in the future.
-- Committing the SingleExomeCalling script that I used to find this problem, and will continue to use in evaluating calling of a single exome with the HC
-- Make sure to get all of the reads into the set of potentially active reads, even for genomic locations that themselves don't overlap the engine intervals but may have reads that overlap the regions
-- Remove excessively expensive calls to check bases are upper cased in ReferenceContext
-- Update md5s after a lot of manual review and discussion with Ryan
2012-11-01 15:34:04 -04:00
Mark DePristo 9cd04c335c Work on GSA-508 / CachingIndexedFastaReader should internally upper case bases loading data
-- As one might expect, CachingIndexedFastaSequenceFile now internally upper cases the FASTA reference bases.  This is now done by default, unless requested explicitly to preserve the original bases.
-- This is really the correct place to do this for a variety of reasons.  First, you don't need to work about upper casing bases throughout the code.  Second, the cache is only upper cased once, no matter how often the bases are accessed, which walkers cannot optimize themselves.  Finally, this uses the fastest function for this -- Picard's toUpperCase(byte[]) which is way better than String.toUpperCase()
-- Added unit tests to ensure this functionality works correct.
-- Removing unnecessary upper casing of bases in some core GATK tools, now that RefContext guarentees that the reference bases are all upper case.
-- Added contracts to ensure this is the case.
-- Remove a ton of sh*t from BaseUtils that was so old I had no idea what it was doing any longer, and didn't have any unit tests to ensure it was correct, and wasn't used anywhere in our code
2012-11-01 15:34:03 -04:00
Eric Banks 94a13c05ed Merged bug fix from Stable into Unstable 2012-10-31 22:57:26 -04:00
Eric Banks 47a0f5859e Don't run these tests if not GAKT lite 2012-10-31 22:56:38 -04:00
Eric Banks 881c843307 Merged bug fix from Stable into Unstable 2012-10-31 21:28:27 -04:00
Eric Banks f8af8a2355 Moving UG integration tests to protected since they use protected-only contamination filtering. Adding a new UGLite integration test to confirm that contamination filtering is ignored in lite. 2012-10-31 21:28:07 -04:00
Eric Banks 96344c6b62 Add note to realigner docs 2012-10-31 12:35:45 -04:00
Eric Banks 0a56fe5bc3 Merge remote-tracking branch 'unstable/master' 2012-10-31 12:17:24 -04:00
Eric Banks eccb76c304 Only run UG in the bundle for chr20 2012-10-30 15:09:46 -04:00
Eric Banks e1e480a0b9 Bug fix: don't add no-call alleles to the list of ALT alleles being validated. 2012-10-30 14:54:29 -04:00
Eric Banks 2aa28abe0a Fixing md5s to reflect the new HapMap file 2012-10-30 14:27:10 -04:00
Eric Banks 8a402024c2 Updating bundle script to handle new naming convention of CEU trio best practices callset 2012-10-30 09:11:56 -04:00
Eric Banks c95e893920 Better error message for unused ALT alleles 2012-10-29 21:51:35 -04:00
Eric Banks b6a1967f12 Better documentation for ValidateVariants so that people realize it's used for strict validation of the VCF file. Added an option to turn off strict validation and an integration test to cover it. 2012-10-29 21:47:09 -04:00
Ryan Poplin 21fa5f70ca Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-29 18:53:41 -04:00
Ryan Poplin 5ee2feb2a3 updating pipeline test md5s 2012-10-29 18:53:27 -04:00
Eric Banks be902375ac 'Bug' fix: fix the error message from the vcf validator so people realize that the file fails strict validation but still adheres to the spec. 2012-10-29 16:29:27 -04:00
Ryan Poplin 4e661847b2 DelocalizedBaseRecalibrator becomes the BaseRecalibrator. 2012-10-29 12:53:39 -04:00
Eric Banks ac99437eec Bug fixes to hapmap conversion in VariantsToVCF 2012-10-29 01:45:33 -04:00
Eric Banks 43625f652e Shoot, mixed up the md5s last time. 2012-10-27 19:43:46 -04:00
Andrey Sivachenko f3ac5d404d updating vcf header attribute descriptions in order to reflect correctly what's actually being written... 2012-10-26 23:52:21 -04:00
Andrey Sivachenko b4fbf6280a fixing missing sample genotype bug, missing AD/DP bug, and putting annotations in more natural order (Ref/Alt) 2012-10-26 23:48:40 -04:00
Mark DePristo ac5e58a265 Bugfix for GSA-540 / Update metadata maps when adding lines to VCFHeader
-- https://jira.broadinstitute.org/browse/GSA-540
-- http://gatkforums.broadinstitute.org/discussion/1433/possible-bug-and-fix-in-java-code-of-vcfheader-org-broadinstitute-sting-utils-codecs-vcf-vcfheader
2012-10-26 16:34:16 -04:00
Mark DePristo fa9b2a91d0 Bugfix for GSA-552
-- https://jira.broadinstitute.org/browse/GSA-552
-- User reports a null exception while using VariantsToVCF:
   http://gatkforums.broadinstitute.org/discussion/1461/nullpointerexception-converting-vcf3-to-vcf-using-variantstovcf
   The problem is that he left out an input VCF file for the --variant argument and the command-line argument parsing code didn't catch this, so we NPE out later on.
2012-10-26 16:34:16 -04:00
Eric Banks 682a72faf7 Hmm, thought I got all the md5s last time. Apparently not. 2012-10-26 16:10:12 -04:00
Ryan Poplin b0dcc2c78e Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 15:14:41 -04:00
Ryan Poplin 610e003b93 Fixing a bug in the BQSR where the reference context gets out of sync after a read is adapter clipped inside the walker. 2012-10-26 15:14:27 -04:00
David Roazen 35483a7eef Update MD5s for PrintReads with BQSR Integration Test
The MD5s for these tests were changed in commit 87435f1074615b2cd016f042980109fd53962c8d
to match the output of a broken version of BaseRecalibration. With the patch in
commit c397102ecc1fd1d2cd8f209a8f358ab4a60b50a7, the output once again matches the
*original* MD5s for these tests, and does not vary as you increase -nct.

Final resolution to GSA-632
2012-10-26 14:25:25 -04:00
Eric Banks f66d812778 Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 13:20:41 -04:00
Eric Banks a8704ca73f Adding TODO notes for Ami 2012-10-26 13:20:27 -04:00
Mark DePristo 251983b8fb Add GATK-wide command line argument to control the maximum runtime allowed for the GATK
-- Providing this optional argument -maxRuntime (in -maxRuntimeUnits units) causes the GATK to exit gracefully when the max. runtime has been exceeded.  By cleanly I mean that the engine simply stops at the next available cycle in the walker as through the end of processing had been reached.  This means that all output files are closed properly, etc.
-- Emits an info message that looks like "INFO  10:36:52,723 MicroScheduler - Aborting execution (cleanly) because the runtime has exceeded the requested maximum 10.0000 s".  Otherwise there's currently no way to differentiate a truly completed run from a timelimit exceeded run, which may be a useful thing for a future update
-- Resolves GSA-630 / GATK max runtime to deal with bad LSA calling?
-- Added new JIRA entry for Ami to restart chr1 macarthur with this argument set to -maxRuntime 1 -maxRuntimeUnits DAYS to see if we can do all of chr1 in one weekend.
2012-10-26 13:18:34 -04:00
Eric Banks 46099af8db Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 12:10:53 -04:00
Eric Banks ed11b7dab2 Fix UG parallelization test 2012-10-26 12:10:44 -04:00
Eric Banks 7a706ed345 Fix some of the broken integration tests 2012-10-26 11:23:44 -04:00
Yossi Farjoun 27a4d6d90e Merge branch 'master' of ssh://gsa4/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 09:33:10 -04:00
Yossi Farjoun a3193a1743 fixed LargeScaleValidationCallingSingle.scala for new version of scala 2012-10-26 09:32:19 -04:00
Eric Banks ebebec7fdb Accidentally left one test disabled 2012-10-26 02:15:32 -04:00
Eric Banks b06f689d4b Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-26 02:13:26 -04:00
Eric Banks a53e03d525 Do not let reduced reads get removed in the contamination down-sampling 2012-10-26 02:13:04 -04:00
Eric Banks bf3d61ce82 The default value for --contamination_fraction_to_filter is now 0.05 (5%) in both UG and HC. Users of GATK-lite get pushed down to 0% by default (since it's not enabled) or get a user error if they try to set it. 2012-10-26 01:04:51 -04:00
Eric Banks 91f2c847a3 Fixing problem reported on forum for VF: DP couldn't be filtered from the FORMAT field, only from the INFO field. Fixed and added integration test. 2012-10-26 00:57:40 -04:00