Commit Graph

832 Commits (da2e879bbc07033e7b5bd90e9f941fcaec937914)

Author SHA1 Message Date
ebanks a0231f073f Damnit. Enabling the Picard code to recalculate all of the relevant SAMRecord attribute tags means that I need to have reference bases over all read bases even after realignment (and there are some big indels in dbsnp). Fortunately, I have my trusty IndexedFastaSequenceFile reader handy! Re-enabling the previously broken performance test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4255 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:06:37 +00:00
rpoplin 7b113a4886 Truncate the floating point numbers coming out of the variant recalibration walkers. Integration tests now work with both 1.6.0_16-b01 and 1.6.0_21-b06
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4253 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 18:37:49 +00:00
aaron cf33614ddc remove the test that's failing the performance tests, please don't release until this is figured out
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4251 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 06:30:40 +00:00
aaron 4adb07683d all fixed..thanks Matt!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4250 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 06:18:59 +00:00
aaron bd4bc84abd comment out the broken aligner test again - I'll take a crack at fixing it tomorrow. Each software engineer is going to take a pass at fixing it, and we'll see who can do it with the most style.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4249 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 05:22:24 +00:00
rpoplin 61e848c4f0 It's clear from Sendu's calling and my own calling that -qScale 100.0 is a much better default value for low pass data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4248 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 01:47:21 +00:00
hanna e183b6598c - Fix our private repository of bwa reference support files.
- Update the test to point to our repository.
- Update the md5 to reflect new Picard tag ordering.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4247 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 00:29:26 +00:00
kshakir 4183e8805a Fixed reference (via busted symlink) /broad/1KG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4245 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 20:34:51 +00:00
rpoplin aeb897db7f VR walkers look at by-hapmap validation status by default. Eric will be updating the syntax to allow for more flexibility here.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4242 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:40:56 +00:00
kshakir d7f55574e2 Re-enabling aligner integration test now that we're back to having more than 1 or 2GB memory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4241 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:09:48 +00:00
rpoplin d625186796 I think the VR integration tests are fine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4240 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:00:41 +00:00
depristo 6a30617a60 Initial implementation of UserError exceptions and error message overhaul. UserErrors and their subclasses UserError.MalFormedBam for example should be used when the GATK detects errors on part of the user. The output for errors is now much clearer and hopefully will reduce GS posts. Please start using UserError and its subclasses in your code. I've replace some, but not all, of the StingExceptions in the GATK with UserError where appropriate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4239 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 11:32:20 +00:00
ebanks 65edbced36 Addition for Tim: recalculate the NM and UQ tags after realignment. Also, don't fix the insert size calculation, since that's done by fix mate information.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4227 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 04:02:14 +00:00
rpoplin e3962c0d13 VR integration tests are longer but much more useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4210 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 15:50:19 +00:00
ebanks b59d62927e Fix busted performance test (-outputBam has been deprecated in the BQ recalibrator in favor of -o)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4201 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:51:53 +00:00
hanna 70bb480939 The battle is over. Picard is revved.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4200 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 05:28:01 +00:00
rpoplin 0bb05fb472 Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4194 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:12:09 +00:00
hanna dc5f858d29 Replaced placeholder support for splitting by read group with read support (sorry everyone), and added relatively comprehensive unit tests to ensure that splitting by read group works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4190 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:24:50 +00:00
rpoplin b28f63a948 Base recalibrator now uses -o and deprecates -outputBam
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4189 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:13:50 +00:00
kshakir 33400074fa Updated tribble BED parsing code to use the official UCSC spec, and updated tests to match expected results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4188 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 21:49:06 +00:00
rpoplin 469bbaa240 Added more integration tests for the variant quality score recalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4181 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:31:24 +00:00
depristo 8c4009ee18 Oops, don't enable reporting in integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4179 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 22:56:18 +00:00
ebanks 3d6c4fc55f Removing the obsolete --hapmap and --hapmap_chip options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4172 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:57:05 +00:00
depristo b33873206a GATKRunReport now has an ID (random 32 char string) that uniquely identifies the JOB run and can be used to find a run in the run repository
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4171 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:18:57 +00:00
depristo 3fd2392090 Improved interface to getting command line options. Now fully traverses all objects to get all internal argument collections. Preliminary (but disabled version) of phoning home (see -et argument for more information). Captures correct and erroring out runs and writes out gzipped, xml report with lots of useful information. Needs a bit more information but is approximately working. Reports going to /humgen/gsa-hpprojects/GATK/reports/ in submitted directory that will be collated by some external tool. Only operating if -et STANDARD or -et STDOUT are provided currently and REPORT_DIR contains a file called ENABLE. WalkerTest now adds -et NO_ET to tests to avoid populating the reports with tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4155 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:53:32 +00:00
rpoplin 9c3f403307 Add the calculated lod value to the info field of each recalibrated VCF record.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4153 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 21:33:58 +00:00
ebanks bfcac33e80 Cleaning up playground utils and tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:25:47 +00:00
hanna d773b3264b Eliminated -mrl option.
Eliminated -fmq0 option.
Eliminated read group hallucination.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4133 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 21:38:03 +00:00
ebanks dfae48cee0 Moving supported tools to core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 13:56:19 +00:00
ebanks 45d895dcf4 Remove the check in the Unified Genotyper for hitting the max reads at locus value. Instead, simply add a flag to the INFO field if any of the samples has been downsampled. 95% hooked up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4126 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:50:47 +00:00
ebanks 79cd716671 More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 04:43:52 +00:00
ebanks dd7f136298 Office-mate courtesy: fixing Andrey's busted integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4123 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 02:00:06 +00:00
ebanks 4678613893 Significant fixes for the Genomic Annotator.
1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when 
I saw this code).  This will improve memory consumption when running with -nt.
2. Don't annotate indels or > bi-allelic sites.
3. Fix bug where not all records were making it into the output VCF.
4. General code clean up.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:16:50 +00:00
rpoplin 5623e01602 GenerateVariantClusters and VariantRecalibrator now uses hapmap and 1kg ROD bindings (in addition to dbsnp) to distinguish between knowns and novels. It no longer looks at by-hapmap validation status so providing hapmap is highly recommended. Example on the wiki. Input variants tracks now must start with input.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4113 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:33:40 +00:00
hanna bf0b6bd486 Update integration tests to use the new ROD syntax.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4112 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:13:30 +00:00
hanna 3dc78855fd Command-line argument tagging is in, and the ROD system is hacked slightly to support the new syntax
(-B:name,type file) as well as the old syntax.  Also, a bonus feature: BAMs can now be tagged at the
command-line, which should allow us to get rid of some of the hackier calls in GenomeAnalysisEngine.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4105 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 03:47:57 +00:00
rpoplin 85007ffa87 Some clean up for the variant recalibrator. Now uses @Input and @Output so that it can join the Queue party. Users now specify a -o, -clusterFile, -tranchesFile, and -reportDatFile. Example on the wiki. ApplyVariantCuts now has an integration test. Base quality recalibrator now requires a dbsnp rod or vcf file. Now that the base quality recalibrator is using @Output the PrintStream shouldn't be closed in OnTraversalDone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4101 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:14:58 +00:00
ebanks 90aef66ec5 Minor fixes for my last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:25:29 +00:00
ebanks ccda4f6ec1 More output consistency changes (updating wiki docs as I go along).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:46:08 +00:00
ebanks c9c6ff49c2 Deprecated 'O' in favor of 'o' in the cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:09:24 +00:00
aaron 2d3b6d89dc adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file. In the GATK we now create multiple indexes, and choose the
most appropriate based on feature density, and the longest feature in the file.  Also:

- Converted Tribble to TestNG; it has better features and is about 6x faster.
- As much code clean-up as I could get done.  More to do, especially in the example code.
- Moved asserts in the code to throw exceptions.
- Added getBinSize to the index interface; both indexes already implemented this.
- Removed the abstract parts of the indexCreator interface; this is now more simple.
- Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 06:54:59 +00:00
hanna 8252494fa9 Forgot to update UG performance test to reflect the new -o argument.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4079 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 00:57:16 +00:00
hanna c177801d81 Add deprecated command-line arguments, and switched over UG to output to
-o/--out instead of -varout.  Let's watch as our intrepid support engineer
gracefully responds to all the incoming questions of the form: "the GATK told
me to use -o instead of -varout.  What do I do?"


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 21:01:44 +00:00
hanna b80cf7d1d9 Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 14:27:05 +00:00
kiran 121b4f23b6 Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 00:01:48 +00:00
aaron fa36731faf fixes for VariantEval integration tests affected by the spaces to underscores change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4070 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 22:43:20 +00:00
ebanks 1ec305cd15 Fix for running the cleaner at the lane-level for known indels only: instead of relying on the reads to get the reference sequence, we now use an IndexedFastaSequenceFile in all cases and pad the reference with bases on either end. This allows us to deal with cases in which we are trying to clean just a single deletion-containing read with tiny LOD (so the read needs to be pushed off the seen reference; @Reference doesn't yet work for Read Walkers) and has the added benefit of allowing us now to get much larger known indels that aren't completely covered with reads.
Thanks to Matt for the advice.

Also, for Guillermo: while I was at it, I changed the .stats debug output to emit the original interval instead of the cleaned region.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4058 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 11:31:13 +00:00
rpoplin 8f15b2ba72 Memory optimization for the VariantRecalibrator. Only add variants to the list if they pass the novelty and qual filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4051 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 21:57:28 +00:00
aaron e632d9b83d remove some dependencies on out of date methods from the tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4047 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 00:07:26 +00:00
aaron c1df293feb remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned intergration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 23:52:01 +00:00