
3399 Commits (44f3c5639ad4152e7bea9573dbe525b0069bc494)

Author SHA1 Message Date
ebanks 44f3c5639a I have finally figured out that when you volunteer to do something in group meeting, you keep getting pestered about it on Mark's Omniplan doc until it gets done (except for contig aliasing, of course). As such...
We can now emit bzipped VCFs from the GATK.

Details: any walker that defines a VCFWriter for its @Output (i.e. pretty much every core walker from UG and on), also has associated with it the -bzip (--bzip_compression) boolean argument.  When set, it will emit a VCF that is compressed with bzip2.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4093 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 04:14:50 +00:00
hanna 691333f75c Force isRequired() to be false for @Deprecated args.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4092 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:50:30 +00:00
hanna 5d6a6420a9 New behavior for filling in output streams: if required==true for a field and the field
is an output stream, we'll automatically create it and point it to stdout.  Otherwise, 
we'll leave it empty.  
I think about it like this: marking a field 'required' indicates to the GATK that the 
walker author requires a value for this field, and if the GATK can provide one without 
end user intervention, it will.  Maybe this is hackish.  We'll try it and see.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4091 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:39:13 +00:00
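A minimal Java sketch of the output-stream rule from 5d6a6420a9, combined with the @Deprecated rule from 691333f75c. It assumes a hypothetical `@Output` annotation with a `required()` element and a hypothetical `ExampleWalker`. The GATK's real argument classes are not shown and may work differently.

```java
import java.io.PrintStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical stand-in for the GATK's @Output annotation; only required() is modeled.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Output {
    boolean required() default false;
}

final class OutputStreamFiller {
    // A @Deprecated argument is never treated as required, whatever it declares.
    static boolean isRequired(Field field) {
        Output output = field.getAnnotation(Output.class);
        if (output == null || field.isAnnotationPresent(Deprecated.class)) {
            return false;
        }
        return output.required();
    }

    // Points each required, still-empty output stream at stdout; optional ones stay null.
    static void fillOutputStreams(Object walker) {
        for (Field field : walker.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Output.class)) continue;
            // Skip fields whose type cannot hold a PrintStream such as System.out.
            if (!field.getType().isAssignableFrom(PrintStream.class)) continue;
            try {
                field.setAccessible(true);
                if (field.get(walker) == null && isRequired(field)) {
                    field.set(walker, System.out);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot set output field " + field.getName(), e);
            }
        }
    }
}

// Hypothetical walker with one required, one optional and one deprecated output.
class ExampleWalker {
    @Output(required = true) PrintStream out;
    @Output PrintStream debugLog;
    @Deprecated @Output(required = true) PrintStream varout;
}
```
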
ebanks 90aef66ec5 Minor fixes for my last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:25:29 +00:00
ebanks ef795825fd Yet more argument consistency updates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4089 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 20:52:30 +00:00
aaron 7474afa7a3 allow other objects access to the static method that resolves bam lists, and some renaming and improved documentation for the function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4087 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:52:00 +00:00
ebanks ccda4f6ec1 More output consistency changes (updating wiki docs as I go along).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:46:08 +00:00
ebanks c9c6ff49c2 Deprecated 'O' in favor of 'o' in the cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:09:24 +00:00
ebanks 55a8306a0d Update the @RMD tags to look for VariantContext.class instead of ReferenceOrderedDatum.class. Since the test for rod type is broken this won't affect anything right now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4084 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:49:37 +00:00
aaron 35b9883dd6 vcfwriter is in tribble now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4083 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:01:04 +00:00
aaron 2d3b6d89dc adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file. In the GATK we now create multiple indexes, and choose the
most appropriate based on feature density, and the longest feature in the file.  Also:

- Converted Tribble to TestNG; it has better features and is about 6x faster.
- As much code clean-up as I could get done.  More to do, especially in the example code.
- Moved asserts in the code to throw exceptions.
- Added getBinSize to the index interface; both indexes already implemented this.
- Removed the abstract parts of the indexCreator interface; it is now simpler.
- Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 06:54:59 +00:00
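One way to picture 2d3b6d89dc: a single pass over the features feeds several consumers (index creators), and the index is then chosen from the feature density and the longest feature. `IndexKind`, `SimpleFeature`, `FeatureConsumer`, the cut-off values, and the exact selection rule are all assumptions for illustration, not Tribble's API.

```java
import java.util.List;

// Hypothetical index kinds; the commit adds an IndexType enum but its values aren't listed here.
enum IndexKind { LINEAR, INTERVAL_TREE }

// Hypothetical feature: a 1-based, inclusive interval on a single contig.
final class SimpleFeature {
    final int start, end;
    SimpleFeature(int start, int end) { this.start = start; this.end = end; }
    int length() { return end - start + 1; }
}

// Anything that wants to see every feature once, in file order (e.g. an index creator).
interface FeatureConsumer {
    void addFeature(SimpleFeature feature);
}

// Gathers the two statistics the choice is based on: feature density and longest feature.
final class FeatureStats implements FeatureConsumer {
    long count = 0;
    int longest = 0;
    int firstStart = Integer.MAX_VALUE;
    int lastEnd = 0;

    @Override
    public void addFeature(SimpleFeature f) {
        count++;
        longest = Math.max(longest, f.length());
        firstStart = Math.min(firstStart, f.start);
        lastEnd = Math.max(lastEnd, f.end);
    }

    double featuresPerBase() {
        return count == 0 ? 0.0 : (double) count / (lastEnd - firstStart + 1);
    }
}

final class IndexChooser {
    // Assumed cut-offs, for illustration only.
    static final double MIN_LINEAR_DENSITY = 0.001;
    static final int MAX_LINEAR_FEATURE_LENGTH = 1_000;

    // One pass over the input feeds every consumer, so several indexes can be built at once.
    static void onePass(Iterable<SimpleFeature> features, List<? extends FeatureConsumer> consumers) {
        for (SimpleFeature f : features) {
            for (FeatureConsumer c : consumers) c.addFeature(f);
        }
    }

    // Assumed rule: dense, short features get a linear index; otherwise an interval tree.
    static IndexKind choose(FeatureStats stats) {
        boolean dense = stats.featuresPerBase() >= MIN_LINEAR_DENSITY;
        boolean shortFeatures = stats.longest <= MAX_LINEAR_FEATURE_LENGTH;
        return (dense && shortFeatures) ? IndexKind.LINEAR : IndexKind.INTERVAL_TREE;
    }
}
```

Because every consumer sees each feature during the same pass, adding another index creator costs no extra read of the file.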
kiran 295472bf69 Simple change to handle a no-call (must avoid asking for the second allele, which will be null in this case). Also, added a hack to deal with input VCFs where there are no genotype likelihoods (needed in order to process Hapmap and 1KG VCFs). In this mode, called genotypes are assigned a likelihood of 0.96, and alternative genotypes are given 0.02 each. I know Beagle actually takes genotype data without likelihoods, so this might not be the right way to do this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4081 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:13:09 +00:00
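A hedged sketch of the two changes in 295472bf69: return early on a no-call before touching the second allele, and fall back to 0.96 for the called genotype and 0.02 for each alternative when the input VCF has no likelihoods. The method name, the biallelic genotype ordering, and the use of plain strings for alleles are assumptions.

```java
// Hypothetical sketch of the fallback described in 295472bf69, not the actual walker code.
// Genotypes are indexed by number of non-reference alleles: 0 = hom-ref, 1 = het, 2 = hom-var.
final class PseudoLikelihoods {
    static final double CALLED = 0.96;
    static final double OTHER = 0.02;

    // Returns null for a no-call, checking both alleles before using either.
    static double[] fromCall(String allele1, String allele2, String refAllele) {
        if (allele1 == null || allele2 == null) {
            return null;
        }
        int altCount = (allele1.equals(refAllele) ? 0 : 1) + (allele2.equals(refAllele) ? 0 : 1);
        double[] likelihoods = {OTHER, OTHER, OTHER};
        likelihoods[altCount] = CALLED;
        return likelihoods;
    }
}
```
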
kiran dec713a184 Simple test code from Steve Schaffner to compute R^2 and D'. This is just for educational purposes. Don't use this code for anything, ever!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4080 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:06:16 +00:00
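The test code itself isn't in this log, so here are the textbook definitions of D, D' and R^2 from a two-locus haplotype frequency p(AB) and allele frequencies p(A), p(B). This is an independent sketch, not Steve Schaffner's code, and it assumes the haplotype frequency is already known rather than estimated from unphased genotypes.

```java
// Standard two-locus linkage disequilibrium measures (class name is hypothetical).
final class LinkageDisequilibrium {
    // D = p(AB) - p(A) * p(B)
    static double d(double pAB, double pA, double pB) {
        return pAB - pA * pB;
    }

    // D' = D / Dmax, where Dmax depends on the sign of D; returned signed, not |D'|.
    static double dPrime(double pAB, double pA, double pB) {
        double d = d(pAB, pA, pB);
        double dMax = d >= 0
                ? Math.min(pA * (1 - pB), (1 - pA) * pB)
                : Math.min(pA * pB, (1 - pA) * (1 - pB));
        return dMax == 0 ? 0.0 : d / dMax;
    }

    // R^2 = D^2 / (p(A) (1 - p(A)) p(B) (1 - p(B))); 0 when either site is monomorphic.
    static double rSquared(double pAB, double pA, double pB) {
        double d = d(pAB, pA, pB);
        double denom = pA * (1 - pA) * pB * (1 - pB);
        return denom == 0 ? 0.0 : d * d / denom;
    }
}
```
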
hanna 8252494fa9 Forgot to update UG performance test to reflect the new -o argument.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4079 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 00:57:16 +00:00
hanna c177801d81 Add deprecated command-line arguments, and switched over UG to output to
-o/--out instead of -varout.  Let's watch as our intrepid support engineer
gracefully responds to all the incoming questions of the form: "the GATK told
me to use -o instead of -varout.  What do I do?"


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 21:01:44 +00:00
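A minimal sketch of how a deprecated flag such as -varout can fail fast with a pointer to its replacement, as c177801d81 describes. The class name, the lookup table, and the error wording are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the mapping and error wording are illustrative, not the GATK's own.
final class DeprecatedArguments {
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("-varout", "-o/--out"); // UnifiedGenotyper's old output flag
    }

    // Fails fast with a message that names the flag to use instead.
    static void check(String[] args) {
        for (String arg : args) {
            String replacement = REPLACEMENTS.get(arg);
            if (replacement != null) {
                throw new IllegalArgumentException(
                        "Argument " + arg + " is deprecated; use " + replacement + " instead.");
            }
        }
    }
}
```
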
hanna b80cf7d1d9 Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 14:27:05 +00:00
ebanks 30a104228a Don't require entropy reduction when cleaning only at known sites; instead we need to trust the known indels. This will improve consistency between lane-level and aggregated cleaning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4076 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 02:44:38 +00:00
depristo b6989289fc Potential bug fix for bad references where some codons may have Ns
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4075 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 12:09:33 +00:00
kiran 121b4f23b6 Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 00:01:48 +00:00
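A sketch of the sample-file behavior in 121b4f23b6, assuming each non-blank line is either an exact sample name or a Java regular expression that must match the whole name; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch: each non-blank line of the file is a sample name or a regular expression.
final class SampleSelector {
    static Set<String> select(List<String> fileLines, List<String> samples) {
        List<String> names = new ArrayList<>();
        List<Pattern> patterns = new ArrayList<>();
        for (String line : fileLines) {
            String entry = line.trim();
            if (entry.isEmpty()) continue;
            names.add(entry);
            try {
                patterns.add(Pattern.compile(entry));
            } catch (PatternSyntaxException e) {
                // Not a valid regex: the entry still works as a literal name.
            }
        }
        Set<String> selected = new LinkedHashSet<>(); // keeps the samples' original order
        for (String sample : samples) {
            if (names.contains(sample)) {
                selected.add(sample);
                continue;
            }
            for (Pattern p : patterns) {
                if (p.matcher(sample).matches()) {
                    selected.add(sample);
                    break;
                }
            }
        }
        return selected;
    }
}
```

The lines themselves can be read with `java.nio.file.Files.readAllLines(path)`.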
ebanks 165dc6d3b0 Ryan, what did you decide about supporting this tool? Is it still useful?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4073 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 19:16:14 +00:00
ebanks 2ef2f1b24a Fix UG's simple indel calculation model so that deletions are created correctly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4072 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 15:35:47 +00:00
aaron fa36731faf fixes for VariantEval integration tests affected by the spaces to underscores change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4070 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 22:43:20 +00:00
fromer 1c4784999a Updated to work exclusively in log10 space
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4069 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 21:31:07 +00:00
fromer 3af4e618cc Fixed precision issues with PQ (phasing quality)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4068 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:34:47 +00:00
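The commit doesn't say what the precision issue was. A common one: computing PQ as -10 * log10(1 - p) breaks down once p is closer to 1 than double precision can represent (about 1e-16), because 1 - p rounds to zero. Assuming PQ is the phred-scaled probability that the chosen phase is wrong, it can instead be computed from the log10 likelihoods of the competing phases, as sketched here (class and method names hypothetical).

```java
// Hypothetical sketch: PQ from log10 likelihoods, without ever forming 1 - p.
final class PhasingQuality {
    // log10(sum_i 10^x_i), computed stably by factoring out the maximum.
    static double log10SumLog10(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) max = Math.max(max, v);
        if (max == Double.NEGATIVE_INFINITY) return max;
        double sum = 0.0;
        for (double v : log10Values) sum += Math.pow(10.0, v - max);
        return max + Math.log10(sum);
    }

    // PQ = -10 * log10(P(best phase is wrong)), using the other phases' mass directly.
    // With a single candidate phase the result is +Infinity.
    static double pq(double[] log10Likelihoods, int best) {
        double[] others = new double[log10Likelihoods.length - 1];
        for (int i = 0, j = 0; i < log10Likelihoods.length; i++) {
            if (i != best) others[j++] = log10Likelihoods[i];
        }
        double log10Total = log10SumLog10(log10Likelihoods);
        return -10.0 * (log10SumLog10(others) - log10Total);
    }
}
```

With log10 likelihoods {0, -20}, the naive formula gives 1 - p = 0 and an infinite PQ, while the sketch returns 200.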
kshakir 88ca1fb22c Lazy loading reflections so Queue can hack the classpath before the PluginManager looks for classes.
Removed extra quotes from 'cd' pre-exec command.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4067 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:29:52 +00:00
aaron 63ada20da5 allow RefSeq files to optionally contain the header line, which is the default output from the UCSC table browser
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4065 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:25:37 +00:00
fromer effeedf1a3 Updated Bayesian phasing method to output per-site phasing statistics (and to not cap PQ at 40)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4064 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:55:47 +00:00
aaron 04e5b28f6d updates for VCF; we can no longer cache genotypes or alleles in a static array, this is bad for shared memory parallel runs. One instance per codec was better for performance than using ThreadLocal code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4063 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:34:44 +00:00
corin 8054b6b295 Changing the name of a column in VariantEval's output for easier reading by R. Let me know if this needs to be updated elsewhere; it's just a space changed to an underscore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4062 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:18:16 +00:00
ebanks 4b94f8c21b Silly me, I forgot to check for the contig boundaries. Thank goodness for performance tests!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4061 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 18:40:26 +00:00
aaron f16bb1e830 fix for a bug in package utils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4060 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 15:01:50 +00:00
fromer 15c5aa6e48 Efficient iteration over all possible combinations of variable assignments, for variables of arbitrary cardinalities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4059 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 14:14:37 +00:00
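The commit gives no code, but a standard way to enumerate every assignment of variables with arbitrary cardinalities is a mixed-radix odometer: increment the last variable and carry leftwards on overflow. A sketch (the class name is hypothetical, and this is not necessarily the committed implementation):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Enumerates every assignment of variables with the given cardinalities, in
// lexicographic order, keeping a single int[] of state (a mixed-radix odometer).
final class AssignmentIterator implements Iterator<int[]> {
    private final int[] cardinalities;
    private int[] current; // next assignment to return; null once exhausted

    AssignmentIterator(int[] cardinalities) {
        this.cardinalities = cardinalities.clone();
        boolean anyEmpty = false;
        for (int c : cardinalities) anyEmpty |= c <= 0;
        this.current = anyEmpty ? null : new int[cardinalities.length];
    }

    @Override
    public boolean hasNext() {
        return current != null;
    }

    @Override
    public int[] next() {
        if (current == null) throw new NoSuchElementException();
        int[] result = current.clone();
        // Increment the last variable; on overflow reset it to 0 and carry leftwards.
        int i = current.length - 1;
        while (i >= 0) {
            current[i]++;
            if (current[i] < cardinalities[i]) break;
            current[i] = 0;
            i--;
        }
        if (i < 0) current = null; // carried past the first variable: all done
        return result;
    }
}
```

For cardinalities {2, 3} this yields the six assignments [0,0], [0,1], [0,2], [1,0], [1,1], [1,2]; each step only touches the digits that carry.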
ebanks 1ec305cd15 Fix for running the cleaner at the lane-level for known indels only: instead of relying on the reads to get the reference sequence, we now use an IndexedFastaSequenceFile in all cases and pad the reference with bases on either end. This allows us to deal with cases in which we are trying to clean just a single deletion-containing read with tiny LOD (so the read needs to be pushed off the seen reference; @Reference doesn't yet work for Read Walkers) and has the added benefit of allowing us now to get much larger known indels that aren't completely covered with reads.
Thanks to Matt for the advice.

Also, for Guillermo: while I was at it, I changed the .stats debug output to emit the original interval instead of the cleaned region.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4058 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 11:31:13 +00:00
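The padding step in 1ec305cd15 amounts to widening the interval by a fixed number of bases on each side before fetching from the IndexedFastaSequenceFile. This sketch also clamps the window to the contig, the kind of boundary check 4b94f8c21b mentions; the class name, the 1-based inclusive coordinates, and where the clamp happens are assumptions.

```java
// Hypothetical padded reference window, 1-based and inclusive.
final class ReferenceWindow {
    final int start, stop;

    ReferenceWindow(int start, int stop) {
        this.start = start;
        this.stop = stop;
    }

    // Pads [start, stop] by `padding` bases on each side without running off the contig.
    static ReferenceWindow padded(int start, int stop, int padding, int contigLength) {
        int paddedStart = Math.max(1, start - padding);
        int paddedStop = Math.min(contigLength, stop + padding);
        return new ReferenceWindow(paddedStart, paddedStop);
    }
}
```
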
ebanks 98f7679619 Fixed the bug reported on GS regarding a clipped read that got moved several hundred bases away. The code that got triggered here was written back in the original version of the cleaner and it never actually did the right thing.
While I was fixing it, I noticed that we weren't allowing the cleaner to un-clean reads with indels when they're wrong even though we should.  Hypothetically, that should rarely happen: only when we can left-align out an indel or when the original mapper really went haywire.  This situation is rare enough that I'm calling logger.info to let the user know it's happening and suggesting that they double-check that everything looks right with their reads.  Better to be extra-cautious now that the cleaner is moving into the 1kg and Broad production pipelines soon.
Mark, have no fear: this was truly a rare edge case - one that won't affect the cleaning stats.  There is no need to re-clean the data processing paper bams!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4057 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 01:42:48 +00:00
aaron 3dc4d3c3a9 removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also
removed the auto-deletion of the reflections jar, and removed the very old OmniPlan document we had checked-in.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4056 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 00:42:37 +00:00
fromer 1336ea17a3 quality-score-based Bayesian phasing algorithm implemented
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4055 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-18 21:17:46 +00:00
fromer 553bda4e0e PreciseNonNegativeDouble permits precise arithmetic operations on NON-NEGATIVE double values
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4054 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-18 21:10:58 +00:00
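A hedged guess at the idea behind PreciseNonNegativeDouble: since the values are non-negative, each can be stored as its log10, so products and quotients become sums and differences and tiny probabilities no longer underflow. `Log10Value` is a hypothetical name and this is not the GATK class; subtraction is omitted because its result could be negative.

```java
// Hypothetical sketch, not the GATK class: a non-negative value stored as its log10.
final class Log10Value {
    final double log10; // -Infinity represents 0

    private Log10Value(double log10) {
        this.log10 = log10;
    }

    static Log10Value of(double value) {
        if (value < 0 || Double.isNaN(value)) {
            throw new IllegalArgumentException("value must be non-negative: " + value);
        }
        return new Log10Value(Math.log10(value)); // log10(0) is -Infinity
    }

    static Log10Value fromLog10(double log10) {
        return new Log10Value(log10);
    }

    Log10Value times(Log10Value other) {
        return new Log10Value(log10 + other.log10);
    }

    // Dividing by zero is not handled (it yields NaN or +Infinity).
    Log10Value dividedBy(Log10Value other) {
        return new Log10Value(log10 - other.log10);
    }

    // log10(10^a + 10^b) = max + log10(1 + 10^(min - max)); log1p keeps small terms precise.
    Log10Value plus(Log10Value other) {
        double hi = Math.max(log10, other.log10);
        double lo = Math.min(log10, other.log10);
        if (hi == Double.NEGATIVE_INFINITY) return this; // 0 + 0
        return new Log10Value(hi + Math.log1p(Math.pow(10.0, lo - hi)) / Math.log(10.0));
    }

    double toDouble() {
        return Math.pow(10.0, log10);
    }
}
```

For example, 1e-200 * 1e-200 underflows to 0.0 as a plain double, but as a Log10Value the product keeps log10 = -400.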
rpoplin 8f15b2ba72 Memory optimization for the VariantRecalibrator. Only add variants to the list if they pass the novelty and qual filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4051 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 21:57:28 +00:00
kshakir b7c60b9729 Queue now uses its own version instead of the gatk version.
Added a Queue release directory.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4050 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 19:34:23 +00:00
aaron e632d9b83d remove some dependencies on out of date methods from the tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4047 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 00:07:26 +00:00
aaron c1df293feb remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 23:52:01 +00:00
rpoplin 578e7fa36d Don't output -0 as qual value in VariantRecalibrator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4044 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 16:47:58 +00:00
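Java prints negative zero as "-0.0", and `String.format("%.2f", ...)` keeps the sign for small negatives such as -0.001 ("-0.00"). One way to avoid that, sketched here with a hypothetical `QualFormatter`, is to round before formatting; whether 578e7fa36d does exactly this isn't stated.

```java
import java.util.Locale;

// Hypothetical QUAL formatter that never emits a negative zero.
final class QualFormatter {
    static String format(double qual) {
        // Math.round returns a long, so anything that rounds to zero comes back as +0.0.
        double rounded = Math.round(qual * 100.0) / 100.0;
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's locale.
        return String.format(Locale.ROOT, "%.2f", rounded);
    }
}
```
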
kiran 3d63302b70 Deprecated. Use SelectVariants instead.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4043 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 15:07:50 +00:00
depristo 20db00a3e8 Lazy reference loading; the engine doesn't fetch the reference bases until you actually call ref.getBases(). With the new hidden --dontUpdateUG argument to the table recalibrator, this is 2-3x faster than before. Enabled for locus, read, and rod walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4042 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:46:22 +00:00
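A minimal sketch of lazy reference loading as described in 20db00a3e8: the context holds a way to fetch the bases and only runs it on the first `getBases()` call. `LazyReferenceContext` and the `Supplier`-based constructor are assumptions, and this version is not thread-safe.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not the GATK's ReferenceContext.
final class LazyReferenceContext {
    private final Supplier<byte[]> fetcher; // performs the (expensive) reference lookup
    private byte[] bases;                   // null until first requested

    LazyReferenceContext(Supplier<byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    // Bases are fetched on the first call only and cached for later calls.
    byte[] getBases() {
        if (bases == null) {
            bases = fetcher.get();
        }
        return bases;
    }
}
```

Walkers that never call `getBases()` never pay for the lookup.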
aaron 9ab647b730 adding checks to the RefSeq rod for lines that contain fewer than the required number of columns (we expect there to be 16 columns)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4041 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:34:32 +00:00
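A sketch of the RefSeq line handling from 63ada20da5 and 9ab647b730: skip the optional header line and reject lines with fewer than 16 tab-separated columns. It assumes the UCSC table browser header is recognized by a leading '#'; the class name and the error text are hypothetical.

```java
// Hypothetical RefSeq line parser; not the GATK's RefSeq codec.
final class RefSeqLineParser {
    static final int EXPECTED_COLUMNS = 16;

    // Returns the tab-separated fields, or null for the optional header line.
    static String[] parse(String line) {
        if (line.startsWith("#")) {
            return null; // assumed header form, e.g. "#bin\tname\tchrom..."
        }
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty columns
        if (fields.length < EXPECTED_COLUMNS) {
            throw new IllegalArgumentException("RefSeq line has " + fields.length
                    + " columns; expected " + EXPECTED_COLUMNS + ": " + line);
        }
        return fields;
    }
}
```
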
aaron cc58a27b00 fix for broken unit test; make sure when we can't get an index off of disk, the internal method returns null
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4040 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:12:32 +00:00
aaron b23545fafa re-enable the check for up-to-date versions in the Tribble index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4039 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 12:47:58 +00:00
ebanks 37586d3a43 Don't exception out when bad aligners emit wonky alignments; instead, just don't clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4038 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 02:36:04 +00:00
depristo a36951f11a @Output and @Input arguments for table recalibration, for use with Q
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4037 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:36:28 +00:00
depristo 61064d7075 GenotypeConcordance log file -- if provided, GC module will write FN/FP information to this file by context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4036 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:35:57 +00:00