Commit Graph

3220 Commits (3aedd0055e6fb32d0a026b60302f652fc0785c03)

Author SHA1 Message Date
aaron 7474afa7a3 allow other objects access to the static method that resolves bam lists, and some renaming and improved documentation for the function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4087 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:52:00 +00:00
ebanks ccda4f6ec1 More output consistency changes (updating wiki docs as I go along).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:46:08 +00:00
ebanks c9c6ff49c2 Deprecated 'O' in favor of 'o' in the cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:09:24 +00:00
ebanks 55a8306a0d Update the @RMD tags to look for VariantContext.class instead of ReferenceOrderedDatum.class. Since the test for rod type is broken this won't affect anything right now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4084 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:49:37 +00:00
aaron 35b9883dd6 vcfwriter is in tribble now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4083 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:01:04 +00:00
aaron 2d3b6d89dc adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file. In the GATK we now create multiple indexes, and choose the most appropriate based on feature density and the longest feature in the file.  Also:

- Converted Tribble to TestNG; it has better features and is about 6x faster.
- As much code clean-up as I could get done.  More to do, especially in the example code.
- Moved asserts in the code to throw exceptions.
- Added getBinSize to the index interface; both indexes already implemented this.
- Removed the abstract parts of the indexCreator interface; this is now simpler.
- Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 06:54:59 +00:00
kiran 295472bf69 Simple change to handle a no-call (must avoid asking for the second allele, which will be null in this case). Also, added a hack to deal with input VCFs where there are no genotype likelihoods (needed in order to process Hapmap and 1KG VCFs). In this mode, called genotypes are assigned a likelihood of 0.96, and alternative genotypes are given 0.02 each. I know Beagle actually takes genotype data without likelihoods, so this might not be the right way to do this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4081 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:13:09 +00:00
kiran dec713a184 Simple test code from Steve Schaffner to compute R^2 and D'. This is just for educational purposes. Don't use this code for anything, ever!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4080 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:06:16 +00:00
hanna c177801d81 Add deprecated command-line arguments, and switched over UG to output to
-o/--out instead of -varout.  Let's watch as our intrepid support engineer
gracefully responds to all the incoming questions of the form: "the GATK told
me to use -o instead of -varout.  What do I do?"


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 21:01:44 +00:00
hanna b80cf7d1d9 Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 14:27:05 +00:00
ebanks 30a104228a Don't require entropy reduction when cleaning only at known sites; instead we need to trust the known indels. This will improve consistency between lane-level and aggregated cleaning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4076 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 02:44:38 +00:00
depristo b6989289fc Potential bug fix for bad references where some codons may have Ns
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4075 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 12:09:33 +00:00
kiran 121b4f23b6 Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 00:01:48 +00:00
ebanks 165dc6d3b0 Ryan, what did you decide about supporting this tool? Is it still useful?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4073 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 19:16:14 +00:00
ebanks 2ef2f1b24a Fix UG's simple indel calculation model so that deletions are created correctly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4072 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 15:35:47 +00:00
fromer 1c4784999a Updated to work exclusively in log10 space
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4069 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 21:31:07 +00:00
fromer 3af4e618cc Fixed precision issues with PQ (phasing quality)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4068 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:34:47 +00:00
kshakir 88ca1fb22c Lazy loading reflections so Queue can hack the classpath before the PluginManager looks for classes.
Removed extra quotes from 'cd' pre-exec command.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4067 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:29:52 +00:00
aaron 63ada20da5 allow RefSeq files to optionally contain the header line, which is the default output from the UCSC table browser
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4065 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:25:37 +00:00
fromer effeedf1a3 Updated Bayesian phasing method to output per-site phasing statistics (and to not cap PQ at 40)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4064 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:55:47 +00:00
aaron 04e5b28f6d updates for VCF; we can no longer cache genotypes or alleles in a static array; this is bad for shared-memory parallel runs. One instance per codec was better for performance than using ThreadLocal code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4063 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:34:44 +00:00
corin 8054b6b295 Changing a name of a column for variantevals output for easier reading by R--let me know if this needs to be updated elsewhere; it's just a space to an underscore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4062 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:18:16 +00:00
ebanks 4b94f8c21b Silly me, I forgot to check for the contig boundaries. Thank goodness for performance tests!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4061 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 18:40:26 +00:00
aaron f16bb1e830 fix for a bug in package utils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4060 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 15:01:50 +00:00
fromer 15c5aa6e48 Efficient iteration over all possible combinations of variable assignments, for variables of arbitrary cardinalities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4059 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 14:14:37 +00:00
ebanks 1ec305cd15 Fix for running the cleaner at the lane-level for known indels only: instead of relying on the reads to get the reference sequence, we now use an IndexedFastaSequenceFile in all cases and pad the reference with bases on either end. This allows us to deal with cases in which we are trying to clean just a single deletion-containing read with tiny LOD (so the read needs to be pushed off the seen reference; @Reference doesn't yet work for Read Walkers) and has the added benefit of allowing us now to get much larger known indels that aren't completely covered with reads.
Thanks to Matt for the advice.

Also, for Guillermo: while I was at it, I changed the .stats debug output to emit the original interval instead of the cleaned region.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4058 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 11:31:13 +00:00
ebanks 98f7679619 Fixed the bug reported on GS regarding a clipped read that got moved several hundred bases away. The code that got triggered here was written back in the original version of the cleaner and it never actually did the right thing.
While I was fixing it, I noticed that we weren't allowing the cleaner to un-clean reads with indels when they're wrong even though we should.  Hypothetically, that should rarely happen: only when we can left-align out an indel or when the original mapper really went haywire.  This situation is rare enough that I'm calling logger.info to let the user know it's happening and suggesting that they double-check that everything looks right with their reads.  Better to be extra-cautious now that the cleaner is moving into the 1kg and Broad production pipelines soon.
Mark, have no fear: this was truly a rare edge case - one that won't affect the cleaning stats.  There is no need to re-clean the data processing paper bams!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4057 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 01:42:48 +00:00
aaron 3dc4d3c3a9 removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also removed the auto-deletion of the reflections jar, and removed the very old OmniPlan document we had checked in.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4056 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 00:42:37 +00:00
fromer 1336ea17a3 quality-scored-based Bayesian phasing algorithm implemented
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4055 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-18 21:17:46 +00:00
fromer 553bda4e0e PreciseNonNegativeDouble permits precise arithmetic operations on NON-NEGATIVE double values
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4054 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-18 21:10:58 +00:00
rpoplin 8f15b2ba72 Memory optimization for the VariantRecalibrator. Only add variants to the list if they pass the novelty and qual filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4051 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 21:57:28 +00:00
kshakir b7c60b9729 Queue now uses its own version instead of the gatk version.
Added a Queue release directory.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4050 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 19:34:23 +00:00
aaron c1df293feb remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 23:52:01 +00:00
rpoplin 578e7fa36d Don't output -0 as qual value in VariantRecalibrator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4044 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 16:47:58 +00:00
kiran 3d63302b70 Deprecated. Use SelectVariants instead.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4043 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 15:07:50 +00:00
depristo 20db00a3e8 Lazy reference loading; the engine doesn't fetch the reference bases until you actually call ref.getBases(). With the new hidden --dontUpdateUG to table recalibrator this is 2-3x faster than before. Enabled for locus, read, and rod walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4042 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:46:22 +00:00
aaron 9ab647b730 adding checks to the RefSeq rod for lines that contain fewer than the required number of columns (we expect there to be 16 columns)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4041 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 13:34:32 +00:00
aaron b23545fafa re-enable the check for up-to-date versions in the Tribble index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4039 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 12:47:58 +00:00
ebanks 37586d3a43 Don't exception out when bad aligners emit wonky alignments; instead, just don't clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4038 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 02:36:04 +00:00
depristo a36951f11a @output and @input arguments for table recalibration for use with Q
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4037 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:36:28 +00:00
depristo 61064d7075 GenotypeConcordance log file -- if provided, GC module will write FN/FP information to this file by context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4036 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 18:35:57 +00:00
depristo 0d209d5442 Nicer printing out of clustering
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4035 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 16:02:13 +00:00
kshakir 307c8ca027 Created a new playground script for cleaning bams in Firehose.
Some refactoring of Queue extensions for reusability in scripts.
Putting the extensions into the Queue.jar after building them.
More updates to GATK walker arguments specifying @Input and @Output for Queue.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 23:52:24 +00:00
fromer dfe2922b5e First working version of statistical haplotype phaser
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4031 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 21:29:45 +00:00
ebanks f36c0ed613 Stop building obsolete VCFTools and CGUtilities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4030 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 19:28:36 +00:00
rpoplin 222f61df87 Bug fix for damoskow in TableRecalibration. Shouldn't try to update the reference mismatch rate tag for an unmapped read.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4028 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 18:57:07 +00:00
kshakir 80a70ccf03 Repopulating rodsToSamples. Code reviewed by Eric.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4027 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 17:07:18 +00:00
hanna cb144734c0 Getting rid of GenotypeWriter interface. Of note:
- GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble.
- VCFWriter is now an interface, for easier redirection.
- VCFWriterImpl fleshes out the VCFWriter interface.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 16:33:22 +00:00
kshakir 542d394e09 Cleaning up Queue debugging output.
-l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run.
More documentation in the examples with a new even simpler CountReads example.
Took out unused option to build Queue GATK extensions separately.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:54:08 +00:00
chartl 49a3db9dfe A brief implementation of a QD calculation that is not quite so bimodal for known variants (multiplicatively penalizes QD by (n variant samples)/(n variant alleles) ). Not sure how helpful this will be (which is why it is in oneoffs). Seems nice on MCKD1, but I'm still playing with the optimization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4024 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:42:37 +00:00