aaron
cba436fa2f
small fix for the table codec: if you see a header line, you know you've finished parsing the header. Also some changes to return the ref ordered data pool test to using MappedStreamSegment instead of EntireStream
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4942 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 21:20:26 +00:00
hanna
0982d35f5b
Bug fixes in streaming in Tribble data via /dev/stdin.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4935 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-05 02:43:04 +00:00
hanna
3fc9862964
Unit test fixed - Tribble codecs aren't designed to be stateless, but I was
using one as though it was. Fixed, and debug code reverted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4917 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 17:47:52 +00:00
hanna
b9cb57f4b9
A unit test is failing on bamboo in a way I can't reproduce (or even explain).
...
Checking in some debugging info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4916 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 16:35:04 +00:00
hanna
cba18116e4
A significant refactoring of the ROD system, done largely to simplify the process of
streaming/piping VCFs into the GATK. Notable changes:
- Public interface to RMDTrackBuilder is greatly simplified; users can use it only to build
RMDTracks and look up codecs.
- RODDataSource and RMDTrack are no longer functionally at the same level; RODDataSources now
manage RMDTracks on behalf of the GATK, and the only direct consumers of the RMDTrack class
are the walkers that feel the need to access the ROD system directly. (We need to stamp out
this access pattern.)
A few minor warts were introduced as part of this process, labeled with TODOs. These'll be
fixed as part of the VCF streaming project.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4915 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 04:52:22 +00:00
aaron
85f2968104
add convenience methods for RODs-for-reads: the ability to get all the RODs covering the read, regardless of their type or position on the read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4912 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-29 20:46:03 +00:00
ebanks
8a0c07b865
Support for indels in hapmap. This was non-trivial because not only does hapmap not tell you whether the allele is an insertion or deletion, but it also has a completely different positioning strategy (rightmost base). I'll send out an email tomorrow when the new HapMap3.3 VCF is ready.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4908 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-27 07:37:46 +00:00
hanna
e313eeede8
Push command-line expansions, such as BAM list unpacking and -B tag parsing, out
into the CommandLine* classes. This makes it easier for external functionality
(such as the VCF streamer) to use GenomeAnalysisEngine directly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4897 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 19:00:17 +00:00
hanna
09c7ea879d
Merging GenomeAnalysisEngine and AbstractGenomeAnalysisEngine back together.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4889 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-21 02:09:46 +00:00
chartl
e406eb0f95
Adding a useful accessor method to TableFeature
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4856 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:11:51 +00:00
depristo
44feb4a362
Improved BAQ implementation. Now supports adding BAQ tags to reads on the fly with ADD_TAG_ONLY option. Caching fasta reader implementation, and changes throughout the system to enable this. Many performance improvements throughout the system due to better reference access patterns.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4792 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-05 18:29:39 +00:00
ebanks
a181680814
We no longer require dbSNP files to be of the dbsnp rod-type; VCFs will do (provided they are bound to the name 'dbsnp')
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4753 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 03:25:18 +00:00
aaron
b03ac61e9d
consolidating the checking of the RMD sequence dictionary against the reference into a single function, and adding an integration test to test that empty VCFs pass (both the indexing and the seq dictionary validation).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4750 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 00:01:56 +00:00
hanna
abc13d0a90
Temporary hack: force abort with an intelligent message suggesting that users
specify -B:dbsnp,vcf <filename> if the filename passed as the --DBSNP argument
value contains 'vcf'. We'll replace this functionality once dbSNP 132 starts
playing nicely with the tagging system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4749 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-29 23:37:30 +00:00
depristo
8768e1a240
Useful profiling tool that reads in a single rod and evaluates the time it takes to read the file by byte, by line, into pieces, just the sites of the vcf, and finally the full vcf. Emits a useful table for plotting with the associated R script, which can be run like: Rscript R/analyzeRodProfile.R table.txt table.pdf titleString
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4728 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 14:59:16 +00:00
aaron
53672361cc
capture more details when something IO-related goes wrong in writing a Tribble index
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4720 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 17:06:28 +00:00
hanna
90711d445c
Change the interface for RMDTrackBuilder, thereby always mandating the specification
of a sequence dictionary and related info. This will hopefully eliminate the cases in
which the refseq track depends on a sequence dictionary / contig parser that hasn't been
specified.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4700 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 19:00:17 +00:00
ebanks
f1b0f3bc49
Putting my changes from earlier in the day back in after someone (rhymes with 'Dark') trounced on them with his last commit...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4687 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 01:55:50 +00:00
depristo
ef2f6d90d2
VQSR now operates on LOD scores in the INFO field directly, and doesn't adjust the QUAL field. New format for tranches file uses LOD score. Old file format no longer supported. log10sumlog10() function, a very useful utility in MathUtils. No more ExtendedPileupElement! Robust math calculations in GMM so that no infinities are generated! HaplotypeScore refactored to enable use of filtered context. Not yet enabled... InferredContext getDouble and getInteger arguments now parse values from Strings if necessary
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4684 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 22:19:22 +00:00
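Note on commit ef2f6d90d2: the message names a log10sumlog10() utility in MathUtils but does not show it. The sketch below is one common way to compute log10 of a sum of values that are themselves stored as log10 numbers: shift by the maximum before exponentiating so that very negative inputs do not underflow to zero. The class name Log10Sketch and the method signature are assumptions made for illustration; this is not the GATK implementation.

```java
// Hypothetical illustration only; not the MathUtils code from the commit.
final class Log10Sketch {
    private Log10Sketch() {}

    /** Returns log10(sum_i 10^(log10Values[i])) using the max-shift trick. */
    static double log10SumLog10(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }
        // Empty input, or every term is log10(0): the sum is 0, so its log10 is -infinity.
        if (max == Double.NEGATIVE_INFINITY) {
            return Double.NEGATIVE_INFINITY;
        }
        double sum = 0.0;
        for (double v : log10Values) {
            // Each term is at most 10^0 = 1, so the largest term never underflows.
            sum += Math.pow(10.0, v - max);
        }
        return max + Math.log10(sum);
    }

    public static void main(String[] args) {
        // Two equal terms far below the smallest positive double.
        System.out.println(log10SumLog10(new double[] {-1000.0, -1000.0}));
    }
}
```

Summing Math.pow(10, v) directly would return 0 for inputs like -1000, and the log of that would be -infinity, which is the kind of result the "no infinities" remark in the message is about.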
hanna
5b83942cee
- Fix DepthOfCoverage so that, when it abuses the ROD system by instantiating a track in onTraversalDone, it also supplies the correct sequence dictionary and parser.
...
- Changed RMDTrackBuilder to use SequenceDictionaryUtils.validateDictionaries for ref <-> ROD sequence dictionary validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4683 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 20:34:04 +00:00
ebanks
35382468ee
Better error checking/output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4676 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 16:36:34 +00:00
kshakir
673fa841a4
Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader.
...
Removed obsolete usages of PackageUtils with updated PluginManager.
Ported Queue interval utilities written in scala over to Sting's java IntervalUtils.
Added a very basic integration test to ensure that the fullCallingPipeline.q compiles.
Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test).
While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1".
Upgraded to scala 2.8.1 and updated calls to deprecated functions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:14:28 +00:00
aaron
698e5cf345
for GATK style codecs, make sure we fill in their GenomeLocParser from the RMDIndexer
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4650 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 18:44:15 +00:00
hanna
8e36a07bea
Convert GenomeLocParser into an instance variable. This change is required
for anything that needs to be simultaneously aware of multiple references, e.g.
Queue's interval sharding code, liftover support, distributed GATK, etc.
GenomeLocParser instances must now be used to create/parse GenomeLocs.
GenomeLocParser instances are available in walkers by calling either
-getToolkit().getGenomeLocParser()
or
-refContext.getGenomeLocParser()
This is an intermediate change; GenomeLocParser will eventually be merged
with the reference, but we're not clear exactly how to do that yet. This
will become clearer when contig aliasing is implemented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 17:59:50 +00:00
aaron
97db593efb
making my last commit message actually true
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4636 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 18:26:23 +00:00
aaron
be499fc986
making the reference optional (the GATK will set it on the first run if it's not included), and setting the seq index if they do supply it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4635 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 18:15:31 +00:00
aaron
2a8c97a4a7
better error catching, as well as allowing for default index naming, <filename>.idx
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4633 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-06 19:12:19 +00:00
aaron
cb2e26a004
by request, an indexer tool to create Tribble style indexes outside of the GATK
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4632 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-06 18:59:06 +00:00
ebanks
1b3fc8ddd2
Doing things too quickly is also naughty. Thanks, Andrey. Now, we're even.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4597 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 14:50:04 +00:00
ebanks
58f7b4c595
Naughty use of assertions means that malformed records are not caught.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4596 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 14:41:38 +00:00
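Note on commit 58f7b4c595: Java skips assert statements unless the JVM is started with -ea, so a validity check written as an assertion does nothing in a normal run. The sketch below shows the general alternative, an explicit check that always executes. The class, method and record layout are hypothetical and are not taken from the commit.

```java
// Hypothetical illustration only; the record format and names are assumptions.
final class RecordCheckSketch {
    private RecordCheckSketch() {}

    /** Splits a tab-delimited record and rejects it if the field count is wrong. */
    static String[] splitRecord(String line, int expectedFields) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        // An `assert fields.length == expectedFields;` here would be skipped without -ea.
        if (fields.length != expectedFields) {
            throw new IllegalArgumentException(
                "Malformed record: expected " + expectedFields
                    + " fields but found " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitRecord("chr1\t100\tA", 3).length);
    }
}
```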
hanna
3039c0de3c
Retire old ROD syntax.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4564 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 23:52:11 +00:00
depristo
e6b008f87c
Fixed a >= vs. > test that caused a failure to tolerate dynamic indexes created at *exactly* the instant the output VCF is closed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4555 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 16:11:14 +00:00
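Note on commit e6b008f87c: the message does not show the comparison it fixed. One plausible reading, offered here only as an assumption, is that an index is accepted when its modification time is not older than the data file's, and that a strict > wrongly rejected an index written in the same clock tick as the file. The names below are hypothetical.

```java
// Hypothetical illustration only; the actual check in the commit is not shown in the log.
final class IndexFreshnessSketch {
    private IndexFreshnessSketch() {}

    /** Accepts an index whose timestamp is the same as, or later than, the data file's. */
    static boolean indexIsUsable(long fileLastModifiedMillis, long indexLastModifiedMillis) {
        // Using > instead of >= would reject an index created at exactly the same instant.
        return indexLastModifiedMillis >= fileLastModifiedMillis;
    }

    public static void main(String[] args) {
        System.out.println(indexIsUsable(1000L, 1000L));
    }
}
```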
depristo
f7ce18553e
GenotypeConcordance now prints interesting sites more nicely. RMDTrackBuilder now uses the root class FeatureSource, not BasicFeatureSource.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4525 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 00:29:02 +00:00
depristo
da29fcdb68
No longer writes the index to disk twice. Bug fixes for closing VCFWriters throughout the codebase
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4488 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 14:26:06 +00:00
aaron
ff0df1a2da
A fix for an integration test that was broken by on-the-fly indexing. Also, better reporting of Tribble exceptions in GATK integration tests. Trying to get the tests back up and running...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4483 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-12 18:39:56 +00:00
depristo
38a67fed63
High performance version of standard vcf writer. New general static Tribble class for common constants, including general .idx constant and functions to get standard index name for a given file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4471 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 19:53:21 +00:00
asivache
39e373af6e
deleting accidentally committed junk
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4464 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 15:13:01 +00:00
ebanks
6448753cf7
Removed the SequenomValidationConvertor and renamed it VariantValidationAssessor since it no longer handles ped/sequenom files (but instead works on vcfs/variantcontexts). Updated all of the wiki docs, including adding instructions on how to convert ped files to vcf, a la Shaun Purcell. We now officially no longer support ped files, everyone. Other misc cleanup in the code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4419 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 02:11:38 +00:00
aaron
64b7b3f83b
fix for a recent change to the indexing code where we ignore the results of locking the file (this is bad), and as a result don't write the index; this should fix the build.
...
Off to Yosemite in 4 hours, enjoy the week gsa folks!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4410 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 04:35:11 +00:00
depristo
7551ba8249
Trivial refactoring in preparation for on-the-fly indexing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4409 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 22:32:59 +00:00
aaron
70f03a7113
first pass of well-formatted tribble exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4352 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 03:29:33 +00:00
kshakir
4ed9f437e9
Sliced the GAE in half like a Gordian knot to avoid the constant merge conflicts.
...
The GAE half has all the walker specific code. The new "Abstract" GAE has the rest of the logic.
More refactoring to come, with the end goal of having a tool that other java analysis programs (Queue, etc.) can use to read in genomic data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4339 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-23 23:28:55 +00:00
aaron
b968af5db5
The tribble indexes are now updated with correct sequence lengths for each contig they have in their sequence dictionary. Also clean-up in the RMD track builder.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4321 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 18:21:22 +00:00
aaron
782e0018e4
removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come.
...
*** Three integration tests had to change: ***
RecalibarationWalkersIntegrationTest:
One of the tests was using the interval as the snp track, and wasn't supplying a DbSNP track (for CountCovariates)
SequenomValidationConverterIntegrationTest:
relies on Plink ROD which we've removed.
PileupWalkerIntegrationTest:
we no longer have implicit interval tracks, so there isn't a rod name over the specified region. Otherwise the same result.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 22:54:49 +00:00
depristo
7880863eb7
Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:07:38 +00:00
depristo
7ad8fbdd5a
Moved GATKException to exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:47:19 +00:00
depristo
595907e98e
Moving StingException
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:34:15 +00:00
depristo
40e6179911
Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but now inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:02:43 +00:00
depristo
8f1a32acae
All exceptions thrown by the GATK have been reviewed and UserErrors replaced where appropriate. Shazam. Another check-in will remove the GATKException and restore the StingException.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4252 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 15:25:30 +00:00
depristo
1de713f354
Massive review of maybe 50% of the exceptions in the GATK. GATKException is a tmp. tracker so that I can tell which StingExceptions I've reviewed. Please don't use it. If you are working on new code and are considering throwing exceptions, it's either UserError or StingException, please
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4246 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 23:21:17 +00:00