Eric Banks
d7d8b8e380
Tribble v42 changes the Codec.canDecode method to take in a String instead of a File; this is something that Jim was adamant about (because Tribble can handle streams other than files). I didn't want the next person who needed to rev Tribble to deal with this change additionally, so I took care of updating the GATK now.
2011-11-28 14:18:28 -05:00
Mark DePristo
6cf315e17b
Change interface to getNegLog10PError to getLog10PError
2011-11-18 21:07:30 -05:00
Mark DePristo
7490dbb6eb
First version of VariantContextBuilder
2011-11-18 11:06:15 -05:00
Mark DePristo
aa0610ea92
GenotypeCollection renamed to GenotypesContext
2011-11-16 16:24:05 -05:00
Mark DePristo
460a51f473
ID field now stored in the VariantContext itself, not the attributes
2011-11-15 14:56:33 -05:00
Mark DePristo
f0234ab67f
GenotypeMap -> GenotypeCollection part 2
...
-- Code actually builds
2011-11-14 17:42:55 -05:00
Mark DePristo
1fbdcb4f43
GenotypeMap -> GenotypeCollection
2011-11-14 15:32:03 -05:00
Mark DePristo
b11c535527
Deleted MutableGenotype
...
-- This class wasn't really used anywhere, and so removed to control code bloat.
2011-11-14 13:16:36 -05:00
Mark DePristo
fee9b367e4
VariantContext genotypes are now stored as GenotypeMap objects
...
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples
-- All instances in the gatk that used Map<String, Genotype> now use GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> was used, so the order of the genotypes was technically undefined and could be dangerous. Now everything uses GenotypeMap with a specific ordering of samples (by name)
-- Integration tests updated and all pass
2011-11-11 15:00:35 -05:00
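A minimal sketch of the ordering problem described in the commit above, not the actual GenotypeMap implementation. It assumes genotypes are keyed by sample name and uses plain strings in place of Genotype objects; the class name, method name, and sample names are hypothetical. A HashMap gives no guaranteed iteration order, whereas copying into a TreeMap yields samples sorted by name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; GenotypeMap itself is not reproduced here.
final class GenotypeOrderSketch {

    // Returns the sample names in sorted (by name) order, regardless of
    // how the input map happens to iterate.
    static List<String> sortedSampleNames(Map<String, String> genotypes) {
        return new ArrayList<>(new TreeMap<>(genotypes).keySet());
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so code that relies on it
        // can see samples in an unexpected order.
        Map<String, String> unordered = new HashMap<>();
        unordered.put("sampleC", "0/1");
        unordered.put("sampleA", "1/1");
        unordered.put("sampleB", "0/0");

        // A name-sorted view is deterministic.
        System.out.println(sortedSampleNames(unordered));
    }
}
```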
Mark DePristo
ee40791776
Attributes are now Map<String,Object> not Map<String,?>
...
-- Allows us to avoid an unnecessary copy when creating InferredGeneticContext (whose name really needs to change).
2011-11-11 09:55:42 -05:00
Eric Banks
d64f8a89a9
Instead of the SelfScopingFeatureCodec interface, pushed this functionality into Tribble itself. Now we can e.g. determine that a file can be parsed by the BedCodec on the fly.
2011-11-09 15:24:29 -05:00
Eric Banks
9424e8b2ca
Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed.
2011-10-26 14:11:49 -04:00
Eric Banks
1b45f21774
Removing this command-line tool. Purposely not doing this in stable so that users who may still use it have time to find other options. But the docs are no longer on the wiki.
2011-09-28 13:18:32 -04:00
Mark DePristo
b7511c5ff3
Fixed long-standing bug in tribble index creation
...
-- Previously, indices created on the fly did not have the dictionary set when first written, so the GATK would read the index, add the dictionary, and rewrite it. This is now fixed: the on-the-fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary. This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
2011-09-20 10:53:18 -04:00
Mark DePristo
aa8afa3899
Merge
2011-09-19 21:16:47 -04:00
Eric Banks
da9c8ab386
Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.
2011-09-06 20:39:42 -04:00
Mark DePristo
ac49b8d26b
Conditional support for PerformanceTrackingQuerySource to measure Tribble / GATK bridge performance
...
-- Removed DEBUG option, instead use MEASURE_TRIBBLE_QUERY_PERFORMANCE in RMDTrackBuilder
2011-09-01 10:41:55 -04:00
Mark DePristo
a5c65fc133
Debugging information to print out the Query tracks
2011-08-28 18:54:49 -04:00
Mark DePristo
f7414e39bc
Improvements to GATKDocs
...
-- Allowed values for RodBinding<T> are displayed in the GATKDocs
-- Longest name up to 30 characters is chosen for main argument list (suggested by Ryan/Mauricio)
-- Features are listed in alphabetical order
-- Moved useful getParameterizedType() function to JVMUtils
-- Tests of these features in the Documentation Test
2011-08-18 21:20:09 -04:00
Mark DePristo
cbec69a130
Merge branch 'master' into help
...
Conflicts:
public/java/src/org/broadinstitute/sting/utils/help/HelpUtils.java
2011-08-18 11:33:27 -04:00
Mark DePristo
47bbddb724
Now provides type-specific user feedback
...
For RodBinding<VariantContext> error messages now list only the Tribble types that produce VariantContexts
2011-08-18 10:47:16 -04:00
Mark DePristo
2d41ba15a4
Vastly better Tribble help message
...
Here's a new example:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.1-520-g76495cd):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to parse value /humgen/gsa-hpprojects/GATK/data/refGene_b37.filtered.sorted.txt for argument refSeqRodBinding. Message: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :TYPE listing the correct type from among the supported types:
##### ERROR Name FeatureType Documentation
##### ERROR BEAGLE BeagleFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR BED BEDFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_bed_BEDCodec.html
##### ERROR BEDTABLE TableFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR CGVAR VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_completegenomics_CGVarCodec.html
##### ERROR DBSNP DbSNPFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_dbsnp_DbSNPCodec.html
##### ERROR GELITEXT GeliTextFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR MAF MafFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_features_maf_MafCodec.html
##### ERROR MILLSDEVINE VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_MillsDevineCodec.html
##### ERROR RAWHAPMAP RawHapMapFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR REFSEQ RefSeqFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR SAMPILEUP SAMPileupFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR SAMREAD SAMReadFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR SNPEFF SnpEffFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_snpEff_SnpEffCodec.html
##### ERROR SOAPSNP VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_soapsnp_SoapSNPCodec.html
##### ERROR TABLE TableFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR VCF VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR VCF3 VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
2011-08-18 10:31:32 -04:00
Mark DePristo
c2287c93d7
Cleanup of codec locations. No more dbSNPHelper
...
-- refdata/features now in utils/codecs with the other codecs
-- Deleted dbSNPHelper. rsID function now in VCFUtils. Remaining code either deleted or put into VariantContextAdaptors
-- Many associated import updates due to code move
2011-08-18 10:02:46 -04:00
Mark DePristo
9c17d54cb6
getFeatureClass() now returns Class<T> not Class to avoid yesterday's runtime error
2011-08-18 09:39:20 -04:00
Mark DePristo
d59e6ed274
Fix for RefSeqCodec bug and better error messages
...
-- RefSeqCodec bug: getFeatureClass() returned RefSeqCodec.class, not RefSeqFeature.class. Really should change this in Tribble to require Class<T extends Feature> to get compile time type checking
-- Better error messages that actually list the available tribble types, when there's a type error
2011-08-17 16:22:07 -04:00
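A rough sketch of the compile-time check suggested in the commit above. The interfaces below are simplified stand-ins written for this note, not the real Tribble declarations; only the names RefSeqCodec, RefSeqFeature, and getFeatureClass() come from the commit message. With a bounded type parameter, returning the codec class instead of the feature class is rejected by the compiler rather than failing at runtime.

```java
// Hypothetical, simplified types; the real Tribble interfaces differ.
final class FeatureClassSketch {

    interface Feature {}

    static class RefSeqFeature implements Feature {}

    // Binding the return type to T lets the compiler verify the class token.
    interface FeatureCodec<T extends Feature> {
        Class<T> getFeatureClass();
    }

    static class RefSeqCodec implements FeatureCodec<RefSeqFeature> {
        @Override
        public Class<RefSeqFeature> getFeatureClass() {
            // "return RefSeqCodec.class;" would not compile here, because
            // Class<RefSeqCodec> is not a Class<RefSeqFeature>.
            return RefSeqFeature.class;
        }
    }

    public static void main(String[] args) {
        System.out.println(new RefSeqCodec().getFeatureClass().getSimpleName());
    }
}
```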
Andrey Sivachenko
a423546cdd
fix: RefSeq contains records with zero coding length, and the RefSeq codec/feature used to crash on those; such records are now ignored, with a warning printed (once)
2011-08-17 15:17:31 -04:00
Eric Banks
045e8a045e
Updating random walkers to new rod system; removing unused GenotypeAndValidateWalker
2011-08-15 14:05:23 -04:00
Eric Banks
27f0748b33
Renaming the HapMap codec and feature to RawHapMap so that we don't get esoteric errors when trying to bind a rod with the name 'hapmap' (since it was also a feature).
2011-08-12 11:11:56 -04:00
Eric Banks
90771b74b4
When matching eval to comps, try to choose the one with the same alt allele.
2011-08-11 13:55:01 -04:00
Eric Banks
bdb1da30fd
Better interface for getting RodBindings to the VariantAnnotatorEngine and its annotations: pass around an AnnotatorCompatibleWalker (interface) object. Updating VA to use the new rod system.
2011-08-10 22:43:08 -04:00
Eric Banks
70b3daf689
VariantsToVCF is up and running again; integration tests are reenabled (and added one for dbSNP).
2011-08-09 03:03:43 -04:00
Mark DePristo
e36994e36b
Refactored a FeatureManager class from RMDTrackBuilder
...
New class handles (vastly more cleanly) the db of tribble codecs, features, and names for use throughout the GATK.
Added SelfScopingFeatureCodec interface that allows a FeatureCodec to examine a file and determine if the file can be parsed. This is the first step towards allowing the GATK to dynamically determine the type of a RodBinding.
2011-08-08 14:04:46 -04:00
Mark DePristo
8f696c7731
Continuing progress towards RodBinding 1.0
...
-- Cleaning up old interface to RMDT, docs and contracts added
-- Proper type checking for RodBinding for cases where the Tribble type isn't found or is the wrong type
2011-08-03 17:19:28 -04:00
Mark DePristo
800bb97f0b
Removed getFeaturesAsGATKFeature and created createGenomeLoc(Feature) in genomeLocParser
...
Updated all walkers that used the now deleted methods.
2011-08-03 16:04:51 -04:00
Mark DePristo
79e4a8f6d3
Merge
...
Conflicts:
private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java
public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java
public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java
public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-03 15:09:47 -04:00
Mark DePristo
b25140db83
Contracts and documentation for some of RefMetaDataTracker
...
Continuing to fix integration tests that don't pass / run
2011-08-03 13:34:20 -04:00
Eric Banks
7c89fe01b3
Instead of having the padded reference base be some hackish attribute, it is now an actual variable in the VariantContext class. More importantly, we now always require that it be present when padding is necessary, and validate this upon construction of the VC. This cleans up the interface significantly because we no longer require that a reference base be passed in when writing a VC/VCF record.
2011-08-03 11:00:36 -04:00
Mark DePristo
2874835997
Bug fix for type checking RodBindings
...
Now compares the feature class not the codec class.
UnitTests improvements
integrationtests on their way to actually running
2011-08-02 22:25:41 -04:00
Mark DePristo
b5e843f8f0
Approaching the end for the new RodBinding system
...
-- support for explicit naming of bindings (-X:name,type x)
-- support for automatic naming of bindings in lists (-X:vcf foo.vcf -X:vcf bar.vcf will generate internal names X and X2)
-- ParserEngineUnitTest expanded to cover all of the RodBinding cases
-- RodBindingUnitTest tests all of the low-level accessors
-- Parsing engine throws UserExceptions when bad bindings are provided on the command line
2011-08-02 22:00:06 -04:00
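A small sketch of the automatic naming rule mentioned in the commit above (-X:vcf foo.vcf -X:vcf bar.vcf producing internal names X and X2). This is not the GATK parsing code: the class and method names are hypothetical, and continuing the pattern to X3 and beyond is an assumption, since the commit message only gives the first two names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the naming pattern; not the GATK implementation.
final class BindingNameSketch {

    // First binding keeps the bare argument name; later ones get a numeric
    // suffix starting at 2 (assumed to continue as 3, 4, ...).
    static List<String> autoNames(String argName, int count) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            names.add(i == 1 ? argName : argName + i);
        }
        return names;
    }

    public static void main(String[] args) {
        // Two -X bindings, as in the commit message example.
        System.out.println(autoNames("X", 2));
    }
}
```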
Mark DePristo
3a27a25cfc
Validates that the tribble binding provides the right object types at startup
...
Tests to ensure this remains working
2011-08-02 20:11:24 -04:00
Mark DePristo
e4a67f3df1
RefMetaDataTracker has complete set of get() functions for List<RodBinding<T>>
...
Including unit tests
2011-08-02 14:28:35 -04:00
Mark DePristo
03741fb640
Merge branch 'master' into rodRefactor
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/annotator/VariantAnnotatorEngine.java
public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerIntegrationTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerPerformanceTest.java
public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-02 14:21:58 -04:00
Mark DePristo
a366f9a18d
Updating tools to use the RodBinding<T> syntax
2011-08-02 14:05:51 -04:00
Mark DePristo
184030dd56
RefMetaDataTracker no longer automagically converts inputs to VariantContexts
...
This was no longer working properly given that DBSNP indels needed to be moved around. The adaptor system is being refactored and you will need to convert files from X -> VCF for many tools to work.
2011-08-01 15:21:16 -04:00
Mark DePristo
8b1adb8c95
Removed getVariantContext() code
2011-08-01 13:41:09 -04:00
Mark DePristo
7b07c4e04e
RefMetaDataTracker now has get() methods accepting RodBindings
...
RodBinding no longer duplicates the get() methods in RMDT. This is just an object now that connects the command line system to the RMDT.
Updated programs to use new style
Added UnitTests for the RodBinding accessors.
2011-07-30 15:34:11 -04:00
Mark DePristo
3b799db61a
RefMetaDataTracker cleanup and unit tests
...
You now have to provide an explicit list of RODRecordLists upfront to the constructor. RefMetaDataTracker is now immutable. Changes in engine to incorporate these differences.
Extensive UnitTests for RefMetaDataTracker now.
2011-07-29 13:23:17 -04:00
Mark DePristo
39b4e76fde
Continuing refactoring of RefMetaDataTracker.
...
On the path towards converging getVariantContext() and getValues() in tracker so that we can have a single approach to get values from RODs with the new RodBinding() types
2011-07-28 17:48:28 -04:00
Eric Banks
33b32c4211
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-28 13:57:22 -04:00
Eric Banks
1afc49a297
There are some really 'interesting' (but apparently valid) records in the Mus musculus dbSNP file. Generalized the handling of complex cases in the dbSNP adaptor to handle it all. I just grabbed the actual Mus musculus dbSNP file as a test, ran it whole genome, and confirmed that we finally produce a valid VCF on it. Should be the last commit needed on this adaptor.
2011-07-28 13:55:58 -04:00