gatk-3.8

Commit Graph

Author	SHA1	Message	Date
David Roazen	b180a1311a	Merge branch 'snpEff'	2011-08-08 22:12:14 -04:00
David Roazen	a13bc7b929	Added an integration test for the SnpEff annotation support, as well as some extra safety checks and comments.	2011-08-08 20:01:24 -04:00
Mark DePristo	80924d24de	Single positional arguments are now treated as names unless they actually match a tribble feature	2011-08-08 19:26:27 -04:00
Mark DePristo	f8a56bc64b	Merge branch 'master' into rodRefactor	2011-08-08 16:58:18 -04:00
Mark DePristo	f8ad91b16f	Reverting a bunch of bad -B type drops	2011-08-08 16:57:38 -04:00
David Roazen	5e288136e0	Added unit tests for the SnpEff codec, and made minor adjustments to the codec itself.	2011-08-08 16:51:43 -04:00
Eric Banks	d7813db217	Combine Variants was actually outputting invalid VCFs in cases where it was combining Variant Contexts with different alternate alleles: if any of the genotypes had PLs they were no longer valid/correct. Added a check for such cases (the combined VC has more alleles than an original VC) and strip out the PLs when triggered; added integration test to cover it. I also added the check to Select Variants, although it currently doesn't remove unused alleles so it should never trigger. Is there any reason not to strip out unused alleles after a select?	2011-08-08 16:25:35 -04:00
Mark DePristo	383bb6f0e0	Merge branch 'master' into rodRefactor	2011-08-08 15:25:55 -04:00
Mark DePristo	ba7353c561	Updated IntegrationTests to use the new type free format for VCF files	2011-08-08 15:04:38 -04:00
Mark DePristo	0810c42309	GATK now does dynamic type determination for VCF files Added UnitTests covering all of the cases.	2011-08-08 14:45:46 -04:00
Mark DePristo	e36994e36b	Refactored a FeatureManager class from RMDTrackBuilder New class handles (vastly more cleanly) the db of tribble codecs, features, and names for use throughout the GATK. Added SelfScopingFeatureCodec interface that allows a FeatureCodec to examine a file and determine if the file can be parsed. This is the first step towards allowing the GATK to dynamically determine the type of a RodBinding.	2011-08-08 14:04:46 -04:00
Eric Banks	197169e47b	Submitting patch from Larry Singh to make MathUtils compatible with java 1.7	2011-08-08 13:34:04 -04:00
David Roazen	dd974040af	When finding the highest-impact effect at a locus, all effects that are not within a non-coding gene are now considered higher impact than all effects that are within a non-coding gene.	2011-08-08 13:29:54 -04:00
David Roazen	c1061e994c	Initial support for adding genomic annotations through VariantAnnotator using the output from the SnpEff tool, which replaces the old Genomic Annotator.	2011-08-08 13:29:53 -04:00
Mark DePristo	0db79207e8	Refactored dependency from CommandLineGATK from javadocs. This allows us to run the GATK again in environments without Javadoc loading by default in the classpath	2011-08-08 12:27:13 -04:00
Mark DePristo	e5fde0d16b	Merge branch 'master' into rodRefactor	2011-08-08 10:08:43 -04:00
Mark DePristo	526b524c3c	CombineVariants with new RodBinding. Bugfix -- CombineVariants now uses the new RodBinding syntax, -V / --variants. Passed all integration tests on first run -- Exposed gapping bug in the List<RodBinding<T>> system now fixed. ParserEngine now has a addRodBinding() that is called by RodBindingArgumentTypeDescriptor when it encounters each RodBinding. This allows the system to work with collection types that are recursively parsed by the system.	2011-08-07 20:16:51 -04:00
Ryan Poplin	6693407bd8	Merged bug fix from Stable into Unstable	2011-08-07 17:39:03 -04:00
Mark DePristo	5f8bc3aa8a	Documenting classes, and name cleanup	2011-08-07 15:17:50 -04:00
Mark DePristo	1c63d43176	Help now points to GATKDocs instead of spitting out full, garbled description	2011-08-07 15:02:46 -04:00
Mark DePristo	b0e91f85cf	fix merge from Khalid's Queue fix	2011-08-07 10:33:20 -04:00
Mark DePristo	4d88e72958	Merge remote-tracking branch 'remotes/khalid/rodRefactor' into rodRefactor Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java public/java/test/org/broadinstitute/sting/BaseTest.java	2011-08-07 10:32:27 -04:00
Khalid Shakir	f049461120	Changed @Argument to @Input on input RodBindings. Changed shortname collision with longname. Restored scala builds. Updated HSP to use new syntax.	2011-08-06 20:44:19 -04:00
Mark DePristo	d7f98e5c2a	Fixed merge conflict deleting a {	2011-08-04 18:48:34 -04:00
Mark DePristo	75632abf88	Merge branch 'master' into rodRefactor Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToVCF.java public/java/test/org/broadinstitute/sting/gatk/walkers/indels/RealignerTargetCreatorIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java	2011-08-04 18:44:14 -04:00
Mark DePristo	f21f7f6335	SelectVariants fully documented, now the shining example of the new RodBinding system.	2011-08-04 18:28:59 -04:00
Mark DePristo	9be1ee59cc	TODO comments for Eric	2011-08-04 18:07:50 -04:00
Guillermo del Angel	a8eb8c27f0	a) Minor changes to indel consensus scripts to better reflect good default values, b) Fixed up Mills/Devine codec so it always produces correct ref padded bases, and added option to VariantsToVCF to fix reference base	2011-08-04 15:34:49 -04:00
Ryan Poplin	98a96f07c1	Updated standard deviation parameter in VQSR to our current recommended value	2011-08-04 14:06:26 -04:00
Eric Banks	e48492f3c3	Validate that the reference padding base for indels is correct.	2011-08-04 12:48:56 -04:00
Mark DePristo	f0d798d47c	Bug fix: call RodBinding.resetNameCounter() in new ParsingEngine() so that we don't magically misnumber arguments in the integration tests where the GATK is only instantiated once.	2011-08-04 12:06:10 -04:00
Mark DePristo	d0279bb28c	RodBinding names are now defaulting to the ArgumentTypeDescriptor fullname Nearly all of the tools are passing integrationtests	2011-08-03 20:48:11 -04:00
Mark DePristo	0ef85647f7	A working version of a GATKReportDiffableReader for the diffEngine!	2011-08-03 18:21:18 -04:00
Mark DePristo	acbd3d0922	Fixing up integration tests so more	2011-08-03 17:26:35 -04:00
Mark DePristo	8f696c7731	Continuing progress towards RodBinding 1.0 -- Cleaning up old interface to RMDT, docs and contracts added -- Proper type checking for RodBinding for cases where the Tribble type isn't found or is the wrong type	2011-08-03 17:19:28 -04:00
Mark DePristo	800bb97f0b	Removed getFeaturesAsGATKFeature and created createGenomeLoc(Feature) in genomeLocParser Updated all walkers that used the now deleted methods.	2011-08-03 16:04:51 -04:00
Mark DePristo	f6563c0f9f	Removed support for RMD in @Requires and @Allows Merge as well Conflicts: private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java	2011-08-03 15:36:55 -04:00
Mark DePristo	79e4a8f6d3	Merge Conflicts: private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java	2011-08-03 15:09:47 -04:00
Mark DePristo	38efd3066c	Bug fix for mask RodBinding	2011-08-03 14:58:18 -04:00
Eric Banks	f62f47d476	Not sure why this didn't fail before, but bringing VE up to date with previous changes	2011-08-03 14:27:07 -04:00
Mark DePristo	b25140db83	Contracts and documentation for some of RefMetaDataTracker Continuing to fix integration tests that don't pass / run	2011-08-03 13:34:20 -04:00
Eric Banks	f6648e0144	Don't left-align complex indels because it's too complicated.	2011-08-03 12:03:50 -04:00
Mark DePristo	85c67e9891	Contracts and documentation for Rodbinding	2011-08-03 11:16:06 -04:00
Eric Banks	5dc324ff35	Dealing with merge conflict	2011-08-03 11:03:47 -04:00
Eric Banks	7c89fe01b3	Instead of having the padded reference base be some hackish attribute it is now an actual variable in the Variant Context class. More importantly, we now always require that it be present when padding is necessary - and validate as such upon construction of the VC. This cleans up the interface significantly because we no longer require that a reference base be passed in when writing a VC/VCF record.	2011-08-03 11:00:36 -04:00
Khalid Shakir	5dcac7b064	GATKReport v0.2: - Floating point column widths are measured correctly - Using fixed width columns instead of white space separated which allows spaces embedded in cell values - Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width - Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly Replaced GATKReportTableParser with existing functionality in GATKReport	2011-08-03 00:24:47 -04:00
Mark DePristo	2874835997	Bug fix for type checking RodBindings Now compares the feature class not the codec class. UnitTests improvements integrationtests on their way to actually running	2011-08-02 22:25:41 -04:00
Mark DePristo	b5e843f8f0	Approaching the end for the new RodBinding system -- support for explicit naming of bindings (-X:name,type x) -- support for automatic naming of bindings in lists (-X:vcf foo.vcf -X:vcf bar.vcf will generate internal names X and X2) -- ParserEngineUnitTest expanded to cover all of the Rodbinding cases -- RodBindingUnitTest tests all of the low-level accessors -- Parsing engine throws UserExceptions when bad bindings are provided on the command line	2011-08-02 22:00:06 -04:00
David Roazen	d3437e62da	Added a simple utility method Utils.optimumHashSize() to calculate the optimum initial size for a Java hash table (HashMap, HashSet, etc.) given an expected maximum number of elements. The optimum size is the smallest size that's guaranteed not to result in any rehash / table-resize operations. Example Usage: Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements)); I think we're paying way too heavy a price in unnecessary rehash operations across the GATK. If you don't specify an initial size, you get a table of size 16 that gets completely rehashed and doubles in size every time it becomes 75% full. This means you do at least twice as much work as you need to in order to populate your table: (n + n/2 + n/4 + ... + 16) ~= (1 + 1/2 + 1/4 + ...) * n ~= 2 * n	2011-08-02 21:59:06 -04:00
Mark DePristo	83891271b5	--variants throughout integrationtests	2011-08-02 20:28:47 -04:00
Mark DePristo	3a27a25cfc	Validates that the tribble binding provides the right object types at startup Tests to ensure this remains working	2011-08-02 20:11:24 -04:00
Guillermo del Angel	df37716857	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-02 18:27:13 -04:00
Mark DePristo	e4a67f3df1	RefMetaDataTracker has complete set of get() functions for List<RodBinding<T>> Including unit tests	2011-08-02 14:28:35 -04:00
Mark DePristo	03741fb640	Merge branch 'master' into rodRefactor Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/annotator/VariantAnnotatorEngine.java public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerPerformanceTest.java public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java	2011-08-02 14:21:58 -04:00
Mark DePristo	a366f9a18d	Updating tools to use the RodBinding<T> syntax	2011-08-02 14:05:51 -04:00
Ryan Poplin	c0653514b3	minor update to comment in UG	2011-08-02 13:34:48 -04:00
Ryan Poplin	2ba57bb502	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-02 13:30:46 -04:00
Ryan Poplin	38e4ae4176	minor update to comment in UG	2011-08-02 13:30:38 -04:00
Guillermo del Angel	821bbfa9e0	Bug fixes and enhancements to run whole-genome indel VQSR, removed old chr20-only code and cleanup	2011-08-02 13:17:20 -04:00
Eric Banks	65c5d55b72	Not sure how I missed these. These lines are now superfluous.	2011-08-02 12:48:36 -04:00
Eric Banks	2c5e526eb7	Don't use the mismatch fraction by default in the RealignerTargetCreator (since it's only useful when using SW in the indel realigner). Also, no more use of -D but instead move over to using VCFs. One integration test is temporarily commented out while I wait for a VCF file to get fixed.	2011-08-02 10:34:46 -04:00
Eric Banks	5626199bb6	The Unified Genotyper now does NOT emit SLOD/SB by default; to compute SB use --computeSLOD	2011-08-02 10:14:21 -04:00
Mark DePristo	184030dd56	RefMetaDataTracker no longer automagically converts inputs to VariantContexts This was no longer working properly given that DBSNP indels needed to be moved around. The adaptor system is being refactored and you will need to convert files from X -> VCF for many tools to work.	2011-08-01 15:21:16 -04:00
Mark DePristo	8b1adb8c95	Removed getVariantContext() code	2011-08-01 13:41:09 -04:00
Eric Banks	3a9b6eacdf	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-01 11:23:18 -04:00
Mark DePristo	7b07c4e04e	RefMetaDataTracker now has get() methods accepting RodBindings RodBinding no longer duplicates the get() methods in RMDT. This is just an object now that connects the command line system to the RMDT. Updated programs to use new style Added UnitTests for the RodBinding accessors.	2011-07-30 15:34:11 -04:00
Mark DePristo	a6691ab2fd	List<RodBinding<T>> now working (sort of). At least the argument parsing system tolerates it.	2011-07-29 16:11:22 -04:00
Mark DePristo	6acb4aad3b	RodBinding<T> are properly generic now. VariantContextRodBinding removed, as RodBinding<VariantContext> is the right style now.	2011-07-29 14:37:12 -04:00
Mark DePristo	3b799db61a	RefMetaDataTracker cleanup and unit tests You now have to provide an explicit list of RODRecordLists upfront to the constructor. RefMetaDataTracker is now immutable. Changes in engine to incorporate these differences Extensive UnitTests for RefMetaDataTracker now.	2011-07-29 13:23:17 -04:00
Ryan Poplin	b06deac9ea	Merged bug fix from Stable into Unstable	2011-07-29 10:02:36 -04:00
Ryan Poplin	c0d4110ffd	Correcting redundant warning text.	2011-07-29 10:01:11 -04:00
Mark DePristo	39b4e76fde	Continuing refactoring of RefMetaDataTracker. On the path towards converging getVariantContext() and getValues() in tracker so that we can have a single approach to get values from RODs with the new RodBinding() types	2011-07-28 17:48:28 -04:00
Mark DePristo	7c5c656b46	Uncovered fundamental accounting bug in VariantEval. Will be fixed by dev. team Problem is that Novelty sees multiple records at a site (SNP, INDEL) to calculate whether a site is novel, but VariantEvalWalker makes an arbitrary decision which to use for analysis and CompOverlap may not see a comp record of the same type as eval. So you get lines where the stratification is known but there are 10 novel sites!	2011-07-28 14:19:27 -04:00
Eric Banks	33b32c4211	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-28 13:57:22 -04:00
Eric Banks	7a2a65155f	Merged bug fix from Stable into Unstable	2011-07-28 13:56:43 -04:00
Eric Banks	1afc49a297	There are some really 'interesting' (but apparently valid) records in the Mus musculus dbSNP file. Generalized the handling of complex cases in the dbSNP adaptor to handle it all. I just grabbed the actual Mus musculus dbSNP file as a test, ran it whole genome, and confirmed that we finally produce a valid VCF on it. Should be the last commit needed on this adaptor.	2011-07-28 13:55:58 -04:00
Mark DePristo	f7a126722b	Cleaned up VariantContext accessors in RefMetaDataTracker It's no longer possible to provide allowed types, as this was a very rarely used feature in the engine. These get methods have been removed and local uses replaced with tests directly in their code. This simplified the RefMetaDataTracker significantly VariantContextRodBinding now forwards along all of the RefMetaDataTracker methods, so it is possible to create a full equivalent VariantContextRodBinding now as a walker field variable. All walkers updated to the new RefMetaDataTracker function call style	2011-07-28 00:16:34 -04:00
Mark DePristo	c83f9432eb	Cleaned up RefMetaDataTracker Renamed many functions to more clearly state what they are actually doing Removed unnecessary / unused functionality, reducing interface complexity Updated all uses of this code in GATK Added generic, type-safe accessors to RefMetaDataTracker such as public <T> List<T> getValues(final String name, Class<T> clazz) Added standard refMetaDataTracker accessors to RodBinding, so you can do everything you can for generic rods with the tracker directly with the RodBinding	2011-07-27 23:25:52 -04:00
Mark DePristo	f3ad4ec94b	Removed annoying FastaSequenceIndexBuilderProgressListener infrastructure that was just a boolean switch on whether to print progress or not.	2011-07-27 22:06:23 -04:00
Eric Banks	ff31fa7990	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-27 16:15:23 -04:00
Eric Banks	5809a61b20	Merged bug fix from Stable into Unstable	2011-07-27 16:14:59 -04:00
Eric Banks	64aad67b5f	Fixing dbSNP adaptor for complex indels (wasn)	2011-07-27 16:13:45 -04:00
Mark DePristo	15be383d5b	Merge branch 'master' into rodRefactor	2011-07-27 15:36:49 -04:00
Mark DePristo	38a2518668	Merge branch 'master' into rodRefactor	2011-07-27 15:34:54 -04:00
Mark DePristo	60db6cc836	Warnings for old ROD system use. Removed unused class GATKRODFeature	2011-07-27 12:39:12 -04:00
Mark DePristo	097828a466	ParsingEngine now maintains the list of rodBindings No longer try to reparse objects to find the right fields Direct support in RodBinding for getTags()	2011-07-27 11:36:53 -04:00
Mauricio Carneiro	20a3b31b61	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-26 19:29:45 -04:00
Mauricio Carneiro	321afac4e8	Updates to the help layout. New style.css, new template for the walker auto-generated html. Short description is no longer repeated in the long description of the walker. Updated DiffObjectsWalker and ContigStatsWalker as "reference" documented walkers.	2011-07-26 19:29:25 -04:00
Kiran V Garimella	405e521d44	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-26 17:56:48 -04:00
Kiran V Garimella	412c466de6	Bug fix, wherein triple-hets after genotype refinement need to be left unphased, not just prior to refinement	2011-07-26 17:43:43 -04:00
Matt Hanna	fec495e292	Fix a nasty little bug in the sharding system: if the last shard in contig n overlaps exactly on disk with the first shard in contig n+1, the shards would be merged together to avoid duplicate extraction. Unfortunately, the interval overlap filter couldn't handle shards spanning contigs, and was choosing to filter out reads from contig n+1 which should have been included. I'm not completely sure why the BAM indexing code would ever specify that the end of one chromosome had the same on-disk location as the start of the next one. I suspect that this is an indexer performance bug.	2011-07-26 15:43:20 -04:00
Mark DePristo	9dfb57168a	RodBinding source is no longer assumed to be a file	2011-07-26 13:59:44 -04:00
Mark DePristo	d0badd5bd6	RodBinding subclassed to VariantContextRodBinding for easy access to VariantContext providing RODs	2011-07-26 13:54:55 -04:00
Mark DePristo	7ab8b53339	Support for List<RodBinding> argument type	2011-07-26 11:37:31 -04:00
Mark DePristo	38969b9783	Prototype of RODBinding @Arguments instead of -B syntax Initial version of RodBinding class. Flow from walker Rodbinding @Arguments -> RMDTriplet (old system) -> GATK engine (standard). Will need refactoring.	2011-07-26 11:09:06 -04:00
Matt Hanna	088fc39308	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 15:54:56 -04:00
Eric Banks	a53aeb75ab	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 15:10:35 -04:00
Eric Banks	a29554e565	Removing the Genomic Annotator and its supporting classes	2011-07-25 15:10:25 -04:00
Mark DePristo	3afcb3415d	Max of 1000 records will be loaded and compared to avoid heap size problem.	2011-07-25 14:58:31 -04:00
Mark DePristo	f3049fba63	refdata directory cleanup Removing unused files RODRecordIterator, ReferenceOrderedData, QueryableTrack, RMDTrackCreationException, GATKFeatureIterator, ReferenceOrderedDataUnitTest Refactored dbSNP and refseq utilities to be closer to the other files implementing these features	2011-07-25 13:21:52 -04:00
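
The hash-sizing idea described in commit d3437e62da can be sketched as below. This is only an illustration, not the GATK source: the class name `HashSizing` and the `+ 2` safety margin are assumptions, and the 0.75 load factor is the default documented for `java.util.HashMap`. The sketch returns a size that is safe against resizing, not necessarily the smallest such size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the commit refers to a method on GATK's Utils class,
// whose actual implementation is not shown in this log.
public class HashSizing {
    // Default load factor documented for java.util.HashMap.
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Returns an initial capacity large enough that inserting up to
    // expectedMaxElements entries should not trigger a table resize.
    // The "+ 2" margin is an assumption to absorb integer truncation.
    public static int optimumHashSize(int expectedMaxElements) {
        return (int) (expectedMaxElements / DEFAULT_LOAD_FACTOR) + 2;
    }

    public static void main(String[] args) {
        int expectedMaxElements = 1000;
        // Usage mirrors the example given in the commit message.
        Map<String, Object> hash =
                new HashMap<String, Object>(optimumHashSize(expectedMaxElements));
        for (int i = 0; i < expectedMaxElements; i++) {
            hash.put("key" + i, i);
        }
        System.out.println(hash.size());
    }
}
```

Pre-sizing this way trades a slightly larger initial allocation for avoiding the repeated rehash-and-double cycle that the commit message estimates at roughly twice the necessary work.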

330 Commits (c7ca33cbff8707973b7e09b738235151b970217e)