gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	f6648e0144	Don't left-align complex indels because it's too complicated.	2011-08-03 12:03:50 -04:00
Mark DePristo	85c67e9891	Contracts and documentation for Rodbinding	2011-08-03 11:16:06 -04:00
Eric Banks	5dc324ff35	Dealing with merge conflict	2011-08-03 11:03:47 -04:00
Eric Banks	7c89fe01b3	Instead of having the padded reference base be some hackish attribute it is now an actual variable in the Variant Context class. More importantly, we now always require that it be present when padding is necessary - and validate as such upon construction of the VC. This cleans up the interface significantly because we no longer require that a reference base be passed in when writing a VC/VCF record.	2011-08-03 11:00:36 -04:00
Khalid Shakir	5dcac7b064	GATKReport v0.2: - Floating point column widths are measured correctly - Using fixed-width columns instead of whitespace-separated ones, which allows spaces embedded in cell values - Legacy support for parsing whitespace-separated v0.1 tables where the columns may not be fixed width - Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly. Replaced GATKReportTableParser with existing functionality in GATKReport	2011-08-03 00:24:47 -04:00
Mark DePristo	2874835997	Bug fix for type checking RodBindings. Now compares the feature class, not the codec class. UnitTests improvements; integration tests on their way to actually running	2011-08-02 22:25:41 -04:00
Mark DePristo	b5e843f8f0	Approaching the end for the new RodBinding system -- support for explicit naming of bindings (-X:name,type x) -- support for automatic naming of bindings in lists (-X:vcf foo.vcf -X:vcf bar.vcf will generate internal names X and X2) -- ParserEngineUnitTest expanded to cover all of the Rodbinding cases -- RodBindingUnitTest tests all of the low-level accessors -- Parsing engine throws UserExceptions when bad bindings are provided on the command line	2011-08-02 22:00:06 -04:00
David Roazen	d3437e62da	Added a simple utility method Utils.optimumHashSize() to calculate the optimum initial size for a Java hash table (HashMap, HashSet, etc.) given an expected maximum number of elements. The optimum size is the smallest size that's guaranteed not to result in any rehash / table-resize operations. Example usage: Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements)); I think we're paying way too heavy a price in unnecessary rehash operations across the GATK. If you don't specify an initial size, you get a table of size 16 that gets completely rehashed and doubles in size every time it becomes 75% full. This means you do at least twice as much work as you need to in order to populate your table: (n + n/2 + n/4 + ... + 16) ~= (1 + 1/2 + 1/4 + ...) * n ~= 2 * n	2011-08-02 21:59:06 -04:00
Mark DePristo	83891271b5	--variants throughout integrationtests	2011-08-02 20:28:47 -04:00
Mark DePristo	3a27a25cfc	Validates that the tribble binding provides the right object types at startup. Tests to ensure this remains working	2011-08-02 20:11:24 -04:00
Guillermo del Angel	df37716857	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-02 18:27:13 -04:00
Mark DePristo	e4a67f3df1	RefMetaDataTracker has complete set of get() functions for List<RodBinding<T>>, including unit tests	2011-08-02 14:28:35 -04:00
Mark DePristo	03741fb640	Merge branch 'master' into rodRefactor Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/annotator/VariantAnnotatorEngine.java public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerPerformanceTest.java public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java	2011-08-02 14:21:58 -04:00
Mark DePristo	a366f9a18d	Updating tools to use the RodBinding<T> syntax	2011-08-02 14:05:51 -04:00
Ryan Poplin	c0653514b3	minor update to comment in UG	2011-08-02 13:34:48 -04:00
Ryan Poplin	2ba57bb502	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-02 13:30:46 -04:00
Ryan Poplin	38e4ae4176	minor update to comment in UG	2011-08-02 13:30:38 -04:00
Guillermo del Angel	821bbfa9e0	Bug fixes and enhancements to run whole-genome indel VQSR, removed old chr20-only code and cleanup	2011-08-02 13:17:20 -04:00
Eric Banks	65c5d55b72	Not sure how I missed these. These lines are now superfluous.	2011-08-02 12:48:36 -04:00
Eric Banks	2c5e526eb7	Don't use the mismatch fraction by default in the RealignerTargetCreator (since it's only useful when using SW in the indel realigner). Also, no more use of -D but instead move over to using VCFs. One integration test is temporarily commented out while I wait for a VCF file to get fixed.	2011-08-02 10:34:46 -04:00
Eric Banks	5626199bb6	The Unified Genotyper now does NOT emit SLOD/SB by default; to compute SB use --computeSLOD	2011-08-02 10:14:21 -04:00
Mark DePristo	184030dd56	RefMetaDataTracker no longer automagically converts inputs to VariantContexts. This was no longer working properly given that DBSNP indels needed to be moved around. The adaptor system is being refactored and you will need to convert files from X -> VCF for many tools to work.	2011-08-01 15:21:16 -04:00
Mark DePristo	8b1adb8c95	Removed getVariantContext() code	2011-08-01 13:41:09 -04:00
Eric Banks	3a9b6eacdf	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-01 11:23:18 -04:00
Mark DePristo	7b07c4e04e	RefMetaDataTracker now has get() methods accepting RodBindings. RodBinding no longer duplicates the get() methods in RMDT. This is just an object now that connects the command line system to the RMDT. Updated programs to use new style. Added UnitTests for the RodBinding accessors.	2011-07-30 15:34:11 -04:00
Mark DePristo	a6691ab2fd	List<RodBinding<T>> now working (sort of). At least the argument parsing system tolerates it.	2011-07-29 16:11:22 -04:00
Mark DePristo	6acb4aad3b	RodBinding<T> are properly generic now. VariantContextRodBinding removed, as RodBinding<VariantContext> is the right style now.	2011-07-29 14:37:12 -04:00
Mark DePristo	3b799db61a	RefMetaDataTracker cleanup and unit tests. You now have to provide an explicit list of RODRecordLists upfront to the constructor. RefMetaDataTracker is now immutable. Changes in engine to incorporate these differences. Extensive UnitTests for RefMetaDataTracker now.	2011-07-29 13:23:17 -04:00
Ryan Poplin	b06deac9ea	Merged bug fix from Stable into Unstable	2011-07-29 10:02:36 -04:00
Ryan Poplin	c0d4110ffd	Correcting redundant warning text.	2011-07-29 10:01:11 -04:00
Mark DePristo	39b4e76fde	Continuing refactoring of RefMetaDataTracker. On the path towards converging getVariantContext() and getValues() in tracker so that we can have a single approach to get values from RODs with the new RodBinding() types	2011-07-28 17:48:28 -04:00
Mark DePristo	7c5c656b46	Uncovered fundamental accounting bug in VariantEval. Will be fixed by dev. team. Problem is that Novelty sees multiple records at a site (SNP, INDEL) to calculate whether a site is novel, but VariantEvalWalker makes an arbitrary decision which to use for analysis, and CompOverlap may not see a comp record of the same type as eval. So you get lines where the stratification is known but there are 10 novel sites!	2011-07-28 14:19:27 -04:00
Eric Banks	33b32c4211	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-28 13:57:22 -04:00
Eric Banks	7a2a65155f	Merged bug fix from Stable into Unstable	2011-07-28 13:56:43 -04:00
Eric Banks	1afc49a297	There are some really 'interesting' (but apparently valid) records in the Mus musculus dbSNP file. Generalized the handling of complex cases in the dbSNP adaptor to handle it all. I just grabbed the actual Mus musculus dbSNP file as a test, ran it whole genome, and confirmed that we finally produce a valid VCF on it. Should be the last commit needed on this adaptor.	2011-07-28 13:55:58 -04:00
Mark DePristo	f7a126722b	Cleaned up VariantContext accessors in RefMetaDataTracker. It's no longer possible to provide allowed types, as this was a very rarely used feature in the engine. These get methods have been removed and local uses replaced with tests directly in their code. This simplified the RefMetaDataTracker significantly. VariantContextRodBinding now forwards along all of the RefMetaDataTracker methods, so it is possible to create a full equivalent VariantContextRodBinding now as a walker field variable. All walkers updated to the new RefMetaDataTracker function call style	2011-07-28 00:16:34 -04:00
Mark DePristo	c83f9432eb	Cleaned up RefMetaDataTracker. Renamed many functions to more clearly state what they are actually doing. Removed unnecessary / unused functionality, reducing interface complexity. Updated all uses of this code in GATK. Added generic, type-safe accessors to RefMetaDataTracker such as public <T> List<T> getValues(final String name, Class<T> clazz). Added standard refMetaDataTracker accessors to RodBinding, so you can do everything you can for generic rods with the tracker directly with the RodBinding	2011-07-27 23:25:52 -04:00
Mark DePristo	f3ad4ec94b	Removed annoying FastaSequenceIndexBuilderProgressListener infrastructure that was just a boolean switch on whether to print progress or not.	2011-07-27 22:06:23 -04:00
Eric Banks	ff31fa7990	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-27 16:15:23 -04:00
Eric Banks	5809a61b20	Merged bug fix from Stable into Unstable	2011-07-27 16:14:59 -04:00
Eric Banks	64aad67b5f	Fixing dbSNP adaptor for complex indels (wasn)	2011-07-27 16:13:45 -04:00
Mark DePristo	15be383d5b	Merge branch 'master' into rodRefactor	2011-07-27 15:36:49 -04:00
Mark DePristo	38a2518668	Merge branch 'master' into rodRefactor	2011-07-27 15:34:54 -04:00
Mark DePristo	60db6cc836	Warnings for old ROD system use. Removed unused class GATKRODFeature	2011-07-27 12:39:12 -04:00
Mark DePristo	097828a466	ParsingEngine now maintains the list of rodBindings. No longer try to reparse objects to find the right fields. Direct support in RodBinding for getTags()	2011-07-27 11:36:53 -04:00
Mauricio Carneiro	20a3b31b61	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-26 19:29:45 -04:00
Mauricio Carneiro	321afac4e8	Updates to the help layout. New style.css, new template for the walker auto-generated html. Short description is no longer repeated in the long description of the walker. Updated DiffObjectsWalker and ContigStatsWalker as "reference" documented walkers.	2011-07-26 19:29:25 -04:00
Kiran V Garimella	405e521d44	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-26 17:56:48 -04:00
Kiran V Garimella	412c466de6	Bug fix, wherein triple-hets after genotype refinement need to be left unphased, not just prior to refinement	2011-07-26 17:43:43 -04:00
Matt Hanna	fec495e292	Fix a nasty little bug in the sharding system: if the last shard in contig n overlaps exactly on disk with the first shard in contig n+1, the shards would be merged together to avoid duplicate extraction. Unfortunately, the interval overlap filter couldn't handle shards spanning contigs, and was choosing to filter out reads from contig n+1 which should have been included. I'm not completely sure why the BAM indexing code would ever specify that the end of one chromosome had the same on-disk location as the start of the next one. I suspect that this is an indexer performance bug.	2011-07-26 15:43:20 -04:00
Mark DePristo	9dfb57168a	RodBinding source is no longer assumed to be a file	2011-07-26 13:59:44 -04:00
Mark DePristo	d0badd5bd6	RodBinding subclassed to VariantContextRodBinding for easy access to VariantContext providing RODs	2011-07-26 13:54:55 -04:00
Mark DePristo	7ab8b53339	Support for List<RodBinding> argument type	2011-07-26 11:37:31 -04:00
Mark DePristo	38969b9783	Prototype of RODBinding @Arguments instead of -B syntax Initial version of RodBinding class. Flow from walker Rodbinding @Arguments -> RMDTriplet (old system) -> GATK engine (standard). Will need refactoring.	2011-07-26 11:09:06 -04:00
Matt Hanna	088fc39308	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 15:54:56 -04:00
Eric Banks	a53aeb75ab	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 15:10:35 -04:00
Eric Banks	a29554e565	Removing the Genomic Annotator and its supporting classes	2011-07-25 15:10:25 -04:00
Mark DePristo	3afcb3415d	Max of 1000 records will be loaded and compared to avoid heap size problem.	2011-07-25 14:58:31 -04:00
Mark DePristo	f3049fba63	refdata directory cleanup Removing unused files RODRecordIterator, ReferenceOrderedData, QueryableTrack, RMDTrackCreationException, GATKFeatureIterator, ReferenceOrderedDataUnitTest Refactored dbSNP and refseq utilities to be closer to the other files implementing these features	2011-07-25 13:21:52 -04:00
Matt Hanna	8014fad6ff	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 13:20:44 -04:00
Matt Hanna	2ac490dbdf	Fix improper detection of command-line arguments with missing values.	2011-07-25 13:20:00 -04:00
Mark DePristo	90947ab359	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 12:53:56 -04:00
Mark DePristo	acda8eb09c	Commented out test that causes new CommandLineGATK() to fail	2011-07-25 12:43:27 -04:00
Kiran V Garimella	357f503a21	Merge branch 'desktop'	2011-07-25 11:36:27 -04:00
Kiran V Garimella	0b43ee117c	Added the required=false tag to the -noST and -noEV arguments so the auto-help output doesn't look weird (i.e. listing arguments as required when their value has already been specified by default).	2011-07-25 11:35:34 -04:00
Kiran V Garimella	bbb8473f03	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 10:59:00 -04:00
Mark DePristo	1a268ff1fd	Refactor so that GenotypeAnnotation and InfoFieldAnnotation share common superclass VariantAnnotatorAnnotation	2011-07-25 10:55:09 -04:00
Mark DePristo	7f8e6a97ee	InfoFieldAnnotation now an abstract class extended by annotations so doc system works	2011-07-25 10:47:11 -04:00
Mauricio Carneiro	4c6c16f895	Documented following the new gatkdoc framework	2011-07-25 00:25:08 -04:00
Mark DePristo	2039ce6102	Default values now displayed in arguments. DiffEngine fixed so that newInstance() would work. Pretty quickly encountered a situation where newInstance() failed. Debug output now written when this occurs in the log. Logger now used instead of standard out, with INFO the default level.	2011-07-24 22:56:55 -04:00
Mark DePristo	c43b5981f2	Hidden variables are hidden by default. Settable by command line option. DiffObjectsWalker test arguments removed. Minor refactoring of GATKDoclet	2011-07-24 20:52:44 -04:00
Mark DePristo	1c1f1da349	Fixing compilation	2011-07-24 20:01:59 -04:00
Mark DePristo	9f06f6c493	Split GATKDoclet from ResourceBundleDoclet. Refactored GATKDocWorkUnit	2011-07-24 20:00:04 -04:00
Mark DePristo	ff85687679	Merge branch 'master' into help	2011-07-24 18:14:32 -04:00
Mark DePristo	83996f7951	Enumerated types are working.	2011-07-24 18:14:21 -04:00
Mark DePristo	3c34e9fa65	Cleanup enums and tables	2011-07-24 17:45:58 -04:00
Mark DePristo	c620d96c96	Inline enum documentation is working	2011-07-24 17:22:14 -04:00
Mark DePristo	793e7d3d1d	Improved header and argument details Argument detail structure cleaned up. Only relevant pieces of information are shown now, and in a cleaner layout. Misc. cleanup in the code.	2011-07-24 16:36:25 -04:00
Mark DePristo	c6af4efcdc	Implemented see also and version header	2011-07-24 16:10:17 -04:00
Mark DePristo	5e0fe2d0f9	Support for style.css via refactored common.html included in all files	2011-07-24 15:42:39 -04:00
Mark DePristo	d0ab6bf7a9	Now links to sub and superclass documentation, where possible.	2011-07-24 09:56:17 -04:00
Mark DePristo	e2dabb70b8	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-24 08:57:47 -04:00
Mauricio Carneiro	7ffedf211c	Contig comparator -- sorting contigs like Picard. This is very useful if you want to output your text files or manipulate data in the usual chromosome ordering: 1 2 3 ... 21 22 X Y GL??? ... Just use this comparator in any SortedSet class constructor and your data will be sorted like in the BAM file.	2011-07-24 02:33:19 -04:00
Mark DePristo	6b501e267b	Includes non-concrete classes in docs. CommandLineGATK has extraDocs to ReadFilter and UserException as well	2011-07-23 22:15:01 -04:00
Mark DePristo	7420ed098e	Semi-working version of extraDocs tag in annotation to refer to one capability being accessible in another Required a significant refactoring of the GATKDoclet, which now has a unified place where the ClassDoc, class, annotation, and handler are all stored together.	2011-07-23 22:07:30 -04:00
Mark DePristo	999acacfa1	Merge branch 'master' into help	2011-07-23 20:19:33 -04:00
Mark DePristo	1d3bcce2c4	Merge branch 'master' into NoDistributedGATK	2011-07-23 20:04:50 -04:00
Mark DePristo	e262f4e10b	gatkdoc now generalized to use @Annotation. Multiple subsystems now use annotation to receive docs. Index expanded to use summary() annotation field. UserExceptions, ReadFilters, GATK engine all use the system to generate docs. Doclet expanded to handle lots of new cases	2011-07-23 20:00:35 -04:00
Kiran V Garimella	1dba8b768c	Merge branch 'laptop'	2011-07-23 01:39:15 -04:00
Kiran V Garimella	57e3d136eb	Don't try to phase triple-hets either.	2011-07-23 01:38:58 -04:00
Kiran V Garimella	5af9d50183	Merge branch 'laptop'	2011-07-23 01:12:06 -04:00
Kiran V Garimella	5521919cc9	Fixed bug where variants to phase were not being selected properly.	2011-07-23 01:11:28 -04:00
Kiran V Garimella	7da99388ac	Merge branch 'laptop'	2011-07-23 01:01:11 -04:00
Kiran V Garimella	58eed20b83	Copy all entries from the attributes map, rather than attempting to modify an unmodifiable map.	2011-07-23 01:00:46 -04:00
Kiran V Garimella	ffa361f57f	Merge branch 'laptop'	2011-07-23 00:50:38 -04:00
Kiran V Garimella	9417ba8c2c	Modified to accept multi-sample VCFs, removed the application of filters, and changed transmission probability field to be a genotype field rather than an INFO field.	2011-07-23 00:48:26 -04:00
Mark DePristo	28b9432d26	Docs for read filters, the engine, and the UserExceptions.	2011-07-22 16:09:21 -04:00
Kiran V Garimella	051c1dc639	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-22 15:59:00 -04:00
Mark DePristo	f0be7348be	Generalized handler to allow it to be used with any arbitrary class structure. DocumentedGATKFeature now includes a field for the group name. Build.xml works with public / private now.	2011-07-22 14:07:40 -04:00
Mark DePristo	453954182e	Generalized the documentation system to use a class-specific annotation and processor. Need to generalize and bug fix the system. But at a high level it's working now.	2011-07-22 13:18:33 -04:00
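
Commit 5dcac7b064 switches GATKReport v0.2 to fixed-width columns so that cell values may contain spaces. The following is a minimal sketch of that idea, not the GATKReport code. It makes several assumptions:

- Each column is as wide as its longest rendered cell. Numbers are formatted to strings before measuring, which is one way to get floating-point widths right.
- Columns are separated by two spaces.
- Every row has the same number of cells.
- Cell values have no leading or trailing spaces, because parsing trims the padding.

The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-width table writer/reader; an illustration, not the GATKReport v0.2 format. */
final class FixedWidthTable {
    private static final String SEPARATOR = "  "; // assumed two-space gap between columns

    /** Width of each column: the length of its longest already-rendered cell (header included). */
    static int[] columnWidths(List<String[]> rows) {
        int[] widths = new int[rows.get(0).length];
        for (String[] row : rows) {
            for (int c = 0; c < row.length; c++) {
                widths[c] = Math.max(widths[c], row[c].length());
            }
        }
        return widths;
    }

    /** Left-justifies every cell to its column width, so embedded spaces never shift columns. */
    static List<String> format(List<String[]> rows) {
        int[] widths = columnWidths(rows);
        List<String> lines = new ArrayList<String>();
        for (String[] row : rows) {
            StringBuilder line = new StringBuilder();
            for (int c = 0; c < row.length; c++) {
                if (c > 0) line.append(SEPARATOR);
                line.append(row[c]);
                for (int pad = row[c].length(); pad < widths[c]; pad++) line.append(' ');
            }
            lines.add(line.toString());
        }
        return lines;
    }

    /** Splits a formatted line by column offsets rather than by whitespace, then trims the padding. */
    static String[] parse(String line, int[] widths) {
        String[] cells = new String[widths.length];
        int start = 0;
        for (int c = 0; c < widths.length; c++) {
            int begin = Math.min(start, line.length());
            int end = Math.min(begin + widths[c], line.length());
            cells[c] = line.substring(begin, end).trim();
            start = begin + widths[c] + SEPARATOR.length();
        }
        return cells;
    }
}
```

Because cells are located by offset instead of by whitespace, a value such as "sample A" round-trips as one cell. A whitespace-separated v0.1-style parser would split it into two.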
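
Commit d3437e62da adds Utils.optimumHashSize(). The sketch below is not the GATK implementation; it assumes java.util.HashMap's default load factor of 0.75. Under that load factor a table resizes once its size exceeds capacity * 0.75, so holding n entries without any resize needs an initial capacity of at least ceil(n / 0.75). The class name HashSizing is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating the idea behind Utils.optimumHashSize(); not the GATK source. */
final class HashSizing {
    private HashSizing() {}

    /**
     * Initial capacity that lets a HashMap/HashSet with the default load factor (0.75)
     * hold expectedMaxElements entries without a resize. HashMap itself still rounds the
     * requested capacity up to a power of two internally.
     */
    static int optimumHashSize(int expectedMaxElements) {
        if (expectedMaxElements < 0) {
            throw new IllegalArgumentException("expectedMaxElements must be >= 0");
        }
        // ceil(n / 0.75) == ceil(4n / 3), computed in long arithmetic to avoid int overflow.
        long capacity = (4L * expectedMaxElements + 2) / 3;
        return (int) Math.min(capacity, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        int expectedMaxElements = 1000;
        // Same usage pattern as the commit message's example.
        Map<String, Object> hash = new HashMap<String, Object>(optimumHashSize(expectedMaxElements));
        System.out.println(optimumHashSize(expectedMaxElements)); // 1334
    }
}
```

For example, 12 expected entries give a capacity of 16, whose resize threshold is exactly 12. A 13th entry would already require asking for 18.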
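
Commit c83f9432eb adds type-safe accessors such as public <T> List<T> getValues(final String name, Class<T> clazz) to RefMetaDataTracker. A common Java idiom for an accessor of that shape filters with Class.isInstance() and converts with Class.cast(). The caller then gets a correctly typed list without unchecked casts. The sketch below shows only that idiom. The TypedTrackStore class and its map-based storage are hypothetical and do not describe how the tracker actually stores its data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical store showing a type-safe accessor with the same shape as getValues(name, clazz). */
final class TypedTrackStore {
    private final Map<String, List<Object>> valuesByName;

    TypedTrackStore(Map<String, List<Object>> valuesByName) {
        // Copy the outer map so later changes to the caller's map are not seen here.
        this.valuesByName = new HashMap<String, List<Object>>(valuesByName);
    }

    /** Returns the values bound to name that are instances of clazz, typed as T; empty if none. */
    <T> List<T> getValues(final String name, final Class<T> clazz) {
        List<T> result = new ArrayList<T>();
        List<Object> raw = valuesByName.get(name);
        if (raw == null) return result;
        for (Object value : raw) {
            if (clazz.isInstance(value)) {
                result.add(clazz.cast(value)); // checked cast, no @SuppressWarnings needed
            }
        }
        return result;
    }
}
```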
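
Commit 7ffedf211c adds a comparator that orders contigs as 1, 2, ..., 22, X, Y, then GL and other contigs. Below is a minimal sketch of such a comparator. It assumes b37-style names without a "chr" prefix. It also assumes that every contig other than the numeric ones, X and Y sorts afterwards in lexicographic order. The class name ContigOrderComparator is hypothetical and this is not the GATK class. A BAM file's real contig order comes from its sequence dictionary; this comparator only mimics the usual human ordering.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical comparator: numeric contigs first (numerically), then X, then Y, then the rest lexicographically. */
final class ContigOrderComparator implements Comparator<String> {
    private static boolean isNumeric(String contig) {
        // Cap at 9 digits so Integer.parseInt cannot overflow.
        return contig.matches("\\d{1,9}");
    }

    private static int rank(String contig) {
        if (isNumeric(contig)) return 0;
        if (contig.equals("X")) return 1;
        if (contig.equals("Y")) return 2;
        return 3; // GL... and any other contig
    }

    @Override
    public int compare(String a, String b) {
        int byRank = Integer.compare(rank(a), rank(b));
        if (byRank != 0) return byRank;
        if (rank(a) == 0) {
            int byNumber = Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
            if (byNumber != 0) return byNumber;
        }
        // Lexicographic tie-break so distinct names (e.g. "01" and "1") never compare as equal.
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // As the commit message suggests, pass the comparator to a SortedSet constructor.
        SortedSet<String> contigs = new TreeSet<String>(new ContigOrderComparator());
        contigs.addAll(Arrays.asList("GL000192.1", "Y", "10", "X", "2", "1", "22"));
        System.out.println(contigs); // [1, 2, 10, 22, X, Y, GL000192.1]
    }
}
```

The lexicographic tie-break keeps the comparator consistent with equals. A TreeSet silently drops elements that compare as equal, so this matters.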

289 Commits (5f8264dddb0ded5f0e039bc50c02fb2a2c8cfbb2)