Commit Graph

8256 Commits (69b19047ba1e40a1bd4acdbf11e0a42394de48bb)

Author SHA1 Message Date
Eric Banks f6648e0144 Don't left-align complex indels because it's too complicated. 2011-08-03 12:03:50 -04:00
Mark DePristo 85c67e9891 Contracts and documentation for Rodbinding 2011-08-03 11:16:06 -04:00
Eric Banks 5dc324ff35 Dealing with merge confict 2011-08-03 11:03:47 -04:00
Eric Banks 7c89fe01b3 Instead of having the padded reference base be some hackish attribute it is now an actual variable in the Variant Context class. More importantly, we now always require that it be present when padding is necessary - and validate as such upon construction of the VC. This cleans up the interface significantly because we no longer require that a reference base be passed in when writing a VC/VCF record. 2011-08-03 11:00:36 -04:00
Mark DePristo d9bc673ff2 Fixed bad constructor in RMDTUnitTest 2011-08-03 09:42:43 -04:00
Khalid Shakir 3e043a633c Merged bug fix from Stable into Unstable 2011-08-03 02:23:16 -04:00
Khalid Shakir a587f38808 Fixed example unified genotyper pipeline to wrap filter expressions with quotes and use rod binding name "variant" instead of "vcf". 2011-08-03 02:21:01 -04:00
Khalid Shakir 5dcac7b064 GATKReport v0.2:
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
2011-08-03 00:24:47 -04:00
Mark DePristo 2874835997 Bug fix for type checking RodBindings
Now compares the feature class not the codec class.
UnitTests improvements
integrationtests on their way to actually running
2011-08-02 22:25:41 -04:00
Mark DePristo b5e843f8f0 Approaching the end for the new RodBinding system
-- support for explicit naming of bindings (-X:name,type x)
-- support for automatic naming of bindings in lists (-X:vcf foo.vcf -X:vcf bar.vcf will generate internal names X and X2)
-- ParserEngineUnitTest expanded to cover all of the Rodbinding cases
-- RodBindingUnitTest tests all of the low-level accessors
-- Parsing engine throws UserExceptions when bad bindings are provided on the command line
2011-08-02 22:00:06 -04:00
David Roazen d3437e62da Added a simple utility method Utils.optimumHashSize() to calculate the optimum
initial size for a Java hash table (HashMap, HashSet, etc.) given an expected
maximum number of elements. The optimum size is the smallest size that's
guaranteed not to result in any rehash / table-resize operations.

Example Usage:
Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements));

I think we're paying way too heavy a price in unnecessary rehash operations across
the GATK. If you don't specify an initial size, you get a table of size 16 that gets
completely rehashed and doubles in size every time it becomes 75% full. This means you
do at least twice as much work as you need to in order to populate your table:

(n + n/2 + n/4 + ... 16 ~= (1 + 1/2 + 1/4...) * n ~= 2 * n
2011-08-02 21:59:06 -04:00
Mark DePristo 83891271b5 --variants throughout integrationtests 2011-08-02 20:28:47 -04:00
Mark DePristo 3a27a25cfc Validates that the tribble binding provides the right object types at startup
Tests to ensure this remains working
2011-08-02 20:11:24 -04:00
Guillermo del Angel df37716857 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 18:27:13 -04:00
Khalid Shakir daeabe6a2f Updated HybridSelectionPipelineTest's variant count and titv to match expectations based on ebanks removing strand bias.
TODO: update filtering/recalibrations in HSP.
2011-08-02 17:09:16 -04:00
Ryan Poplin 20afe2a437 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 15:34:52 -04:00
Ryan Poplin b2cde87378 Removing --DBSNP syntax from BQSR integration tests 2011-08-02 15:34:38 -04:00
Khalid Shakir c2dc7c9c99 Updated flanks eval the same way the targets eval was updated by ebanks. 2011-08-02 15:30:04 -04:00
Mark DePristo e4a67f3df1 RefMetaDataTracker has complete set of get() functions for List<RodBinding<T>>
Including unit tests
2011-08-02 14:28:35 -04:00
Mark DePristo 03741fb640 Merge branch 'master' into rodRefactor
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/annotator/VariantAnnotatorEngine.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerIntegrationTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerPerformanceTest.java
	public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-02 14:21:58 -04:00
Mark DePristo a366f9a18d Updating tools to use the RodBinding<T> syntax 2011-08-02 14:05:51 -04:00
Ryan Poplin c0653514b3 minor update to comment in UG 2011-08-02 13:34:48 -04:00
Ryan Poplin 2ba57bb502 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 13:30:46 -04:00
Ryan Poplin 38e4ae4176 minor update to comment in UG 2011-08-02 13:30:38 -04:00
Guillermo del Angel 821bbfa9e0 Bug fixes and enhancements to run whole-genome indel VQSR, removed old chr20-only code and cleanup 2011-08-02 13:17:20 -04:00
Eric Banks 65c5d55b72 Not sure how I missed these. These lines are now superfluous. 2011-08-02 12:48:36 -04:00
Eric Banks b9d0d2af22 Adding back temporarily removed integration test now that the file permissions have been fixed. 2011-08-02 12:39:11 -04:00
Eric Banks 9497c303e6 Temporarily comment out annotation from the pipeline test; David will re-enable. 2011-08-02 12:38:47 -04:00
Eric Banks 1c387848de No more use of -D in the integration tests but instead stick with VCFs only. Since all of these tests were duplicated (one each for dbSNP format and for VCF), we don't actually lose coverage in the integration tests. 2011-08-02 10:39:50 -04:00
Eric Banks a2ca994e9a Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 10:34:53 -04:00
Eric Banks 2c5e526eb7 Don't use the mismatch fraction by default in the RealignerTargetCreator (since it's only useful when using SW in the indel realigner). Also, no more use of -D but instead move over to using VCFs. One integration test is temporarily commented out while I wait for a VCF file to get fixed. 2011-08-02 10:34:46 -04:00
Eric Banks 5626199bb6 The Unified Genotyper now does NOT emit SLOD/SB by default; to compute SB use --computeSLOD 2011-08-02 10:14:21 -04:00
Guillermo del Angel a6a87006a5 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 08:26:05 -04:00
Guillermo del Angel 3b2d1dee66 Final changes to indel project consensus script 2011-08-02 08:25:50 -04:00
Mark DePristo 184030dd56 RefMetaDataTracker no longer automagically converts inputs to VariantContexts
This was no longer working properly given that DBSNP indels needed to be moved around.  The adaptor system is being refactored and you will need to convert files from X -> VCF for many tools to work.
2011-08-01 15:21:16 -04:00
Mark DePristo 8b1adb8c95 Removed getVariantContext() code 2011-08-01 13:41:09 -04:00
Mark DePristo f69bff5dd6 Commented out, because these fail the now removed dbSNP conversion. 2011-08-01 13:34:25 -04:00
Eric Banks 3a9b6eacdf Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-01 11:23:18 -04:00
Mark DePristo 4f8d830960 Updated to reflect new parse() function 2011-07-30 15:34:20 -04:00
Mark DePristo 7b07c4e04e RefMetaDataTracker now has get() methods accepting RodBindings
RodBinding no longer duplicates the get() methods in RMDT.  This is just an object now that connects the command line system to the RMDT.
Updated programs to use new style
Added UnitTests for the RodBinding accessors.
2011-07-30 15:34:11 -04:00
Mauricio Carneiro e7b4959ebe Merged bug fix from Stable into Unstable 2011-07-30 02:06:45 -04:00
Mauricio Carneiro 2d94037ad0 Remove temporary index files (*.bai)
some temporary index files were not being removed.
2011-07-30 02:05:22 -04:00
Mark DePristo a6691ab2fd List<RodBinding<T>> now working (sort of).
At least the argument parsing system tolerates it.
2011-07-29 16:11:22 -04:00
Mark DePristo 6acb4aad3b RodBinding<T> are properly generic now.
VariantContextRodBinding removed, as RodBinding<VariantContext> is the right style now.
2011-07-29 14:37:12 -04:00
Mark DePristo 3b799db61a RefMetaDataTracker cleanup and unit tests
You know have to provide an explicit list of RODRecordLists upfront to the constructor.  RefMetaDataTracker is now immutable.  Changes in engine to incorporate these differences
Extensive UnitTests for RefMetaDataTracker now.
2011-07-29 13:23:17 -04:00
Ryan Poplin b06deac9ea Merged bug fix from Stable into Unstable 2011-07-29 10:02:36 -04:00
Ryan Poplin c0d4110ffd Correcting redundant warning text. 2011-07-29 10:01:11 -04:00
Mauricio Carneiro a58ddab93b minQual and minPower filters added. VCF output added.
Calls are now made based on the likelihood AC model. Two filters are applied: minQual and minPower. Output is now a VCF file with the variant context. It's now called the gatk's PoolCaller, no longer Replication Validation framework. Lots of testing ensue....
2011-07-28 18:58:36 -04:00
Mark DePristo 39b4e76fde Continuing refactoring of RefMetaDataTracker.
On the path towards converging getVariantContext() and getValues() in tracker so that we can have a single approach to get values from RODs with the new RodBinding() types
2011-07-28 17:48:28 -04:00
Mark DePristo 7c5c656b46 Uncovered fundamental accounting bug in VariantEval. Will be fixed by dev. team
Problem is that Novelty sees multiple records at a site (SNP, INDEL) to calculate whether a site is novel, but VariantEvalWalker makes an arbitrary decision which to use for analysis and CompOverlap may not see a comp record of the same type as eval.  So you get lines where the stratification is known but there are 10 novel sites!
2011-07-28 14:19:27 -04:00