Commit Graph

8118 Commits (a2e79fbe8a24d8d970b8fbf790d395c0b79b03cc)

Author SHA1 Message Date
Mark DePristo a2e79fbe8a Fixes to contracts 2011-11-18 14:18:53 -05:00
Mark DePristo 660d6009a2 Documentation and contracts for GenotypesContext and VariantContextBuilder 2011-11-18 13:59:30 -05:00
Mark DePristo f54afc19b4 VariantContextBuilder
-- New approach to making VariantContexts modeled on StringBuilder
-- No more modify routines -- use VariantContextBuilder
-- Renamed isPolymorphic to isPolymorphicInSamples.   Same for mono
-- getChromosomeCount -> getCalledChrCount
-- Walkers changed to use new VariantContext.  Some deprecated new VariantContext calls remain
-- VCFCodec now uses optimized cached information to create GenotypesContext.
2011-11-18 12:39:10 -05:00
Mark DePristo 7490dbb6eb First version of VariantContextBuilder 2011-11-18 11:06:15 -05:00
Mark DePristo fa454c88bb UnitTests for VariantContext for chrCount, getSampleNames, Order function
-- Major change to how chromosomeCounts is computed. NO_CALL alleles are now always excluded, so ChromosomeCounts(A/.) is 1; the previous result would have been 2.
-- Naming changes for getSamplesNameInOrder()
2011-11-17 20:37:22 -05:00
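
The counting rule described in the commit above can be sketched as follows. This is a minimal illustration in Python rather than GATK's Java; the function name `called_chr_count`, the `NO_CALL` constant, and the tuple-of-alleles genotype representation are assumptions made for the example, not the actual VariantContext API.

```python
# Hypothetical stand-in for a no-call allele, written "." as in the A/. example.
NO_CALL = "."


def called_chr_count(genotypes):
    """Count alleles across all genotypes, skipping no-call alleles."""
    return sum(
        1
        for alleles in genotypes
        for allele in alleles
        if allele != NO_CALL
    )


# The A/. genotype from the commit message: one called allele, one no-call.
half_called = [("A", ".")]
count = called_chr_count(half_called)
```

Under the old behavior described in the message, the same genotype would have contributed 2 to the count.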
Mark DePristo 02f22cc9f8 No more VC integration tests. All tests are now unit tests 2011-11-17 15:33:09 -05:00
Mark DePristo 23359d1c6c Bugfix for pruneVariantContext, which was dropping the ref base for padding 2011-11-17 15:32:52 -05:00
Mark DePristo 473b860312 Major determinism fix for UG and RankSumTest
-- Now these routines all iterate in sample name order (genotypes.iterateInSampleNameOrder) so that the results of UG and the annotator do not depend on the particular order of samples we see for the exact model and the RankSumTest
2011-11-17 15:31:45 -05:00
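
The idea behind the determinism fix above, visiting genotypes in sample name order so that results do not depend on the order in which samples were seen, can be sketched like this. It is an illustration in Python, not the GATK implementation; the function below only borrows the name `iterateInSampleNameOrder` from the commit message, and the dict-of-genotypes representation is an assumption.

```python
def iterate_in_sample_name_order(genotypes):
    """Yield (sample, genotype) pairs sorted by sample name.

    `genotypes` is assumed to be a dict mapping sample name to genotype.
    """
    for sample in sorted(genotypes):
        yield sample, genotypes[sample]


# The same genotypes, inserted in two different orders.
first_seen = {"NA12891": "A/T", "NA12878": "A/A", "NA12892": "T/T"}
second_seen = {"NA12892": "T/T", "NA12891": "A/T", "NA12878": "A/A"}

ordered_first = list(iterate_in_sample_name_order(first_seen))
ordered_second = list(iterate_in_sample_name_order(second_seen))
```

Both lists come out in the same order, so any computation that folds over them sees identical input regardless of how the samples arrived.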
Mark DePristo 7e66677769 Expanded UnitTests for VariantContext
Tests for
-- getGenotype and getGenotypes
-- subContextBySample
-- modify routines
2011-11-16 20:45:15 -05:00
Mark DePristo aa0610ea92 GenotypeCollection renamed to GenotypesContext 2011-11-16 16:24:05 -05:00
Mark DePristo 974daaca4d V13 version in archive. Can be pulled out wholesale for performance testing 2011-11-16 16:08:46 -05:00
Mark DePristo caf6080402 Better algorithm for merging genotypes in CombineVariants 2011-11-16 15:17:33 -05:00
Mark DePristo 101ffc4dfd Expanded, contrastive VariantContextBenchmark
-- Compares performance across a bunch of common operations with GATK 1.3 version of VariantContext and GATK 1.4
-- 1.3 VC and associated utilities copied wholesale into test directory under v13
2011-11-16 13:35:16 -05:00
Mark DePristo e56d52006a Continuing bugfixes to get new VC working 2011-11-16 10:39:17 -05:00
Mark DePristo df415da4ab More bug fixes on the way to passing all tests 2011-11-15 17:38:12 -05:00
Mark DePristo 0be23aae4e Bugfixes on way to a working refactored VariantContext 2011-11-15 17:20:14 -05:00
Mark DePristo 231c47c039 Bugfixes on way to a working refactored VariantContext 2011-11-15 16:42:50 -05:00
Mark DePristo 2b2514dad2 Moved many unused phasing walkers and utilities to archive 2011-11-15 16:14:50 -05:00
Mark DePristo 460a51f473 ID field now stored in the VariantContext itself, not the attributes 2011-11-15 14:56:33 -05:00
Mark DePristo 233e581828 Merging in Master 2011-11-15 09:28:24 -05:00
Mark DePristo 6e1a86bc3e Bug fixes to VariantContext and GenotypeCollection 2011-11-15 09:21:30 -05:00
Mauricio Carneiro cde829899d compress Reduce Read counts bytes by offset
Compressing the representation of the reduce reads counts by offset results in 17% average compression in final BAM file size.

Example compression:

from: 10, 10, 11, 11, 12, 12, 12, 11, 10
to:   10, 0, 1, 1, 2, 2, 2, 1, 0
2011-11-14 18:30:24 -05:00
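
Judging from the example in the commit above, the first count is kept as-is and every later count is stored as its offset from that first value. A minimal sketch of that scheme, in Python rather than GATK's Java, follows; the function names are hypothetical, and the actual byte-level encoding used in the BAM is not shown in the commit message.

```python
def compress_counts(counts):
    """Keep the first count; store the rest as offsets from it."""
    if not counts:
        return []
    base = counts[0]
    return [base] + [c - base for c in counts[1:]]


def decompress_counts(encoded):
    """Invert compress_counts by adding the base back to each offset."""
    if not encoded:
        return []
    base = encoded[0]
    return [base] + [base + offset for offset in encoded[1:]]


original = [10, 10, 11, 11, 12, 12, 12, 11, 10]
encoded = compress_counts(original)
restored = decompress_counts(encoded)
```

With the input from the commit message, `encoded` matches the "to" line of the example, and decoding restores the original counts.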
Mauricio Carneiro a1ce3d8141 Not reporting counts to reduced deletions (temporary patch)
Deletions will not have counts represented in the reduced form. This may change in the future with a ReadBackedPileup refactor.
2011-11-14 18:30:24 -05:00
Mark DePristo 4ff8225d78 GenotypeMap -> GenotypeCollection part 3
-- Test code actually builds
2011-11-14 17:51:41 -05:00
Mark DePristo f0234ab67f GenotypeMap -> GenotypeCollection part 2
-- Code actually builds
2011-11-14 17:42:55 -05:00
David Roazen ab0ee9b847 Perform only necessary validation in VariantContext modify methods 2011-11-14 16:49:59 -05:00
Mark DePristo 2e9d5363e7 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-14 15:32:06 -05:00
Mark DePristo 1fbdcb4f43 GenotypeMap -> GenotypeCollection 2011-11-14 15:32:03 -05:00
Guillermo del Angel 5c38a9cfd6 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-14 15:00:03 -05:00
Guillermo del Angel f1db31f072 Attempt to reduce memory footprint of ValidationSiteSelector (if this doesn't work, a radical rewrite of the walker to make it two-pass will be necessary): don't log any attributes of the original VCF; if we need chr counts later, we can reannotate from the original inputs. As things stand, we can't select SNPs genome-wide due to memory usage. 2011-11-14 14:56:09 -05:00
Eric Banks 4dc9dbe890 One quick fix to previous commit 2011-11-14 14:42:12 -05:00
Eric Banks b3313e1445 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-14 14:31:38 -05:00
Eric Banks 7b2a7cfbe7 Transfer headers from the resource VCF when possible when using expressions. While there, VA was modified so that it didn't assume that the ID field was present in the VC's info map in preparation for Mark's upcoming changes. 2011-11-14 14:31:27 -05:00
Mark DePristo 9b5c79b49d Renamed InferredGeneticContext to CommonInfo
-- I have no idea why I named this InferredGeneticContext, a totally meaningless term
-- Renamed to CommonInfo.
-- Made package protected, as no one should use this outside of VariantContext and Genotype
-- UGEngine was using IGC constant, but it's now using the public one in VariantContext.
2011-11-14 14:28:52 -05:00
Mark DePristo 077397cb4b Deleted MutableVariantContext
-- All methods that used this capability now use VariantContext directly instead
2011-11-14 14:19:06 -05:00
Mark DePristo b11c535527 Deleted MutableGenotype
-- This class wasn't really used anywhere, and so removed to control code bloat.
2011-11-14 13:16:36 -05:00
Guillermo del Angel 509ecc62cc Another bug fix for when no samples are specified in ValidationSiteSelectionWalker 2011-11-14 13:02:51 -05:00
Mark DePristo 79987d685c GenotypeMap now contains a Map rather than extending it
-- On path to replacing it with GenotypeCollection
2011-11-14 12:55:03 -05:00
Eric Banks 7aee80cd3b Fix to deal with reduced reads containing a deletion 2011-11-14 12:23:46 -05:00
Eric Banks 3d2970453b Misc minor cleanup 2011-11-14 09:41:54 -05:00
Eric Banks b7c33116af Minor docs update 2011-11-12 23:21:07 -05:00
Eric Banks 76d357be40 Updating docs example to use -L since that's best practice 2011-11-12 23:20:05 -05:00
Guillermo del Angel af8e39c04d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-12 08:42:24 -05:00
Guillermo del Angel c95f015d77 a) Bug fix in validation site selector, b) Initial qscript for selection of random snps and indels for validation experiment 2011-11-12 08:41:53 -05:00
Mauricio Carneiro 8cd077f009 Writing a GATKReport table as output.
just to standardize the output.
2011-11-11 18:52:58 -05:00
Mark DePristo fee9b367e4 VariantContext genotypes are now stored as GenotypeMap objects
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples
-- All instances in the gatk that used Map<String, Genotype> now use GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> was used, so that the order of the genotypes was technically undefined and could be dangerous. Now everything uses GenotypeMap with a specific ordering of samples (by name)
-- Integration tests updated and all pass
2011-11-11 15:00:35 -05:00
Guillermo del Angel cd3146f4cf Add hidden option to ValidationAmplicons to output slightly modified format to make file work with downstream SQNM tools more seamlessly at request of GAP: one line per record, keep probe identifier to 20 characters, no * in ref allele. 2011-11-11 14:07:07 -05:00
Ryan Poplin 40fbeafa37 VQSR will now detect if the negative model failed to converge properly because of having too few data points and automatically retry with more appropriate clustering parameters. 2011-11-11 11:52:30 -05:00
Mark DePristo 4938569b3a More general handling of parameters for VariantContextBenchmark 2011-11-11 10:22:19 -05:00
Mark DePristo ef9f8b5d46 Added subContextOfSamples to VariantContext
-- This is a more convenient accessor than subContextOfGenotypes, represents nearly all of the use cases of the former function, and potentially can be implemented more efficiently.
2011-11-11 10:07:11 -05:00