Commit Graph

169 Commits (28b286ad3967b966d4a1f3e95f36cbea482528e7)

Author SHA1 Message Date
Mark DePristo e484625594 GenotypesContext now updates cached data for add, set, replace operations when possible
-- Involved separately managing the sample -> offset and sample sorted list operations.  This should improve performance throughout the system
2011-11-22 08:40:48 -05:00
Mark DePristo 2b51c01df4 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-21 19:16:06 -05:00
Mauricio Carneiro 5ad3dfcd62 BugFix: byte overflow in SyntheticRead compressed base counts
* fixed and added unit test
2011-11-21 17:11:50 -05:00
Mark DePristo 2c501364b8 GenotypesContext no longer have immutability in constructor
-- additional bug fixes throughout VariantContext and GenotypesContext objects
2011-11-21 14:34:31 -05:00
David Roazen 1296dd41be Removing the legacy -L "interval1;interval2" syntax
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.

A UserException is now thrown if someone tries to use this syntax.
2011-11-21 13:18:53 -05:00
Mark DePristo 2e9ecf639e Generalized interface to LazyGenotypesContext
-- Now you provide a LazyParsing object
-- LazyGenotypesContext now knows nothing about the VCF parser itself.  The parser holds all of the necessary data to parse the VCF genotypes when necessarily, and the LGC only has a pointer to this object
-- Using new interface added LazyGenotypesContext to unit tests with a simple lazy version
-- Deleted VCFParser interface, as it was no longer necessary
2011-11-21 09:30:40 -05:00
Mark DePristo f0ac588d32 Extensive unit test for GenotypeContextUnitTest
-- Currently only tests base class.  Adding subclass testing in a bit
2011-11-20 18:28:01 -05:00
Mark DePristo 707bd30b3f Should have been @BeforeMethod 2011-11-19 16:10:09 -05:00
Mark DePristo 8f7eebbaaf Bugfix for pError not being checked correctly in CommonInfo
-- UnitTests to ensure correct behavior
-- UnitTests to ensure correct behavior for pass filters vs. failed filters vs. unfiltered
2011-11-19 15:58:59 -05:00
Mark DePristo b7b57ef39a Updating MD5 to reflect canonical ordering of calculation
-- We should no longer have md5s changing because of hashmaps changing their sort order on us
-- Added GenotypeLikelihoodsUnitTests
-- Refactored ExactAFCaclculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.
2011-11-19 15:57:33 -05:00
Mark DePristo 73119c8e3c Merge with master
-- A few bug fixes
2011-11-19 09:56:06 -05:00
Mark DePristo f685fff79b Killing the final versions of old new VariantContext interface 2011-11-18 21:32:43 -05:00
Mark DePristo 6cf315e17b Change interface to getNegLog10PError to getLog10PError 2011-11-18 21:07:30 -05:00
Mark DePristo f54afc19b4 VariantContextBuilder
-- New approach to making VariantContexts modeled on StringBuilder
-- No more modify routines -- use VariantContextBuilder
-- Renamed isPolymorphic to isPolymorphicInSamples.   Same for mono
-- getChromosomeCount -> getCalledChrCount
-- Walkers changed to use new VariantContext.  Some deprecated new VariantContext calls remain
-- VCFCodec now uses optimized cached information to create GenotypesContext.
2011-11-18 12:39:10 -05:00
Mark DePristo 7490dbb6eb First version of VariantContextBuilder 2011-11-18 11:06:15 -05:00
Mark DePristo fa454c88bb UnitTests for VariantContext for chrCount, getSampleNames, Order function
-- Major change to how chromosomeCounts is computed.  Now NO_CALL alleles are always excluded.  So ChromosomeCounts(A/.) is 1, the previous result would have been 2.
-- Naming changes for getSamplesNameInOrder()
2011-11-17 20:37:22 -05:00
Mark DePristo 02f22cc9f8 No more VC integration tests. All tests are now unit tests 2011-11-17 15:33:09 -05:00
Khalid Shakir c50274e02e During flanking interval creation merging overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
Moved flanking interval utilies to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
Mark DePristo 7e66677769 Expanded UnitTests for VariantContext
Tests for
-- getGenotype and getGenotypes
-- subContextBySample
-- modify routines
2011-11-16 20:45:15 -05:00
Mauricio Carneiro 72f00e2883 Merging Roger's Unit tests for Reduce Reads from RR repository 2011-11-16 17:26:49 -05:00
Mark DePristo aa0610ea92 GenotypeCollection renamed to GenotypesContext 2011-11-16 16:24:05 -05:00
Mark DePristo 974daaca4d V13 version in archive. Can you pulled out wholesale for performance testing 2011-11-16 16:08:46 -05:00
Mark DePristo 101ffc4dfd Expanded, contrastive VariantContextBenchmark
-- Compares performance across a bunch of common operations with GATK 1.3 version of VariantContext and GATK 1.4
-- 1.3 VC and associated utilities copied wholesale into test directory under v13
2011-11-16 13:35:16 -05:00
Laurent Francioli fb685f88ec Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-15 16:23:53 -05:00
Mark DePristo 460a51f473 ID field now stored in the VariantContext itself, not the attributes 2011-11-15 14:56:33 -05:00
Eric Banks 7fada320a9 The right fix for this test is just to delete it. 2011-11-15 14:53:27 -05:00
Mark DePristo 233e581828 Merging in Master 2011-11-15 09:28:24 -05:00
Mark DePristo 6e1a86bc3e Bug fixes to VariantContext and GenotypeCollection 2011-11-15 09:21:30 -05:00
Roger Zurawicki 284430d61d Added more basic UnitTests for ReadClipper
hardClipByReadCoordinatesWorks
hardClipLowQualTailsWorks
2011-11-15 00:13:52 -05:00
Roger Zurawicki 8e91e19229 Merge branch 'master' of ssh://nickel/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-15 00:13:37 -05:00
Mauricio Carneiro cde829899d compress Reduce Read counts bytes by offset
compressed the representation of the reduce reads counts by offset results in 17% average compression in final BAM file size.

Example compression -->

from : 10, 10, 11, 11, 12, 12, 12, 11, 10
to:      10, 0, 1, 1,2, 2, 2, 1, 0
2011-11-14 18:30:24 -05:00
Mark DePristo 4ff8225d78 GenotypeMap -> GenotypeCollection part 3
-- Test code actually builds
2011-11-14 17:51:41 -05:00
Mark DePristo f0234ab67f GenotypeMap -> GenotypeCollection part 2
-- Code actually builds
2011-11-14 17:42:55 -05:00
Mark DePristo 1fbdcb4f43 GenotypeMap -> GenotypeCollection 2011-11-14 15:32:03 -05:00
Mark DePristo 9b5c79b49d Renamed InferredGeneticContext to CommonInfo
-- I have no idea why I named this InferredGeneticContext, a totally meaningless term
-- Renamed to CommonInfo.
-- Made package protected, as no one should use this outside of VariantContext and Genotype
-- UGEngine was using IGC constant, but it's now using the public one in VariantContext.
2011-11-14 14:28:52 -05:00
Mark DePristo 077397cb4b Deleted MutableVariantContext
-- All methods that used this capable now use VariantContext directly instead
2011-11-14 14:19:06 -05:00
Laurent Francioli 1347beef40 Merge branch 'PhaseByTransmission' 2011-11-14 11:31:28 +01:00
Laurent Francioli 34acf8b978 Added Unit tests for new methods in GenotypeLikelihoods 2011-11-14 10:47:02 +01:00
Roger Zurawicki 1202a809cb Added Basic Unit Tests for ReadClipper
Tests some but not all functions
Some tests have been disabled because they are not working
2011-11-13 22:27:49 -05:00
Mark DePristo fee9b367e4 VariantContext genotypes are now stored as GenotypeMap objects
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples
-- All instances in the gatk that used Map<String, Genotype> now use GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> is used, so that the order of the genotypes is technically undefined and could be dangerous.  Now everything uses GenotypeMap with a specific ordering of samples (by name)
-- Integrationtests updated and all pass
2011-11-11 15:00:35 -05:00
Mark DePristo 4938569b3a More general handling of parameters for VariantContextBenchmark 2011-11-11 10:22:19 -05:00
Mark DePristo e216e85465 First working version of VariantContextBenchmark 2011-11-11 09:56:00 -05:00
Mark DePristo ee40791776 Attributes are now Map<String,Object> not Map<String,?>
-- Allows us to avoid an unnecessary copy when creating InferredGeneticContext (whose name really needs to change).
2011-11-11 09:55:42 -05:00
Mauricio Carneiro d00b2c6599 Adding a synthetic read for filtered data
* Generalized the concept of a synthetic read to cread both running consensus and a synthetic reads of filtered data.
* Synthetic reads can now have deletions (but not insertions)
* New reduced read tag for filtered data synthetic reads *(RF)*
* Sliding window header now keeps information of consensus and filtered data
* Synthetic reads are created simultaneously, new functionality is controlled internally by addToSyntheticReads
2011-11-09 20:16:22 -05:00
Eric Banks 02d5e3025e Added integration test for intervals from bed file 2011-11-09 15:34:19 -05:00
Mauricio Carneiro e89ff063fc GATKSAMRecord refactor
The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...).

* No tools should create SAMRecord anymore, use GATKSAMRecord instead *
2011-11-03 15:43:26 -04:00
Mark DePristo 8a2929c1dd Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-02 16:21:00 -04:00
Eric Banks 4501dce58d Fixing merge conflict 2011-11-02 12:50:32 -04:00
Eric Banks 54331b44e9 New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths. 2011-11-02 12:47:30 -04:00
Mark DePristo 392e0aeace Moved unit tests into master IntervalUtilsUnitTest 2011-11-02 10:52:00 -04:00