gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	1e90d602a4	Optimization: cache up front the PL index to the pair of alleles it represents for all possible numbers of alternate alleles.	2011-12-14 13:38:20 -05:00
Mauricio Carneiro	5cc1e72fdb	Parallelized SelectVariants * can now use -nt with SelectVariants for significant speedup in large files * added parallelization integration tests for SelectVariants	2011-12-12 18:41:14 -05:00
Laurent Francioli	7cf27bb66e	Updated md5sum for MendelianViolationEvaluator test to reflect the change in column alignment in VariantEval.	2011-12-12 12:22:43 +01:00
Laurent Francioli	025bdfe2cc	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-12 12:19:44 +01:00
Eric Banks	7b6338c742	Merge branch 'master' into trialleles	2011-12-11 00:28:46 -05:00
Eric Banks	7c4b9338ad	The old bi-allelic implementation of the Exact model has been completely deprecated - you can only use the multi-allelic implementation now.	2011-12-11 00:23:33 -05:00
Eric Banks	044f211a30	Don't collapse likelihoods over all alt alleles - that's just not right. For now, the QUAL is calculated for just the most likely of the alt alleles; I need to think about the right way to handle this properly.	2011-12-10 23:57:14 -05:00
Mauricio Carneiro	8475328b2c	Turning off test that breaks read clipper until we define what is the desired behavior for clipping this particular case.	2011-12-09 11:53:12 -05:00
Roger Zurawicki	4cbd1f0dec	Reorganized the testing code and created ClipReadsTestUtils. Tests are more rigorous and include many more test cases. We can test custom cigars and the generated cigars. *Still needs debugging because code is not working. Created test classes to be used across several tests. Some cases are still commented out. Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-09 11:52:34 -05:00
Roger Zurawicki	0e9c2cefa2	testHardClipSoftClippedBases works with Matches and Deletions Insertions are a problem so cigar cases with "I" are commented out. The test works with multiple deletions and matches. This is still not a complete test. A lot of cigar test cases are commented out. Added insertions to ReadClipperUnitTest ReadClipper now tests for all indels. Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-09 11:43:37 -05:00
Eric Banks	442ceb6ad9	The Exact model now computes both the likelihoods and posteriors (in separate arrays); likelihoods are used for assigning genotypes, not the posteriors.	2011-12-09 10:16:44 -05:00
Laurent Francioli	a79144f7db	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-09 15:57:24 +01:00
Laurent Francioli	72fbfba97d	Added UnitTests for getFamilies() and getChildrenWithParents()	2011-12-09 15:57:07 +01:00
Eric Banks	aa4a8c5303	No dynamic programming solution for assigning genotypes; just done greedily now. Fixed QualByDepth to skip no-call genotypes. No-calls are no longer given annotations (attributes).	2011-12-09 02:25:06 -05:00
Eric Banks	2fe50c64da	Updating md5s	2011-12-09 00:47:01 -05:00
Eric Banks	4aebe99445	Need to use longs for the set index (because we can run out of ints when there are too many alternate alleles). Integration tests now use the multiallelic implementation.	2011-12-08 15:31:02 -05:00
Mark DePristo	4055877708	Prints 0.0 TiTv not NaN when there are no variants -- Updated md5	2011-12-07 12:07:54 -05:00
Mark DePristo	5d2212bc8e	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-07 09:03:17 -05:00
Eric Banks	79d18dc078	Fixing indexing bug on the ACsets. Added unit tests for the Exact model code.	2011-12-06 16:17:18 -05:00
Matt Hanna	f5b977fc88	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-06 10:11:35 -05:00
Matt Hanna	4001c22a11	Better file count / buffering variation in test suite. Parameterized read shard buffering. Misc cleanup.	2011-12-06 10:10:38 -05:00
Khalid Shakir	677bea0abd	Right aligning GATKReport numeric columns and updated MD5s in tests. PreQC parses file with spaces in sample names by using tabs only. PostQC allows passing the file names for the evals so that flanks can be evaled. BaseTest's network temp dir now adds the user name to the path so files aren't created in the root. HybridSelectionPipeline: - Updated to latest versions of reference data. - Refactored Picard parsing code replacing YAML.	2011-12-05 23:22:15 -05:00
Eric Banks	29662be3d7	Fixed bug where k=2N case wasn't properly being computed. Added optimization for BB genotype case not in old model. At this point, integration tests pass except for 1 case where QUALs differ by 0.01 (this is okay because I occasionally need to compute extra cells in the matrix which affects the approximations) and 2 cases where multi-allelic indels are being genotyped (some work still needs to be done to support them).	2011-12-03 23:12:04 -05:00
Mark DePristo	3060a4a15e	Support for list of known CNVs in VariantEval -- VariantSummary now includes novelty of CNVs by reciprocal overlap detection using the standard variant eval -knownCNVs argument -- Genericizes loading for intervals into interval tree by chromosome -- GenomeLoc methods for reciprocal overlap detection, with unit tests	2011-11-30 17:05:16 -05:00
Laurent Francioli	9574be0394	Updated MendelianViolationEvaluator integration test	2011-11-30 14:44:15 +01:00
Laurent Francioli	a4606f9cfe	Merge branch 'MendelianViolation' Conflicts: public/java/src/org/broadinstitute/sting/utils/MendelianViolation.java	2011-11-30 11:13:15 +01:00
Laurent Francioli	7d58db626e	Added MendelianViolationEvaluator integration test	2011-11-30 10:09:20 +01:00
Ryan Poplin	110298322c	Adding Transmission Disequilibrium Test annotation to VariantAnnotator and integration test to test it.	2011-11-29 09:29:18 -05:00
Mark DePristo	e60272975a	Fix for changed MD5 in streaming VCF test	2011-11-23 19:01:33 -05:00
Mark DePristo	12f09d88f9	Removing references to SimpleMetricsByAC	2011-11-23 16:08:18 -05:00
Mark DePristo	4107636144	VariantEval updates -- Performance optimizations -- Tables now are cleanly formatted (floats are %.2f printed) -- VariantSummary is a standard report now -- Removed CompEvalGenotypes (it didn't do anything) -- Deleted unused classes in GenotypeConcordance -- Updates integration tests as appropriate	2011-11-23 13:02:07 -05:00
Mark DePristo	e484625594	GenotypesContext now updates cached data for add, set, replace operations when possible -- Involved separately managing the sample -> offset and sample sorted list operations. This should improve performance throughout the system	2011-11-22 08:40:48 -05:00
Mark DePristo	29ca24694a	UG now encoding NO_CALLs as ./. not ./.:.:4:0,0,0 A few updated UGs integration tests	2011-11-22 08:22:32 -05:00
Mark DePristo	2b51c01df4	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-21 19:16:06 -05:00
Mark DePristo	5443d3634a	Again, fixing the add call when we really mean replace -- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./. Eric will fix long-standing bug in QD observed from this change -- VFW MD5s restored to their old correct values. There was a bug in my implementation that caused the genotypes to not be parsed from the lazy output even though the header was incorrect.	2011-11-21 19:15:56 -05:00
Mauricio Carneiro	5ad3dfcd62	BugFix: byte overflow in SyntheticRead compressed base counts * fixed and added unit test	2011-11-21 17:11:50 -05:00
Mark DePristo	2c501364b8	GenotypesContext no longer have immutability in constructor -- additional bug fixes throughout VariantContext and GenotypesContext objects	2011-11-21 14:34:31 -05:00
David Roazen	1296dd41be	Removing the legacy -L "interval1;interval2" syntax This syntax predates the ability to have multiple -L arguments, is inconsistent with the syntax of all other GATK arguments, requires quoting to avoid interpretation by the shell, and was causing problems in Queue. A UserException is now thrown if someone tries to use this syntax.	2011-11-21 13:18:53 -05:00
Mark DePristo	2e9ecf639e	Generalized interface to LazyGenotypesContext -- Now you provide a LazyParsing object -- LazyGenotypesContext now knows nothing about the VCF parser itself. The parser holds all of the necessary data to parse the VCF genotypes when necessary, and the LGC only has a pointer to this object -- Using new interface added LazyGenotypesContext to unit tests with a simple lazy version -- Deleted VCFParser interface, as it was no longer necessary	2011-11-21 09:30:40 -05:00
Mark DePristo	f0ac588d32	Extensive unit test for GenotypeContextUnitTest -- Currently only tests base class. Adding subclass testing in a bit	2011-11-20 18:28:01 -05:00
Mark DePristo	9cb3fe3a59	Vastly better way of doing on-demand genotype loading -- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading. -- This new class has replaced all of the old, complex functionality -- Better still, there were many cases where the genotypes were being loaded unnecessarily, resulting in inefficiency. This was detected because some of the integration tests changed as the genotypes were no longer being parsed unnecessarily -- Misc. bug fixes throughout the system -- Bug fixes for PhaseByTransmission with new GenotypesContext	2011-11-20 08:23:09 -05:00
Mark DePristo	7d09c0064b	Bug fixes and code cleanup throughout -- chromosomeCounts now takes builder as well, cleaning up a lot of code throughout the codebase.	2011-11-19 18:40:15 -05:00
Mark DePristo	707bd30b3f	Should have been @BeforeMethod	2011-11-19 16:10:09 -05:00
Mark DePristo	8f7eebbaaf	Bugfix for pError not being checked correctly in CommonInfo -- UnitTests to ensure correct behavior -- UnitTests to ensure correct behavior for pass filters vs. failed filters vs. unfiltered	2011-11-19 15:58:59 -05:00
Mark DePristo	b7b57ef39a	Updating MD5 to reflect canonical ordering of calculation -- We should no longer have md5s changing because of hashmaps changing their sort order on us -- Added GenotypeLikelihoodsUnitTests -- Refactored ExactAFCaclculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.	2011-11-19 15:57:33 -05:00
Mark DePristo	73119c8e3c	Merge with master -- A few bug fixes	2011-11-19 09:56:06 -05:00
Mark DePristo	f685fff79b	Killing the final versions of old new VariantContext interface	2011-11-18 21:32:43 -05:00
Mark DePristo	6cf315e17b	Change interface to getNegLog10PError to getLog10PError	2011-11-18 21:07:30 -05:00
Matt Hanna	8bb4d4dca3	First pass of the asynchronous block loader. Block loads are only triggered on queue empty at this point. Disabled by default (enable with nt:io=?).	2011-11-18 15:02:59 -05:00
Mark DePristo	f54afc19b4	VariantContextBuilder -- New approach to making VariantContexts modeled on StringBuilder -- No more modify routines -- use VariantContextBuilder -- Renamed isPolymorphic to isPolymorphicInSamples. Same for mono -- getChromosomeCount -> getCalledChrCount -- Walkers changed to use new VariantContext. Some deprecated new VariantContext calls remain -- VCFCodec now uses optimized cached information to create GenotypesContext.	2011-11-18 12:39:10 -05:00


472 Commits (1e90d602a4c1628a0dde9764466f2c41c8aa8f9b)