* Using filter() instead of map() makes for a cleaner walker.
* Renamed the unit tests to be consistent with the other unit and integration tests
* If the adaptor boundary is more than MAXIMUM_ADAPTOR_SIZE bases away from the read, don't clip anything and consider the fragment undetermined for this read pair.
* Updated MD5s accordingly
-- Call sets with indels > 50 bp in length are tagged as CNVs (following the 1000 Genomes convention), and the code was unconditionally checking whether the CNV is already known by looking at the known-CNVs file, which is optional. Fixed. This has the annoying side effect that indels > 50 bp in size are not counted as indels, and so are subtracted from both the novel and known indel counts. C'est la vie.
-- Added an integration test to check for this case, using Mauricio's most recent VCF file for NA12878, which has many large indels. Using this more recent and representative file is probably a good idea for more future tests in VE and other tools. The file is NA12878.HiSeq.WGS.b37_decoy.indel.recalibrated.vcf in Validation_Data
Some tests in this class were intermittently not being executed due
to being randomly scheduled before tests whose results they depend on.
Now the serial dependencies are enforced to avoid problematic orderings.
* The Knuth shuffle is a simple yet effective array permuter (see the sketch after this list).
* Added a simple randomSubset that returns a random subset (without repeats) of any given array, with every permutation equally likely.
* Added unit tests for both functions
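A minimal sketch of the two utilities, assuming generic arrays and a shared Random; the actual signatures in Utils may differ:

    import java.util.Arrays;
    import java.util.Random;

    public class ShuffleUtils {
        private static final Random RANDOM = new Random();

        // Knuth (Fisher-Yates) shuffle: walk the array backwards, swapping each
        // element with a uniformly chosen element at or before it, so every
        // permutation is equally likely.
        public static <T> T[] knuthShuffle(T[] array) {
            for (int i = array.length - 1; i > 0; i--) {
                int j = RANDOM.nextInt(i + 1);
                T tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;
            }
            return array;
        }

        // Random subset without repeats: shuffle a copy, then take the first n
        // elements, so every subset ordering is equally likely.
        public static <T> T[] randomSubset(T[] array, int n) {
            T[] copy = knuthShuffle(Arrays.copyOf(array, array.length));
            return Arrays.copyOfRange(copy, 0, n);
        }
    }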
* Modified cleanCigarShift to allow insertions at the beginning and end of the read
* Allowed cigars starting/ending in insertions in the systematic ReadClipper tests
* Updated all ReadClipper unit tests
* ReduceReads does not hard clip leading insertions by default anymore
* SlidingWindow adjusts start location if read starts with insertion
* SlidingWindow creates an empty element with insertions to the right
* Fixed all potential divide-by-zero errors with totalCount() (from BaseCounts)
* Updated all Integration tests
* Added new integration test for multiple interval reducing
These functions are methods of the read and supplement getAlignmentStart() and getUnclippedStart() by calculating the unclipped start while counting only soft clips (see the sketch after this list).
* Removed from ReadUtils
* Added to GATKSAMRecord
* Changed names to getSoftStart() and getSoftEnd()
* Updated third party code accordingly.
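A minimal sketch of the semantics, written against the modern htsjdk package names as an assumption; the GATKSAMRecord implementation may differ:

    import java.util.List;

    import htsjdk.samtools.CigarElement;
    import htsjdk.samtools.CigarOperator;
    import htsjdk.samtools.SAMRecord;

    public class SoftCoordsSketch {
        // Soft start: the alignment start pushed left by any leading soft clip.
        // Hard clips contribute no bases and are skipped.
        public static int getSoftStart(SAMRecord read) {
            int softStart = read.getAlignmentStart();
            for (CigarElement e : read.getCigar().getCigarElements()) {
                if (e.getOperator() == CigarOperator.S)
                    softStart -= e.getLength();
                else if (e.getOperator() != CigarOperator.H)
                    break;
            }
            return softStart;
        }

        // Soft end: the alignment end pushed right by any trailing soft clip.
        public static int getSoftEnd(SAMRecord read) {
            int softEnd = read.getAlignmentEnd();
            List<CigarElement> elements = read.getCigar().getCigarElements();
            for (int i = elements.size() - 1; i >= 0; i--) {
                CigarOperator op = elements.get(i).getOperator();
                if (op == CigarOperator.S)
                    softEnd += elements.get(i).getLength();
                else if (op != CigarOperator.H)
                    break;
            }
            return softEnd;
        }
    }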
The algorithm wasn't accounting for the case where the read is on the reverse strand and the insert size is negative. A sketch of the boundary logic follows the list below.
* Fixed and rewrote for more clarity (with Ryan, Mark and Eric).
* Restructured the code to handle GATKSAMRecords only
* Cleaned up the other structures and functions around it to minimize clutter and potential for error.
* Added unit tests for all 4 cases of adaptor boundaries.
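A minimal sketch of the adaptor boundary logic, written against htsjdk accessors as an assumption; illustrative, not the exact GATK implementation:

    import htsjdk.samtools.SAMRecord;

    public class AdaptorBoundarySketch {
        private static final int MAXIMUM_ADAPTOR_SIZE = 10_000; // hypothetical cap
        public static final int CANNOT_COMPUTE = -1;

        // Reference coordinate of the first adaptor base, or CANNOT_COMPUTE
        // if the fragment is undetermined for this read pair.
        public static int getAdaptorBoundary(final SAMRecord read) {
            // The insert size is negative for the rightmost read of a pair,
            // so take the absolute value; zero means no well-defined fragment.
            final int insertSize = Math.abs(read.getInferredInsertSize());
            if (insertSize == 0 || read.getMateUnmappedFlag())
                return CANNOT_COMPUTE;

            final int boundary = read.getReadNegativeStrandFlag()
                    ? read.getMateAlignmentStart() - 1       // adaptor precedes a reverse read
                    : read.getAlignmentStart() + insertSize; // adaptor follows a forward read

            // An implausibly distant boundary marks the fragment as undetermined.
            if (Math.abs(boundary - read.getAlignmentStart()) > MAXIMUM_ADAPTOR_SIZE)
                return CANNOT_COMPUTE;
            return boundary;
        }
    }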
* New ClippingOp REVERT_SOFTCLIPPED_BASES turns soft clipped bases into matches.
* Added functionality to the clipping op to revert all soft-clipped bases in a read into matches
* Added revertSoftClipBases function to the ReadClipper for public use (see the sketch after this list)
* Wrote systematic unit tests
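A minimal sketch of what reverting soft clips entails, using htsjdk types as an assumption; the real ClippingOp may differ (for instance, it would also merge adjacent M elements):

    import java.util.ArrayList;
    import java.util.List;

    import htsjdk.samtools.Cigar;
    import htsjdk.samtools.CigarElement;
    import htsjdk.samtools.CigarOperator;
    import htsjdk.samtools.SAMRecord;

    public class RevertSoftClipsSketch {
        // Turn every soft clip (S) into a match (M) and shift the alignment
        // start left by the length of any leading soft clip.
        public static void revertSoftClippedBases(final SAMRecord read) {
            final List<CigarElement> reverted = new ArrayList<CigarElement>();
            int leadingSoftClip = 0;
            boolean seenNonClip = false;
            for (final CigarElement e : read.getCigar().getCigarElements()) {
                if (e.getOperator() == CigarOperator.S) {
                    if (!seenNonClip) leadingSoftClip += e.getLength();
                    reverted.add(new CigarElement(e.getLength(), CigarOperator.M));
                } else {
                    if (e.getOperator() != CigarOperator.H) seenNonClip = true;
                    reverted.add(e);
                }
            }
            read.setAlignmentStart(read.getAlignmentStart() - leadingSoftClip);
            read.setCigar(new Cigar(reverted));
        }
    }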
Creating a single temporary directory per ant test run instead of putting temp files from all runs in the same directory.
Updated various tests for above items and other small fixes.
fixed an issue where a read starting with an insertion followed by a deletion would break the clipper; the clipper can now safely clip both the insertion and the deletion in that case.
note: the test is turned off until the contract changes to allow hanging insertions (left/right).
* fixed edge case when requested to hard clip the beginning of a read that had hanging soft clipped bases on the left tail.
* fixed edge case when requested to hard clip the end of a read that had hanging soft clipped bases on the right tail.
* fixed the AlignmentStart of a clipped read whose result contains only hard clips and soft clips
note: added tests to all these beautiful cases...
* expanded the systematic cigar string space test framework Roger wrote to all tests
* moved utility functions into Utils and ReadUtils
* cleaned up unused classes
caught a bug in the hard clipper where it did not account for hard-clipping soft-clipped bases in the resulting cigar string when there was already a hard-clipped base immediately after them.
* updated the hardClipSoftClippedBases unit test with a corresponding test case.
bug: When performing multiple hard clip operations on a read that has indels, if the (N+1)th hard clip requests to clip inside an indel that was removed by one of the previous N hard clips, the hard clipper would go out of bounds.
fix: dynamically adjust the boundaries according to the new hard-clipped read length. (This maintains the current contract that hard clipping will never return a read starting or ending in an indel.)
-- Now properly handles the case where a sample isn't present (no longer adds a null to the genotypes list)
-- Fix for logic failure where if the number of requested samples equals the number of known genotypes then all of the records were returned, which isn't correct when there are missing samples.
-- Unit tests added to handle these cases
Tests are more rigorous and include many more test cases.
We can test custom cigars as well as the generated cigars.
*Still needs debugging because the code is not working.
Created test classes to be used across several tests.
Some cases are still commented out.
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
Insertions are a problem so cigar cases with "I" are commented out.
The test works with multiple deletions and matches.
This is still not a complete test. A lot of cigar test cases are commented out.
Added insertions to ReadClipperUnitTest
ReadClipper now tests for all indels.
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
PreQC now parses files with spaces in sample names by splitting on tabs only.
PostQC allows passing the file names for the evals so that flanks can be evaluated.
BaseTest's network temp dir now adds the user name to the path so files aren't created in the root.
HybridSelectionPipeline:
- Updated to latest versions of reference data.
- Refactored Picard parsing code, replacing YAML.
-- VariantSummary now includes novelty of CNVs by reciprocal overlap detection using the standard variant eval -knownCNVs argument
-- Genericizes loading for intervals into interval tree by chromosome
-- GenomeLoc methods for reciprocal overlap detection, with unit tests (see the sketch after this list)
-- Performance optimizations
-- Tables now are cleanly formatted (floats are %.2f printed)
-- VariantSummary is a standard report now
-- Removed CompEvalGenotypes (it didn't do anything)
-- Deleted unused classes in GenotypeConcordance
-- Updates integration tests as appropriate
-- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./. Eric will fix a long-standing bug in QD observed from this change
-- VFW MD5s restored to their old correct values. There was a bug in my implementation that caused the genotypes not to be parsed from the lazy output even though the header was incorrect.
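A minimal sketch of reciprocal overlap between two intervals (1-based, inclusive coordinates); the real GenomeLoc methods may differ:

    public class ReciprocalOverlapSketch {
        // Two intervals reciprocally overlap at a given fraction if their
        // shared span covers at least that fraction of *each* interval.
        public static boolean reciprocalOverlaps(int start1, int stop1,
                                                 int start2, int stop2,
                                                 double fraction) {
            final int overlap = Math.min(stop1, stop2) - Math.max(start1, start2) + 1;
            if (overlap <= 0) return false; // disjoint intervals
            final int size1 = stop1 - start1 + 1;
            final int size2 = stop2 - start2 + 1;
            return overlap >= fraction * size1 && overlap >= fraction * size2;
        }
    }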
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.
A UserException is now thrown if someone tries to use this syntax.
-- Now you provide a LazyParsing object
-- LazyGenotypesContext now knows nothing about the VCF parser itself. The parser holds all of the necessary data to parse the VCF genotypes when necessary, and the LGC only has a pointer to this object (see the sketch below)
-- Using the new interface, added LazyGenotypesContext to unit tests with a simple lazy version
-- Deleted VCFParser interface, as it was no longer necessary
-- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading.
-- This new class replaced all of the old, complex functionality
-- Better still, there were many cases where the genotypes were being loaded unnecessarily, costing efficiency. This was detected because some of the integration tests changed once the genotypes were no longer being parsed unnecessarily
-- Misc. bug fixes throughout the system
-- Bug fixes for PhaseByTransmission with new GenotypesContext
-- We should no longer have md5s changing because of hashmaps changing their sort order on us
-- Added GenotypeLikelihoodsUnitTests
-- Refactored ExactAFCalculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid code duplication.
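A minimal sketch of the on-demand loading pattern described above, with hypothetical names (LazyParser, an opaque unparsed-data handle, and String standing in for Genotype); the real LazyGenotypesContext does more:

    import java.util.List;

    // The context holds only an opaque handle plus a parser; genotypes are
    // decoded the first time they are requested.
    public class LazyGenotypesContextSketch {
        public interface LazyParser {
            List<String> parse(Object data); // decode genotypes on demand
        }

        private final LazyParser parser;
        private Object unparsedData;      // e.g. the raw VCF genotype field text
        private List<String> genotypes;   // null until first access

        public LazyGenotypesContextSketch(LazyParser parser, Object unparsedData) {
            this.parser = parser;
            this.unparsedData = unparsedData;
        }

        public List<String> getGenotypes() {
            if (genotypes == null) {      // decode exactly once, on demand
                genotypes = parser.parse(unparsedData);
                unparsedData = null;      // drop the raw data after decoding
            }
            return genotypes;
        }
    }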
-- New approach to making VariantContexts, modeled on StringBuilder (see the sketch after this list)
-- No more modify routines -- use VariantContextBuilder
-- Renamed isPolymorphic to isPolymorphicInSamples. Same for mono
-- getChromosomeCount -> getCalledChrCount
-- Walkers changed to use new VariantContext. Some deprecated new VariantContext calls remain
-- VCFCodec now uses optimized cached information to create GenotypesContext.
-- Major change to how chromosomeCounts is computed. Now NO_CALL alleles are always excluded. So ChromosomeCounts(A/.) is 1, the previous result would have been 2.
-- Naming changes for getSamplesNameInOrder()
-- Compares performance across a bunch of common operations with GATK 1.3 version of VariantContext and GATK 1.4
-- 1.3 VC and associated utilities copied wholesale into test directory under v13
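A minimal sketch of the StringBuilder-style pattern; the method names follow the later VariantContextBuilder API and should be treated as approximate:

    import java.util.Arrays;
    import java.util.List;

    import htsjdk.variant.variantcontext.Allele;
    import htsjdk.variant.variantcontext.VariantContext;
    import htsjdk.variant.variantcontext.VariantContextBuilder;

    public class BuilderSketch {
        public static VariantContext example() {
            final List<Allele> alleles = Arrays.asList(
                    Allele.create("A", true),   // reference allele
                    Allele.create("T", false)); // alternate allele
            // Accumulate changes in the builder, then construct the immutable
            // VariantContext exactly once with make() -- no modify routines.
            return new VariantContextBuilder("sketch", "20", 10001, 10001, alleles)
                    .id("rs123456")
                    .attribute("AC", 1)
                    .make();
        }
    }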
-Modified the SnpEff parser to work with the SnpEff 2.0.4 VCF output format
-Assigning functional classes and effect impacts now handled directly
by SnpEff rather than the GATK
-Removed support for SnpEff 2.0.2, as we no longer trust the output of that
version since it doesn't exclude effects associated with certain nonsensical
transcripts. These effects are excluded as of 2.0.4.
-Updated unit and integration tests
This support is based on a *release-candidate* of SnpEff 2.0.4, and so is subject
to change between now and the next GATK release.
compressed the representation of the ReduceReads counts by offset; this results in 17% average compression in final BAM file size.
Example compression -->
from: 10, 10, 11, 11, 12, 12, 12, 11, 10
to:   10, 0, 1, 1, 2, 2, 2, 1, 0
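A minimal sketch of the encoding implied by the example (every count after the first is stored as its offset from the first count); illustrative, not the exact ReduceReads code:

    public class CountCompressionSketch {
        // Keep the first count verbatim; later counts become differences from
        // it. Slowly varying coverage yields small values that compress much
        // better in the final BAM.
        public static int[] compress(final int[] counts) {
            final int[] out = counts.clone();
            for (int i = 1; i < out.length; i++)
                out[i] = counts[i] - counts[0];
            return out;
        }

        public static int[] decompress(final int[] encoded) {
            final int[] out = encoded.clone();
            for (int i = 1; i < out.length; i++)
                out[i] = encoded[i] + encoded[0];
            return out;
        }
    }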
-- I have no idea why I named this InferredGeneticContext, a totally meaningless term
-- Renamed to CommonInfo.
-- Made package protected, as no one should use this outside of VariantContext and Genotype
-- UGEngine was using IGC constant, but it's now using the public one in VariantContext.
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples
-- All instances in the GATK that used Map<String, Genotype> now use the GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> was used, so the order of the genotypes was technically undefined and could be dangerous. Now everything uses GenotypeMap with a specific ordering of samples (by name); see the sketch after this list.
-- Integration tests updated and all pass
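A minimal sketch of why the name-ordered map matters, approximating GenotypeMap with a TreeMap (an assumption; the real type may differ):

    import java.util.Map;
    import java.util.TreeMap;

    public class GenotypeMapSketch {
        // HashMap iterates in an undefined order, so anything derived from it
        // (e.g. the MD5 of a written VCF) can change between runs. A TreeMap
        // keyed by sample name fixes a deterministic iteration order.
        public static <G> Map<String, G> bySampleName(Map<String, G> unordered) {
            return new TreeMap<String, G>(unordered);
        }
    }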
* Generalized the concept of a synthetic read to create both a running consensus and synthetic reads of filtered data.
* Synthetic reads can now have deletions (but not insertions)
* New reduced read tag for filtered data synthetic reads *(RF)*
* Sliding window header now keeps information of consensus and filtered data
* Synthetic reads are created simultaneously; the new functionality is controlled internally by addToSyntheticReads
The GATK engine will now provide a GATKSAMRecord to all tools, which incorporates the functionality the GATK layers on top of the BAM file (ReadGroups, Reduced Reads, ...).
* No tools should create SAMRecord anymore, use GATKSAMRecord instead *
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
-- Supports ReadBackedPileup -> FragmentCollection as before
-- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller
-- General cleanup, renaming, move to separate package, more extensive unit tests, etc.
-- Added toFragment() function to ReadBackedPileup interface
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT-compile the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
-- removed intermediate functions. Now only the original version and the best optimized new version remain
-- Moved general artificial read backed pileup creation code into ArtificialSamUtils
-- Uses mayOverlapRoutine in ReadUtils
-- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations
-- PileupElement now comparable (sorts on offset, then on start)
-- Caliper microbenchmark to assess performance
-- Creates all combinations of overlapping and non-overlapping read pair pileups in all orientations and first/second pairings to validate fragment detection.
We can't have a public test that depends on both public and private
code/data -- the new release system needs to do public-only tests,
and will catch this sort of thing.
The public integration test VariantContextIntegrationTest was dependent on the
private walker TestVariantContextWalker. Moved this walker to public/java/test
(NOT public/java/src, since this walker is only used by the test suite) to avoid
errors during public-only tests.
-- MD5 db had spelling error; fixed
-- Bug in AlignmentUtils resulted in some bases not being color space corrected. The integration test caught the change, and it's clear that the new version is correct, as the previous version was not considering the last N qualities for reads with an ND operation.
This allows the annotation classes to perform any necessary initialization/validation.
For example, it allows the SnpEff annotator to (among other things) validate its rod binding.
This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding
is present.
Added an integration test to cover this case so that it doesn't break again.
-- Changes associated code throughout the codebase
-- Updated necessary (but minimal) UnitTests to reflect new behavior
-- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC
classloading of bcel*.jar/ant-apache-bcel*.jar. Switching instead to manually
specifying a minimal set of packages/classes to include in the vcf.jar via
build.xml, and adding a unit test which creates a limited classloader
only aware of vcf.jar and tribble.jar and tries to use it to load the core
classes in the vcf jar.
Hopefully third time's the charm.
-- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem
-- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record. If the two records are inconsistent, an error is thrown
-- addSample() in Sample.class now invokes mergeSample() when appropriate
-- Validation types are now only STRICT or SILENT
-- Validation code implemented in SampleDBBuilder
-- Extensive unit tests for SampleDBBuilder
-- Passes significant unit tests
-- Implicit sample creation for mom / dad when you create single samples
-- Continuing cleanup of Sample and SampleDataSource
* Includes tests covering HardClip to read and reference coordinates.
* Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing
-- UnitTests for key functions on reduced reads
-- PileupElement calls static functions in ReadUtils
-- Simple routine that takes a reduced read and fills in its quals with its reduced qual
b) Change MD5s to reflect records that are now merged correctly.
c) Change the unit merge-alleles test to reflect the fact that a null non-variant VC object is not valid and not supported, because there's no way to codify such an object in a VCF. The code correctly converts this to a non-variant single-base event with whatever the reference is at that location.
b) First reimplementation of the new VC merger of different types. The previous version did it in two steps: first merging all VCs per type, and then trying to see if the resulting VCs could be merged when the alleles of one type were a subset of another. But this won't work when uniquifying genotypes, since sample names would be messed up and GT sample names wouldn't match VC sample names. Now it's actually simpler: when splitting VCs by type before merging, we check whether the alleles of one VC are a subset of the alleles of a VC of another type, and if so we put them together in the same list.
-We now assign a functional class (nonsense, missense, silent, or none) to each SnpEff effect, and add a
SNPEFF_FUNCTIONAL_CLASS annotation to the INFO field of the output VCF.
-Effects are now prioritized according to both biological impact and functional class, instead of impact only.
-Many of SnpEff's "low-impact" effects are now classified as "modifiers" with lower priority than every
other effect. This includes such "effects" as DOWNSTREAM, UPSTREAM, INTRON, GENE, EXON, and others that
really describe the location of the variant rather than its biological effect.
This code will be short-lived (likely 1.2-only), as the next version of SnpEff will include most of these
features directly.
Checking this change into Stable+Unstable instead of Unstable because the current functional class stratification
in VariantEval is basically broken and urgently needs to be fixed for production purposes.
-- Old code required qual to be <64, which isn't strictly necessary. Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant
-- Unittest to enforce this behavior
-- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp
-- Proper handling of ids. If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list
This change is urgently required for production, which is why it's going into Stable+Unstable
instead of just Unstable.
The keys for the SnpEff version and command header lines in the VCF file output by
VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally
different from the keys for those same lines in the SnpEff output file (SnpEffVersion
and SnpEffCmd), so that output files from VariantAnnotator won't be confused
with output files from SnpEff itself.
-- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index. This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary. This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to develop a new algorithm)
-- Unit tests for the efficiency of interval partitioning
After discussing this with Mark, it seems clear that the old version of the
VariantEval FunctionalClass stratification is preferable to this version.
By reverting, we maintain backwards compatibility with legacy output files
from the old GenomicAnnotator, and can add SnpEff support later without
breaking that backwards compatibility.
This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.
This is a temporary and hopefully short-lived solution. I've modified
the FunctionalClass stratification to stratify by effect impact as
defined by SnpEff annotations (high, moderate, and low impact) rather
than by the silent/missense/nonsense categories.
If we want to bring back the silent/missense/nonsense stratification,
we should probably take the approach of asking the SnpEff author
to add it as a feature to SnpEff rather than coding it ourselves,
since the whole point of moving to SnpEff was to outsource genomic
annotation.
-Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2)
-Removed support for SnpEff 1.9.6 (and associated tribble codec)
-Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag)
-Correctly matches ref/alt alleles before annotating a record, unlike the previous version
-Correctly handles indels (again, unlike the previous version)
b) More useful AC,AF logging in VariantsToTable with multiallelic sites: instead of logging comma-separated values, log max value by default. Hidden, experimental argument -logACSum to log sum of ACs instead. This is due to extreme slowness of R in parsing strings to tokens and computing max/sum itself (~100x slower than gatk).
c) Added integration tests for the new SelectVariants commands
- Ability to pass a different resident memory reservation and limits. Useful for large pileups of low pass genome data that sometimes need high -Xmx6g but usually don't exceed 2-3g in actual heap size.
- Fixed jobPriority to work for all job runners. It must now be an integer between 0 and 100, even for GridEngine, and will be mapped to the correct values.
- Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8
- Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.
Misc updates to WholeGenomeIndelCalling.scala
Bug fix in VariantEval (may be temporary, needs more investigation): if the -disc option is used on sites-only VCFs then a null pointer exception is produced, caused by the recent introduction of the -xl_sf options.
The VariantEval module CountVariants is corrected and an additional column is added so that we log mixed events and complex indels separately (before, they were being conflated).
The VariantEval module IndelStatistics is considerably simplified, as the sample stratification was wrong and redundant; now it should work with the VE-generic Sample stratification. Several columns are renamed or removed since they're not really useful.
-- refdata/features now in utils/codecs with the other codecs
-- Deleted dbsnpHelper. rsID function now in VCFutils. Remaining code either deleted or put into VariantContextAdaptors
-- Many associated import updates due to code move
-- Verified now to be correct at runtime
-- UnitTest covers this
-- createTypeDefault now takes a Type, not a Class, so that parameterized classes can have their parameter fetched in the defaults.
New class handles (vastly more cleanly) the db of tribble codecs, features, and names for use throughout the GATK.
Added SelfScopingFeatureCodec interface that allows a FeatureCodec to examine a file and determine if the file can be parsed. This is the first step towards allowing the GATK to dynamically determine the type of a RodBinding (see the sketch below).
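A minimal sketch of the interface idea, with an illustrative method name; the real signature may differ:

    // A codec that can scope itself: given a candidate file, report whether
    // this codec's decode logic applies. The engine can then probe all
    // registered codecs to infer the type of a RodBinding dynamically.
    public interface SelfScopingFeatureCodec {
        boolean canDecode(final String path);
    }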
-- CombineVariants now uses the new RodBinding syntax, -V / --variants. Passed all integration tests on first run
-- Exposed a gaping bug in the List<RodBinding<T>> system, now fixed. ParserEngine now has an addRodBinding() that is called by RodBindingArgumentTypeDescriptor when it encounters each RodBinding. This allows the system to work with collection types that are recursively parsed by the system.
-- Cleaning up old interface to RMDT, docs and contracts added
-- Proper type checking for RodBinding for cases where the Tribble type isn't found or is the wrong type
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
-- support for explicit naming of bindings (-X:name,type x)
-- support for automatic naming of bindings in lists (-X:vcf foo.vcf -X:vcf bar.vcf will generate internal names X and X2)
-- ParserEngineUnitTest expanded to cover all of the Rodbinding cases
-- RodBindingUnitTest tests all of the low-level accessors
-- Parsing engine throws UserExceptions when bad bindings are provided on the command line
RodBinding no longer duplicates the get() methods in RMDT. This is just an object now that connects the command line system to the RMDT.
Updated programs to use new style
Added UnitTests for the RodBinding accessors.
You now have to provide an explicit list of RODRecordLists upfront to the constructor. RefMetaDataTracker is now immutable. Changes in the engine incorporate these differences
Extensive UnitTests for RefMetaDataTracker now.
On the path towards converging getVariantContext() and getValues() in tracker so that we can have a single approach to get values from RODs with the new RodBinding() types
Problem is that Novelty sees multiple records at a site (SNP, INDEL) to calculate whether a site is novel, but VariantEvalWalker makes an arbitrary decision which to use for analysis and CompOverlap may not see a comp record of the same type as eval. So you get lines where the stratification is known but there are 10 novel sites!
Renamed many functions to more clearly state what they are actually doing
Removed unnecessary / unused functionality, reducing interface complexity
Updated all uses of this code in GATK
Added generic, type-safe accessors to RefMetaDataTracker such as public <T> List<T> getValues(final String name, Class<T> clazz); see the sketch after this list
Added standard refMetaDataTracker accessors to RodBinding, so you can do everything you can do for generic rods with the tracker directly with the RodBinding
Removing unused files RODRecordIterator, ReferenceOrderedData, QueryableTrack, RMDTrackCreationException, GATKFeatureIterator, ReferenceOrderedDataUnitTest
Refactored dbSNP and refseq utilities to be closer to the other files implementing these features
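A minimal sketch of the type-safe accessor pattern named above, over a hypothetical name-to-records map; the real RefMetaDataTracker does more:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    public class TrackerSketch {
        private final Map<String, List<Object>> bindings; // name -> raw ROD records

        public TrackerSketch(Map<String, List<Object>> bindings) {
            this.bindings = bindings;
        }

        // Type-safe accessor: the caller names the track and the expected type;
        // each record is cast with a runtime check instead of an unchecked cast.
        public <T> List<T> getValues(final String name, final Class<T> clazz) {
            final List<T> result = new ArrayList<T>();
            for (final Object record : bindings.getOrDefault(name, Collections.emptyList()))
                result.add(clazz.cast(record)); // ClassCastException on a type mismatch
            return result;
        }
    }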
System has the concept of a local and a global MD5 db. The local one is like it operated previously. The global one lives in /humgen/gsa-hpprojects/GATK/data/integrationtests. If the system can find this directory then MD5s will also be read / written to this location. This means that gsabamboo will print differences as appropriate. And all users will in effect have access to a complete history of MD5 file results.
A few minor code reshuffles changed VariantRecalibration and VCFHeader test files.
printSummaryReport now uses GATKReport for nice formatting
Moved print formatting arguments into inner class provided to printing functions themselves, not the class
BAMDiffableReader only reads 1000 entries to avoid a performance issue. Workaround for BAM files with non-unique names
Uncommented all of the incorrectly commented-out CombineVariants integration tests
BaseTest now uses DiffEngine to provide inline differences to VCF and BAM files
non-static to avoid problems when multiple references are used within the same
thread (e.g., during integration tests). This should kill the intermittent
IndelRealignerIntegrationTest failures.