gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	5443d3634a	Again, fixing the add call when we really mean replace -- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./. Eric will fix long-standing bug in QD observed from this change -- VFW MD5s restored to their old correct values. There was a bug in my implementation that caused the genotypes to not be parsed from the lazy output even though the header was incorrect.	2011-11-21 19:15:56 -05:00
Mauricio Carneiro	5ad3dfcd62	BugFix: byte overflow in SyntheticRead compressed base counts * fixed and added unit test	2011-11-21 17:11:50 -05:00
Mark DePristo	2c501364b8	GenotypesContext no longer have immutability in constructor -- additional bug fixes throughout VariantContext and GenotypesContext objects	2011-11-21 14:34:31 -05:00
David Roazen	1296dd41be	Removing the legacy -L "interval1;interval2" syntax This syntax predates the ability to have multiple -L arguments, is inconsistent with the syntax of all other GATK arguments, requires quoting to avoid interpretation by the shell, and was causing problems in Queue. A UserException is now thrown if someone tries to use this syntax.	2011-11-21 13:18:53 -05:00
Mark DePristo	2e9ecf639e	Generalized interface to LazyGenotypesContext -- Now you provide a LazyParsing object -- LazyGenotypesContext now knows nothing about the VCF parser itself. The parser holds all of the necessary data to parse the VCF genotypes when necessary, and the LGC only has a pointer to this object -- Using new interface added LazyGenotypesContext to unit tests with a simple lazy version -- Deleted VCFParser interface, as it was no longer necessary	2011-11-21 09:30:40 -05:00
Mark DePristo	f0ac588d32	Extensive unit test for GenotypeContextUnitTest -- Currently only tests base class. Adding subclass testing in a bit	2011-11-20 18:28:01 -05:00
Mark DePristo	9cb3fe3a59	Vastly better way of doing on-demand genotype loading -- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading. -- This new class replaced all of the old, complex functionality -- Better still, there were many cases where the genotypes were being loaded unnecessarily, resulting in inefficiency. This was detected because some of the integration tests changed as the genotypes were no longer being parsed unnecessarily -- Misc. bug fixes throughout the system -- Bug fixes for PhaseByTransmission with new GenotypesContext	2011-11-20 08:23:09 -05:00
Mark DePristo	7d09c0064b	Bug fixes and code cleanup throughout -- chromosomeCounts now takes builder as well, cleaning up a lot of code throughout the codebase.	2011-11-19 18:40:15 -05:00
Mark DePristo	707bd30b3f	Should have been @BeforeMethod	2011-11-19 16:10:09 -05:00
Mark DePristo	8f7eebbaaf	Bugfix for pError not being checked correctly in CommonInfo -- UnitTests to ensure correct behavior -- UnitTests to ensure correct behavior for pass filters vs. failed filters vs. unfiltered	2011-11-19 15:58:59 -05:00
Mark DePristo	b7b57ef39a	Updating MD5 to reflect canonical ordering of calculation -- We should no longer have md5s changing because of hashmaps changing their sort order on us -- Added GenotypeLikelihoodsUnitTests -- Refactored ExactAFCaclculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.	2011-11-19 15:57:33 -05:00
Mark DePristo	73119c8e3c	Merge with master -- A few bug fixes	2011-11-19 09:56:06 -05:00
Mark DePristo	f685fff79b	Killing the final versions of old new VariantContext interface	2011-11-18 21:32:43 -05:00
Mark DePristo	6cf315e17b	Change interface to getNegLog10PError to getLog10PError	2011-11-18 21:07:30 -05:00
Matt Hanna	8bb4d4dca3	First pass of the asynchronous block loader. Block loads are only triggered on queue empty at this point. Disabled by default (enable with nt:io=?).	2011-11-18 15:02:59 -05:00
Mark DePristo	f54afc19b4	VariantContextBuilder -- New approach to making VariantContexts modeled on StringBuilder -- No more modify routines -- use VariantContextBuilder -- Renamed isPolymorphic to isPolymorphicInSamples. Same for mono -- getChromosomeCount -> getCalledChrCount -- Walkers changed to use new VariantContext. Some deprecated new VariantContext calls remain -- VCFCodec now uses optimized cached information to create GenotypesContext.	2011-11-18 12:39:10 -05:00
Mark DePristo	7490dbb6eb	First version of VariantContextBuilder	2011-11-18 11:06:15 -05:00
Mark DePristo	fa454c88bb	UnitTests for VariantContext for chrCount, getSampleNames, Order function -- Major change to how chromosomeCounts is computed. Now NO_CALL alleles are always excluded. So ChromosomeCounts(A/.) is 1, the previous result would have been 2. -- Naming changes for getSamplesNameInOrder()	2011-11-17 20:37:22 -05:00
Mark DePristo	02f22cc9f8	No more VC integration tests. All tests are now unit tests	2011-11-17 15:33:09 -05:00
Khalid Shakir	c50274e02e	During flanking interval creation, merging overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice. Moved flanking interval utilities to IntervalUtils with UnitTests.	2011-11-17 13:56:42 -05:00
Eric Banks	bad19779b9	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-17 13:29:43 -05:00
Eric Banks	16a021992b	Updated header description for the INFO and FORMAT DP fields to be more accurate.	2011-11-17 13:17:53 -05:00
Mark DePristo	7e66677769	Expanded UnitTests for VariantContext Tests for -- getGenotype and getGenotypes -- subContextBySample -- modify routines	2011-11-16 20:45:15 -05:00
Mauricio Carneiro	72f00e2883	Merging Roger's Unit tests for Reduce Reads from RR repository	2011-11-16 17:26:49 -05:00
Mark DePristo	aa0610ea92	GenotypeCollection renamed to GenotypesContext	2011-11-16 16:24:05 -05:00
Mark DePristo	974daaca4d	V13 version in archive. Can be pulled out wholesale for performance testing	2011-11-16 16:08:46 -05:00
Mark DePristo	101ffc4dfd	Expanded, contrastive VariantContextBenchmark -- Compares performance across a bunch of common operations with GATK 1.3 version of VariantContext and GATK 1.4 -- 1.3 VC and associated utilities copied wholesale into test directory under v13	2011-11-16 13:35:16 -05:00
Mark DePristo	e56d52006a	Continuing bugfixes to get new VC working	2011-11-16 10:39:17 -05:00
Eric Banks	c2ebe58712	Merge remote-tracking branch 'Laurent/master'	2011-11-16 09:34:47 -05:00
David Roazen	0d163e3f52	SnpEff 2.0.4 support -Modified the SnpEff parser to work with the SnpEff 2.0.4 VCF output format -Assigning functional classes and effect impacts now handled directly by SnpEff rather than the GATK -Removed support for SnpEff 2.0.2, as we no longer trust the output of that version since it doesn't exclude effects associated with certain nonsensical transcripts. These effects are excluded as of 2.0.4. -Updated unit and integration tests This support is based on a release-candidate of SnpEff 2.0.4, and so is subject to change between now and the next GATK release.	2011-11-15 18:36:22 -05:00
Mark DePristo	df415da4ab	More bug fixes on the way to passing all tests	2011-11-15 17:38:12 -05:00
Laurent Francioli	fb685f88ec	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-15 16:23:53 -05:00
Mark DePristo	460a51f473	ID field now stored in the VariantContext itself, not the attributes	2011-11-15 14:56:33 -05:00
Eric Banks	7fada320a9	The right fix for this test is just to delete it.	2011-11-15 14:53:27 -05:00
Mark DePristo	233e581828	Merging in Master	2011-11-15 09:28:24 -05:00
Mark DePristo	6e1a86bc3e	Bug fixes to VariantContext and GenotypeCollection	2011-11-15 09:21:30 -05:00
Roger Zurawicki	284430d61d	Added more basic UnitTests for ReadClipper hardClipByReadCoordinatesWorks hardClipLowQualTailsWorks	2011-11-15 00:13:52 -05:00
Roger Zurawicki	8e91e19229	Merge branch 'master' of ssh://nickel/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-15 00:13:37 -05:00
Mauricio Carneiro	cde829899d	Compress Reduce Read counts bytes by offset. Compressing the representation of the reduce reads counts by offset results in 17% average compression in final BAM file size. Example compression --> from: 10, 10, 11, 11, 12, 12, 12, 11, 10 to: 10, 0, 1, 1, 2, 2, 2, 1, 0	2011-11-14 18:30:24 -05:00
Mark DePristo	4ff8225d78	GenotypeMap -> GenotypeCollection part 3 -- Test code actually builds	2011-11-14 17:51:41 -05:00
Mark DePristo	f0234ab67f	GenotypeMap -> GenotypeCollection part 2 -- Code actually builds	2011-11-14 17:42:55 -05:00
Mark DePristo	2e9d5363e7	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-14 15:32:06 -05:00
Mark DePristo	1fbdcb4f43	GenotypeMap -> GenotypeCollection	2011-11-14 15:32:03 -05:00
Eric Banks	7b2a7cfbe7	Transfer headers from the resource VCF when possible when using expressions. While there, VA was modified so that it didn't assume that the ID field was present in the VC's info map in preparation for Mark's upcoming changes.	2011-11-14 14:31:27 -05:00
Mark DePristo	9b5c79b49d	Renamed InferredGeneticContext to CommonInfo -- I have no idea why I named this InferredGeneticContext, a totally meaningless term -- Renamed to CommonInfo. -- Made package protected, as no one should use this outside of VariantContext and Genotype -- UGEngine was using IGC constant, but it's now using the public one in VariantContext.	2011-11-14 14:28:52 -05:00
Mark DePristo	077397cb4b	Deleted MutableVariantContext -- All methods that used this capability now use VariantContext directly instead	2011-11-14 14:19:06 -05:00
Mark DePristo	79987d685c	GenotypeMap contains a Map, not extends it -- On path to replacing it with GenotypeCollection	2011-11-14 12:55:03 -05:00
Laurent Francioli	1347beef40	Merge branch 'PhaseByTransmission'	2011-11-14 11:31:28 +01:00
Laurent Francioli	6881d4800c	Added Integration tests for Phasing by Transmission	2011-11-14 10:47:51 +01:00
Laurent Francioli	34acf8b978	Added Unit tests for new methods in GenotypeLikelihoods	2011-11-14 10:47:02 +01:00
Roger Zurawicki	1202a809cb	Added Basic Unit Tests for ReadClipper Tests some but not all functions Some tests have been disabled because they are not working	2011-11-13 22:27:49 -05:00
Mark DePristo	fee9b367e4	VariantContext genotypes are now stored as GenotypeMap objects -- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples -- All instances in the gatk that used Map<String, Genotype> now use GenotypeMap type. -- Amazingly, there were many places where HashMap<String, Genotype> is used, so that the order of the genotypes is technically undefined and could be dangerous. Now everything uses GenotypeMap with a specific ordering of samples (by name) -- Integrationtests updated and all pass	2011-11-11 15:00:35 -05:00
Mark DePristo	4938569b3a	More general handling of parameters for VariantContextBenchmark	2011-11-11 10:22:19 -05:00
Mark DePristo	e216e85465	First working version of VariantContextBenchmark	2011-11-11 09:56:00 -05:00
Mark DePristo	ee40791776	Attributes are now Map<String,Object> not Map<String,?> -- Allows us to avoid an unnecessary copy when creating InferredGeneticContext (whose name really needs to change).	2011-11-11 09:55:42 -05:00
Mark DePristo	153e52ffed	VariantEvalIntegrationTest for IntervalStratification	2011-11-10 14:10:39 -05:00
Mauricio Carneiro	d00b2c6599	Adding a synthetic read for filtered data * Generalized the concept of a synthetic read to create both a running consensus and a synthetic read of filtered data. * Synthetic reads can now have deletions (but not insertions) * New reduced read tag for filtered data synthetic reads (RF) * Sliding window header now keeps information of consensus and filtered data * Synthetic reads are created simultaneously, new functionality is controlled internally by addToSyntheticReads	2011-11-09 20:16:22 -05:00
Eric Banks	02d5e3025e	Added integration test for intervals from bed file	2011-11-09 15:34:19 -05:00
Ryan Poplin	94dc447a70	Merged bug fix from Stable into Unstable	2011-11-07 15:26:35 -05:00
Ryan Poplin	0b181be61f	Bug fix in SelectVariants when using a discordance track but no sample specifications. Added integration test to test this.	2011-11-07 15:25:16 -05:00
Eric Banks	759f4fe6b8	Moving unclaimed walker with bad integration test to archive	2011-11-07 13:16:38 -05:00
Eric Banks	3517489a22	Better --sample selection integration test for VE. The previous one would return true even if --sample was not working at all.	2011-11-06 01:07:49 -04:00
Eric Banks	ad57bcd693	Adding integration test to cover using expressions with IDs (-E foo.ID)	2011-11-05 23:53:15 -04:00
Mauricio Carneiro	e89ff063fc	GATKSAMRecord refactor The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...). * No tools should create SAMRecord anymore, use GATKSAMRecord instead *	2011-11-03 15:43:26 -04:00
Eric Banks	e8bceb1eaa	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-02 21:13:54 -04:00
Eric Banks	78a00d2ddc	Updating UG integration tests (needed updating only because the -mbq default is different from the old -mmq one).	2011-11-02 21:13:44 -04:00
Eric Banks	e1edd6bd12	Removing the min mapping quality argument since it wasn't being used in the normal processing of the pileups in UG - only for indel pileups. Instead, we apply the min base quality to the reads in the pileup for indels and define it to be the min 'confidence' of the base. Docs are updated but I didn't rename the argument as I don't want people to complain.	2011-11-02 20:32:58 -04:00
Mark DePristo	8a2929c1dd	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-02 16:21:00 -04:00
Eric Banks	4501dce58d	Fixing merge conflict	2011-11-02 12:50:32 -04:00
Eric Banks	54331b44e9	New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths.	2011-11-02 12:47:30 -04:00
Mark DePristo	392e0aeace	Moved unit tests into master IntervalUtilsUnitTest	2011-11-02 10:52:00 -04:00
Mark DePristo	c2b97030a4	IntervalUtils for completely balanced locus-based scatter/gather -- scatterLocusIntervals master utility -- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc -- Util function for reversing a list (List<T> -> List<T>, unlike Collections version) -- DoC is PartitionType.INTERVAL -- Significant unit tests on new functionality (all passing) -- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work	2011-11-02 10:49:40 -04:00
Mauricio Carneiro	b004489c6d	Moving ReduceRead TAG to GATKSAMRecord ReduceReads are now a feature of a GATKSAMRecord, so the tag and the special methods needed to use it will now be housed by the GATKSAMRecord.	2011-11-01 17:12:09 -04:00
Eric Banks	0ca7428e76	Allow processing of empty intervals, but warn user when this case is encountered.	2011-10-28 12:12:14 -04:00
Eric Banks	649dfe98f0	Add VCF header for any expressions that are requested	2011-10-28 10:22:19 -04:00
Eric Banks	8b1a62da27	Adding unit test to cover overlapping intervals from the same source with the intersection rule.	2011-10-28 09:59:43 -04:00
Eric Banks	6ba08a103d	Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory.	2011-10-28 09:23:25 -04:00
Eric Banks	19e27d4568	Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative.	2011-10-27 23:55:11 -04:00
Eric Banks	ccfd853b34	Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed.	2011-10-27 20:43:50 -04:00
Khalid Shakir	b80d407dc7	No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path. Other minor cleanup.	2011-10-27 14:17:07 -04:00
Eric Banks	8c4dbce6d8	Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing.	2011-10-27 13:58:19 -04:00
Eric Banks	4a7e6fee3f	Remove support for BED file interval parsing in the GATK; it should all go through Tribble now. IndelRealigner no longer supports unordered interval input (which shouldn't have been used anyways). Temporarily commenting out serialization of arguments so that tests pass; this whole piece will be deleted soon anyways.	2011-10-27 13:38:08 -04:00
Eric Banks	44f905b5e5	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-26 23:31:11 -04:00
Mark DePristo	034a997d07	Generalized Reads -> Fragment calculation -- Supports ReadBackedPileup -> FragmentCollection as before -- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller -- General cleanup, renaming, move to separate package, more extensive unit tests, etc. -- Added toFragment() function to ReadBackedPileup interface	2011-10-26 15:54:38 -04:00
Eric Banks	b39fcb1bea	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-26 15:44:25 -04:00
Eric Banks	3273c20c98	Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes.	2011-10-26 15:29:18 -04:00
Mark DePristo	7fa943aef1	Renamed FragmentPileup to FragmentUtils	2011-10-26 14:01:45 -04:00
Mark DePristo	1b722c21cf	merge master	2011-10-25 16:08:39 -04:00
David Roazen	2794e5c1d4	Modified the VCFJarClassLoadingUnitTest to play nice with the packaged-jar test targets.	2011-10-25 14:47:15 -04:00
Khalid Shakir	fac9932938	Embedding gsalib source and queueJobReport R scripts in the dist and package jars. Moved gsalib and queueJobReport.R to embeddable namespaced locations. Updated packager dependencies/dir to add an @includes which filters the embedded fileset. RScriptExecutor can now JIT compile the gsalib. RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG. Refactored ProcessController and IOUtils from Queue to Sting Utils. Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count. Replaced uses of some IOUtils with Apache Commons IO. ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown. Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().	2011-10-24 15:58:34 -04:00
Khalid Shakir	89a581a66f	Added ability to specify arguments in files via -args/--arg_file Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()	2011-10-24 15:58:34 -04:00
Mark DePristo	502592671d	Cleanup FragmentPileup before main repo commit -- removed intermediate functions. Now only original version and best optimized new version remain -- Moved general artificial read backed pileup creation code into ArtificialSamUtils	2011-10-24 14:40:05 -04:00
Mark DePristo	166174a551	Google caliper example execution script -- FragmentPileup with final performance testing	2011-10-24 14:04:53 -04:00
Mark DePristo	42bf9adede	Initial version of "fast" FragmentPileup code -- Uses mayOverlapRoutine in ReadUtils -- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations -- PileupElement now comparable (sorts on offset, then on start) -- Caliper microbenchmark to assess performance	2011-10-22 21:36:37 -04:00
Guillermo del Angel	f4b409fa0d	CombineVariants bug fix: when merging records with disparate alleles we were leaving AC,AF fields intact. This had as a consequence that we could end up with a record with 3 alt alleles but only 2 values in AC,AF fields. Now, if alleles in combined vc are different from original, and if AC,AF fields can't be recomputed from genotypes, we remove attributes from vc map since they'll be invalid anyway. Integration test md5 changed since there were several badly merged records in result	2011-10-21 14:07:20 -04:00
Mark DePristo	b863390cb1	Moving reduced read functionality into GATKSAMRecord -- More functions take / produce GATKSAMRecords instead of SAMRecord	2011-10-21 13:28:05 -04:00
Mark DePristo	110e13bc1e	Merge branch 'master' into SamRecordFactory	2011-10-21 09:43:52 -04:00
Mark DePristo	3227143a1c	Systematic test code for FragmentPileup -- Creates all combinations of overlapping and non-overlapping read pair pileups in all orientations and first/second pairings to validate fragment detection.	2011-10-19 17:50:27 -04:00
Eric Banks	d8d73fe4f2	Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown.	2011-10-19 15:11:13 -04:00
Eric Banks	5a6468c11e	Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it.	2011-10-19 11:52:05 -04:00
David Roazen	88d6b8bc1f	Merged bug fix from Stable into Unstable	2011-10-14 20:13:38 -04:00
David Roazen	bd8bb93811	Split RScriptExecutorUnitTest into public and private test classes. We can't have a public test that depends on both public and private code/data -- the new release system needs to do public-only tests, and will catch this sort of thing.	2011-10-14 20:04:42 -04:00
David Roazen	4f01a742cb	Merged bug fix from Stable into Unstable	2011-10-13 21:39:52 -04:00
David Roazen	edfd6f8a06	Removing a public -> private dependency from the test suite. The public integration test VariantContextIntegrationTest was dependent on the private walker TestVariantContextWalker. Moved this walker to public/java/test (NOT public/java/src, since this walker is only used by the test suite) to avoid errors during public-only tests.	2011-10-13 21:32:52 -04:00
Mark DePristo	404ef741f1	Merged bug fix from Stable into Unstable	2011-10-13 18:02:06 -04:00
Mark DePristo	2ebdff074c	Update MD5s for SOLiD recalibration -- MD5 db had spelling error; fixed -- Bug in AlignmentUtils resulted in some bases not being color space corrected. The integration test caught the change, and it's clear that the new version is correct, as the prev. version was not considering the last N qualities for reads with a ND operation.	2011-10-13 18:01:51 -04:00
Eric Banks	9aecd50473	Adding ability to exclude annotations from the VA and UG lists. As described in the docs, this argument trumps all others (including -all) so that we can get around the SnpEff issue brought up by Menachem. Added integration test for it.	2011-10-12 15:44:54 -04:00
David Roazen	cfd0ac8410	Merged bug fix from Stable into Unstable Conflicts: public/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java	2011-10-11 12:03:51 -04:00
David Roazen	24b72334b3	UnifiedGenotyper now correctly initializes the VariantAnnotator engine. This allows the annotation classes to perform any necessary initialization/validation. For example, it allows the SnpEff annotator to (among other things) validate its rod binding. This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding is present. Added an integration test to cover this case so that it doesn't break again.	2011-10-11 12:02:05 -04:00
Mark DePristo	fb72bcf732	DiffObjects no longer prints out the file name in the status so MD5 are stable	2011-10-10 15:10:57 -04:00
Mark DePristo	e3ff4f4266	Failing MD5 because output now contains absolute path	2011-10-10 11:05:02 -04:00
Mark DePristo	3e6c16d961	CombineVariants preserves allele order	2011-10-10 11:04:38 -04:00
Mark DePristo	a4bb842958	RankSum tests have slightly different MD5 results based on allele order -- UG GENOTYPE_GIVEN_ALLELES now uses the order of alleles in the VCF, so this changes the MD5	2011-10-10 11:04:07 -04:00
Mark DePristo	46e7370128	this.allele, getAlleles(), and getAltAlleles() now return List not set -- Changes associated code throughout the codebase -- Updated necessary (but minimal) UnitTests to reflect new behavior -- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC	2011-10-09 11:45:55 -07:00
Mark DePristo	822654b119	UnitTests for allele getting functions in VC in prep for move from set to list	2011-10-09 10:36:14 -07:00
Mark DePristo	c67f6c076b	simpleMerge now preserves allele order -- UnitTests for dangerous PL merging cases in the multi-allelic case. The new behavior is correct	2011-10-08 17:39:53 -07:00
Mark DePristo	e94e6ba101	A UnitTest to ensure that the order of alleles is maintained -> A, C, T and A, T, C are different and must be maintained. The constructors were doing this appropriately, so nothing needed to be changed	2011-10-08 08:47:58 -07:00
Matt Hanna	6fbd41724a	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-07 11:20:00 -04:00
Matt Hanna	4514bc350f	More reliable way of finding the Tribble jar.	2011-10-07 11:19:29 -04:00
Eric Banks	181c76750e	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 22:38:55 -04:00
Eric Banks	ca9cd9b688	Minor fix for merging intervals which hadn't been necessary when only merging from the left to right. Added integration tests to cover the parallelization of RTC.	2011-10-06 22:38:44 -04:00
Khalid Shakir	f91b015e0e	Made the BaseTest.testDir absolute	2011-10-06 22:33:21 -04:00
Eric Banks	61a3dfae24	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 15:58:04 -04:00
Eric Banks	6eb87bf58a	RTC now caches all intervals as GenomeLocs (which is expected to take < 1Gb whole genome based on back of the envelope calculations with Matt) so that 1) we don't have to worry about emitting outside of the leaves in the hierarchical reductions and 2) we can emit the intervals in sorted order which is a big performance plus for the realigner. Integration tests change only because intervals whose start=stop are now printed as chr:start instead of chr:start-stop.	2011-10-06 15:57:49 -04:00
Mark DePristo	6d9c210460	Updating MD5s for updated BAM with read groups	2011-10-06 12:15:48 -07:00
Matt Hanna	3961733590	Merged bug fix from Stable into Unstable	2011-10-06 12:54:52 -04:00
Matt Hanna	4fa5045e84	Abandoning classfileset/rootfileset approach due to difficulty managing classloading of bcel.jar/ant-apache-bcel.jar. Switching instead to manually specifying a minimal set of packages/classes to include in the vcf.jar via build.xml, and adding a unit test which creates a limited classloader only aware of vcf.jar and tribble.jar and tries to use it to load the core classes in the vcf jar. Hopefully third time's the charm.	2011-10-06 12:49:51 -04:00
Mark DePristo	4b5b9155a9	Fixed bad expected value in PedReaderUnitTest	2011-10-06 08:16:47 -07:00
Mark DePristo	3226d5dc0d	Merge branch 'master' into ped	2011-10-05 15:03:09 -07:00
Mark DePristo	e7c80f7c45	Renaming quantitative trait to OtherPhenotype which is now a String not a double -- we can now use PED file to represent population data or other arbitrary phenotype data, not just doubles	2011-10-05 12:26:33 -07:00
Mark DePristo	51ecc20867	getFamily() and associated methods implemented and tested -- Sample no longer serializable -- Sample now implements Comparable	2011-10-05 09:55:05 -07:00
Mark DePristo	f4bac58f14	Merged bug fix from Stable into Unstable	2011-10-04 21:00:34 -07:00
Mark DePristo	d1d39943d0	Updating MD5 for BAMs that I added a read group to, part 2	2011-10-04 21:00:15 -07:00
Mark DePristo	9bd3ba4c7e	Missed one MD5	2011-10-04 16:04:52 -07:00
Mark DePristo	ffdfdcde3f	Updating MD5s -- Interval test now uses RG containing BAM -- DoC sample name ordering has changed.	2011-10-04 15:54:45 -07:00
Mark DePristo	463eab7604	All MD5 mismatches for test are shown -- Now for tests like DoC, with 20 output md5s, you see all of the differences before failing.	2011-10-04 15:53:52 -07:00
Mark DePristo	c642a080d4	Merged bug fix from Stable into Unstable	2011-10-04 14:08:41 -07:00
Mark DePristo	941317167e	Updating MD5 for BAMs that I added a read group to	2011-10-04 14:08:00 -07:00
Mark DePristo	e1d6c7a50a	Updating MD5 that have changed due to sample ordering differences	2011-10-04 09:33:23 -07:00
Mark DePristo	343a7b6b2f	Updating UG integration tests for arbitrary impact of sample order changes on downsampling	2011-10-04 08:14:00 -07:00
Mark DePristo	a27641e1fc	Cleaned up imports	2011-10-04 06:28:36 -07:00
Mark DePristo	b20689ff55	No longer supports extraProperties -- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem -- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record. If the two records are inconsistent, an error is thrown -- addSample() in Sample.class now invokes mergeSample() when appropriate -- Validation types are now only STRICT or SILENT -- Validation code implemented in SampleDBBuilder -- Extensive unit tests for SampleDBBuilder	2011-10-03 19:20:33 -07:00
Mark DePristo	867a7476c1	Systematic unit tests for the sample object	2011-10-03 19:09:02 -07:00
Mauricio Carneiro	3837aa45b4	Fixing conflicts Conflicts: public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java	2011-10-03 19:07:59 -07:00
Mark DePristo	2e3dc52088	Minor function renaming	2011-10-03 14:41:13 -07:00
Mark DePristo	dd71884b0c	On path to SampleDB engine integration -- PedReader tag parser -- Separation of SampleDBBuilder from SampleDB (now immutable) -- Removed old sample engine arguments	2011-10-03 12:08:07 -07:00
Mark DePristo	89ac50e86e	SampleDataSource -> SampleDB	2011-10-03 09:33:30 -07:00
Mark DePristo	93fba06cb5	Support for whitespace only lines	2011-10-03 09:30:10 -07:00
Mark DePristo	0604ce55d1	PedReader support for ; separated lines, not only newline	2011-10-03 09:19:58 -07:00
Mark DePristo	52f670c8b8	100% version of PedReader -- Passes all unit tests -- Added unit tests for missing fields	2011-10-03 06:12:58 -07:00
Roger Zurawicki	bf6a3a6532	Added framework to do batch CigarClip Testing *NOTE: This commit has not been compiled!	2011-10-02 22:33:46 -04:00
Mark DePristo	dd75ad9f49	95% PedReader -- Passes significant unit tests -- Implicit sample creation for mom / dad when you create single samples -- Continuing cleanup of Sample and SampleDataSource	2011-09-30 18:03:34 -04:00
Mark DePristo	84160bd83f	Reorganization of Sample -- Moved Gender and Afflication to separate public enums -- PedReader 90% implemented -- Improve interface cleanup to XReadLines and UserException	2011-09-30 15:50:54 -04:00
Mark DePristo	56f10b40a8	Fixing test bugs for WindowMaker that required empty sample list	2011-09-30 14:18:27 -04:00
Mark DePristo	30d23942b1	Renamed ReadBackedPileup getXSampleName() functions to getXSample -- now that we don't have Sample objects floating around we don't have to have all of the Name extensions on our functions	2011-09-30 10:02:57 -04:00
Mark DePristo	e055a78f6e	LIBS now requires at least one sample be present -- UnitTest provides a "null" sample for matching the reads without read groups	2011-09-30 09:49:35 -04:00
Mark DePristo	b71b51751e	Bug fix for UnitTest -- Provide the null sample to the LIBS, as this seems to be required for correctly passing this unit test -- Will be fixed in a future update	2011-09-29 17:30:01 -04:00
Mark DePristo	1765fbeb6b	Merge branch 'master' into ped	2011-09-29 17:18:51 -04:00
Mark DePristo	98ecaf8aa0	Support for ReducedReads with reduced counts and average quals -- ReadUtils and UnitTest updated to support new byte[] style -- Removed unnecessary read transformer in PairHMM	2011-09-29 17:18:39 -04:00
Mark DePristo	9458f01409	Test cleanup of Sample object	2011-09-29 15:13:05 -04:00
Mark DePristo	625ffb6a07	LocusIteratorByState and ReadBackedPileups no longer use Sample	2011-09-29 14:52:11 -04:00
Mark DePristo	505416b6c0	Merge branch 'master' into ped	2011-09-29 12:22:39 -04:00
Mauricio Carneiro	4086fa768f	Disabling all ReadClipperUnitTests	2011-09-29 12:20:35 -04:00
Mark DePristo	5043d76c3d	Removing more bad uses of SampleDataSource creation	2011-09-29 12:16:34 -04:00
Mark DePristo	5c9227cf5e	Further cleanup of Sample database -- Removing more and more unnecessary code -- Partial removal of type safe Sample usage. On the road to SampleDB only	2011-09-29 11:50:05 -04:00
Mark DePristo	2a0cd556d3	Further cleanup of Sample -- Cleaned up interface functions in GAE -- Added Walker.getSampleDB() function which is an easier option for tools to get the samples db	2011-09-29 10:34:51 -04:00
Mark DePristo	e76f381628	Moved sample package from DataSources to gatk, and renamed it samples -- All associated changes to the codebase are just header updates	2011-09-29 09:57:15 -04:00
Mauricio Carneiro	fc86cd6fd8	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr	2011-09-29 00:12:15 -04:00
Roger Zurawicki	4fd5630f6a	Added ReadClipper Unit Test * Includes tests that include HardClip to Read and Reference Coords. * Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing	2011-09-28 23:13:50 -04:00
Matt Hanna	9272ed03b5	Merged bug fix from Stable into Unstable	2011-09-28 21:26:43 -04:00
Matt Hanna	0acaf2df65	Fix an embarrassing issue where a specific configuration of minimal coverage over small intervals could cause reads to be dropped from the pileup. Nothing to see here...	2011-09-28 21:23:01 -04:00
Mark DePristo	4f09453470	Refactored reduced read utilities -- UnitTests for key functions on reduced reads -- PileupElement calls static functions in ReadUtils -- Simple routine that takes a reduced read and fills in its quals with its reduced qual	2011-09-26 12:58:31 -04:00
Guillermo del Angel	3eef800889	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-24 21:20:11 -04:00
Guillermo del Angel	4707ab4a7d	Added unit tests to test genotype merges with PL's	2011-09-24 21:17:15 -04:00
Guillermo del Angel	203517fbb7	a) Cleanups/bug fixes to previous commit to CombineVariants. b) Change md5 to reflect records that are now merged correctly. c) Change unit merge alleles test to reflect the fact that a null non-variant vc object is not valid and not supported because there's no way to codify such object in a vcf. The code correctly converts this to a non-variant single-base event with whatever the reference is at that location.	2011-09-24 19:08:00 -04:00
Guillermo del Angel	cd058dd10f	a) Fixed md5 for legit change in UG output that now also no-calls genotypes w/0,0,0 in PL's in SNP case. b) First reimplementation of new vc merger of different types. Previous version did it in two steps, first merging all vc's per type and then trying to see if resulting vc's would be merged if alleles of one type were a subset of another, but this won't work when uniquifying genotypes since sample names would be messed up and GT sample names wouldn't match VC sample names. Now, it's actually simpler: when splitting vc's by type before merging, we check for alleles of one vc being a subset of alleles of vc of another type and if so we put them together in same list.	2011-09-24 13:40:11 -04:00
Mark DePristo	8d9e136bba	Merge branch 'stable'	2011-09-24 09:26:28 -04:00
Mark DePristo	f792353dcd	Framework for genotype unit test	2011-09-24 08:56:45 -04:00
Mark DePristo	c0bb0cb465	Make DiploidGenotype enum private to walkers.genotyper	2011-09-24 08:48:33 -04:00
Khalid Shakir	1803bd6ae2	Merged bug fix from Stable into Unstable	2011-09-23 21:05:00 -04:00
Khalid Shakir	8ceb93b8ac	Fixed an integration test which crashed on the out of date LSF DRMAA library when run against the obsolete LSF dotkit instead of .combined_LSF_SGE	2011-09-23 21:03:22 -04:00
David Roazen	40202c85e0	Merged bug fix from Stable into Unstable	2011-09-23 16:35:55 -04:00
David Roazen	e1cb5f6459	SnpEff annotator now assigns a functional class to each effect and distinguishes between actual effects and mere modifiers. -We now assign a functional class (nonsense, missense, silent, or none) to each SnpEff effect, and add a SNPEFF_FUNCTIONAL_CLASS annotation to the INFO field of the output VCF. -Effects are now prioritized according to both biological impact and functional class, instead of impact only. -Many of SnpEff's "low-impact" effects are now classified as "modifiers" with lower priority than every other effect. This includes such "effects" as DOWNSTREAM, UPSTREAM, INTRON, GENE, EXON, and others that really describe the location of the variant rather than its biological effect. This code will be short-lived (likely 1.2-only), as the next version of SnpEff will include most of these features directly. Checking this change into Stable+Unstable instead of Unstable because the current functional class stratification in VariantEval is basically broken and urgently needs to be fixed for production purposes.	2011-09-23 16:06:52 -04:00
Mark DePristo	106a26c42d	Minor file cleanup	2011-09-23 08:25:20 -04:00
Mark DePristo	a9f073fa68	Genotype merging unit tests for simpleMerge -- Remaining TODOs are all for GdA	2011-09-23 08:24:49 -04:00
Eric Banks	a8e0fb26ea	Updating md5 because the file changed	2011-09-23 07:33:20 -04:00
Mark DePristo	c49cc623de	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-22 17:26:21 -04:00
Mark DePristo	dab7232e9a	simpleMerge UnitTest for not annotating and annotating to different info key	2011-09-22 17:26:11 -04:00
Mark DePristo	30ab3af0c8	A few more simpleMerge UnitTest tests for filtered vcs	2011-09-22 17:14:59 -04:00
Mark DePristo	5cf82f9236	simpleMerge UnitTest tests filtered VC merging	2011-09-22 17:05:12 -04:00
Mark DePristo	46ca33dc04	TestDataProvider now can be named	2011-09-22 17:04:32 -04:00
Mark DePristo	68da555932	UnitTest for simpleMerge for alleles	2011-09-22 15:16:37 -04:00
Eric Banks	80d7300de4	Unit test was passing in FORMAT as one of the sample names. There used to be a hack in the VCFHeader to check for this and remove it and I couldn't figure out why, but now I know. Hack was removed and now the unit test passes in only the sample names as per the contract.	2011-09-22 13:28:42 -04:00
Eric Banks	9c1728416c	Revert "Updating md5 for fixed file" because this was fixed properly in unstable (but will break SnpEff if put into Stable). This reverts commit 6b4182c6ab3e214da4c73bc6f3687ac6d1c0b72c.	2011-09-22 13:16:42 -04:00
Eric Banks	888d8697b1	Merged bug fix from Stable into Unstable	2011-09-22 13:16:31 -04:00
Eric Banks	15a410b24b	Updating md5 for fixed file	2011-09-22 13:15:41 -04:00
Mark DePristo	ba5f83fee2	start of VariantContextUtils UnitTest -- tests rsID merging	2011-09-22 12:10:39 -04:00
Mark DePristo	a05c959e5a	Empty unit tests for VariantContextUtils -- will be expanded over the day	2011-09-22 11:20:07 -04:00
Mark DePristo	3fdee2b9ed	Merge from stable into unstable	2011-09-22 11:19:43 -04:00
Mark DePristo	c514df6d18	Merge of stable into unstable	2011-09-22 10:34:27 -04:00

587 Commits (ff26f2bf688048bbd6e2b9ffcf31cedce4fa99dd)