gatk-3.8

Author	SHA1	Message	Date
Mark DePristo	68da555932	UnitTest for simpleMerge for alleles	2011-09-22 15:16:37 -04:00
Mark DePristo	ba5f83fee2	start of VariantContextUtils UnitTest -- tests rsID merging	2011-09-22 12:10:39 -04:00
Mark DePristo	a05c959e5a	Empty unit tests for VariantContextUtils -- will be expanded over the day	2011-09-22 11:20:07 -04:00
Mark DePristo	3fdee2b9ed	Merge from stable into unstable	2011-09-22 11:19:43 -04:00
Mark DePristo	c514df6d18	Merge of stable into unstable	2011-09-22 10:34:27 -04:00
Mark DePristo	f81a41b889	Updating MD5s for CombineVariants -- Old version had broken RSIDs, new version is fixed. No longer see rs1234,. as it is now just rs1234	2011-09-22 10:30:25 -04:00
Eric Banks	b8ea9ceb68	Adding integration test that uses the -V:dbsnp binding to make sure it won't fail later on if someone messes with Tribble.	2011-09-21 22:43:31 -04:00
Mark DePristo	6bcfce225f	Fix for dynamic type determination for bgzip files -- GZipInputStream handles bgzip files under linux, but not mac -- Added BlockCompressedInputStream test as well, which works properly on bgzip files	2011-09-21 15:39:19 -04:00
Mark DePristo	74f9ccf6dd	Merge	2011-09-21 11:30:11 -04:00
Mark DePristo	6592972f82	Putative fix for BAQ array out of bounds -- Old code required qual to be <64, which isn't strictly necessary. Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant -- Unittest to enforce this behavior	2011-09-21 11:25:08 -04:00
Mark DePristo	7d11f93b82	Final bugfix for CombineVariants -- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp -- Proper handling of ids. If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list	2011-09-21 10:58:32 -04:00
Mark DePristo	a91ac0c5db	Intermediate commit of bugfixes to CombineVariants	2011-09-21 10:15:05 -04:00
David Roazen	b04d8eab55	Merged bug fix from Stable into Unstable	2011-09-20 17:24:14 -04:00
David Roazen	d9ea764611	SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file. This change is urgently required for production, which is why it's going into Stable+Unstable instead of just Unstable. The keys for the SnpEff version and command header lines in the VCF file output by VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally different from the keys for those same lines in the SnpEff output file (SnpEffVersion and SnpEffCmd), so that output files from VariantAnnotator won't be confused with output files from SnpEff itself.	2011-09-20 16:30:55 -04:00
Mark DePristo	b7511c5ff3	Fixed long-standing bug in tribble index creation -- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index. This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write -- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary. This can be used conveniently everywhere, and is what's written into the Tribble index -- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils -- VCFWriter now requires the master sequence dictionary -- Updated walkers that create VCFWriters to provide the master sequence dictionary	2011-09-20 10:53:18 -04:00
Mark DePristo	aa8afa3899	Merge	2011-09-19 21:16:47 -04:00
Mark DePristo	4ad330008d	Final intervals cleanup -- No functional changes (my algorithm wouldn't work) -- Major structural cleanup (returning more basic data structures that allow us to develop a new algorithm) -- Unit tests for the efficiency of interval partitioning	2011-09-19 10:19:10 -04:00
Mark DePristo	6ea57bf036	Merge branch 'master' into sgintervals	2011-09-19 09:50:19 -04:00
Guillermo del Angel	7fa1e237d9	Forgot to git stash pop new MD5's for CombineVariants integration test	2011-09-16 12:53:54 -04:00
David Roazen	d78e00e5b2	Renaming VariantAnnotator SnpEff keys This is to head off potential confusion with the output from the SnpEff tool itself, which also uses a key named EFF.	2011-09-15 17:42:15 -04:00
Eric Banks	9dc6354130	Oops didn't mean to touch this test before	2011-09-15 16:55:24 -04:00
Eric Banks	202405b1a1	Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations.	2011-09-15 13:52:31 -04:00
David Roazen	3db457ed01	Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames" After discussing this with Mark, it seems clear that the old version of the VariantEval FunctionalClass stratification is preferable to this version. By reverting, we maintain backwards compatibility with legacy output files from the old GenomicAnnotator, and can add SnpEff support later without breaking that backwards compatibility. This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.	2011-09-14 10:47:28 -04:00
David Roazen	e0c8c0ddcb	Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames This is a temporary and hopefully short-lived solution. I've modified the FunctionalClass stratification to stratify by effect impact as defined by SnpEff annotations (high, moderate, and low impact) rather than by the silent/missense/nonsense categories. If we want to bring back the silent/missense/nonsense stratification, we should probably take the approach of asking the SnpEff author to add it as a feature to SnpEff rather than coding it ourselves, since the whole point of moving to SnpEff was to outsource genomic annotation.	2011-09-14 07:09:47 -04:00
David Roazen	1213b2f8c6	SnpEff 2.0.2 support -Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2) -Removed support for SnpEff 1.9.6 (and associated tribble codec) -Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag) -Correctly matches ref/alt alleles before annotating a record, unlike the previous version -Correctly handles indels (again, unlike the previous version)	2011-09-14 07:09:47 -04:00
Guillermo del Angel	5b1bf6e244	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-13 17:04:43 -04:00
Guillermo del Angel	c6672f2397	Intermediate (but necessary) fix for Beagle walkers: if a marker is absent in the Beagle output files, but present in the input vcf, there's no reason why it should be omitted in the output vcf. Rather, the vc is written as is from the input vcf	2011-09-13 16:57:37 -04:00
Ryan Poplin	981b78ea50	Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.	2011-09-12 12:17:43 -04:00
Guillermo del Angel	9344938360	Uncomment code to add deleted bases covering an indel to per-sample genotype reporting, update integration tests accordingly	2011-09-10 19:41:01 -04:00
Guillermo del Angel	b399424a9c	Fix integration test affected by non-calling all-zero PL samples, and add a more complicated multi-sample integration test from a phase 1 case, GBR with mixed technologies and complex input alleles	2011-09-09 20:44:47 -04:00
Mark DePristo	72536e5d6d	Done	2011-09-09 15:44:47 -04:00
Ryan Poplin	1953edcd2d	updating Validate Variants deletion integration test	2011-09-09 13:39:08 -04:00
Ryan Poplin	9ada9b3ed4	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-09 13:15:36 -04:00
Ryan Poplin	354529bff3	adding Validate Variants integration test with a deletion	2011-09-09 13:15:24 -04:00
Mark DePristo	06cb20f2a5	Intermediate commit cleaning up scatter intervals -- Adding unit tests to ensure uniformity of intervals	2011-09-09 12:56:45 -04:00
Eric Banks	51eb95d638	Missed these tests before	2011-09-09 11:46:37 -04:00
Eric Banks	6ad8943ca0	CompOverlap no longer keeps track of the number of comp sites since it wasn't keeping track of them correctly (and cannot).	2011-09-09 09:45:24 -04:00
Eric Banks	eaaba6eb51	Confirming that when stratifying by sample in VE the monomorphic sites for a given sample are not counted for the relevant metrics. Adding integration test to cover it.	2011-09-08 13:17:34 -04:00
Ryan Poplin	2636d216de	Adding indel vqsr integration test	2011-09-08 10:38:13 -04:00
Ryan Poplin	9cba1019c8	Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelics and indel records that overlap	2011-09-08 09:25:13 -04:00
Ryan Poplin	e0020b2b29	Fixing PrintRODs. Now has input and only prints out one copy of each record	2011-09-08 08:58:37 -04:00
Mark DePristo	2ded027762	Removed dysfunctional tranches support from VariantEval	2011-09-07 16:09:24 -04:00
Eric Banks	aa9e32f2f1	Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark.	2011-09-07 15:48:06 -04:00
Mark DePristo	9127849f5d	BugFix for unit test	2011-09-07 14:54:10 -04:00
Eric Banks	da9c8ab386	Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.	2011-09-06 20:39:42 -04:00
Mark DePristo	1aa4b12ff0	Reduced the number of combinations being tested here, which was overkill	2011-09-01 10:42:43 -04:00
Mark DePristo	3af001fff2	Bugfix for file that must not exist on disk	2011-08-29 17:00:10 -04:00
Mark DePristo	1ceb020fae	UnitTests for RScript	2011-08-27 10:50:05 -04:00
Mark DePristo	c0503283df	Spelling fix requires md5 updates	2011-08-26 07:40:44 -04:00
Guillermo del Angel	e618cb1e79	a) Renamed/expanded SelectVariants arguments that choose particular kinds of variants and particular allelic types, now instead of -Indels or -SNPs we can specify for example -selectType [MIXED|INDEL|SNP|MNP|SYMBOLIC]. To select biallelic, multiallelic variants, use -restrictAllelesTo [BIALLELIC|MULTIALLELIC]. Corresponding gatkdocs changes. b) More useful AC,AF logging in VariantsToTable with multiallelic sites: instead of logging comma-separated values, log max value by default. Hidden, experimental argument -logACSum to log sum of ACs instead. This is due to extreme slowness of R in parsing strings to tokens and computing max/sum itself (~100x slower than gatk). c) Added integrationtest for new SelectVariants commands	2011-08-24 12:25:50 -04:00

242 Commits (68da555932761567f79f0b3bfdbc808dcf72ca5e)