gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Roger Zurawicki	091c7197cd	Fixed memory leak and bug with deletions in clipping The ClippingOp clip cigar function would run into a endless loop if the parameter were out of the reads range, I stopped the bug. * There is no check to make sure the read coordinate are covered by the read though When Hard clipping to interval, I added a check for deletions. NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized	2011-09-18 19:21:51 -04:00
Guillermo del Angel	7fa1e237d9	Forgot to git stash pop new MD5's for CombineVariants integration test	2011-09-16 12:53:54 -04:00
Guillermo del Angel	e7b9a009b7	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-16 12:48:30 -04:00
Menachem Fromer	b2e8e11128	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-16 00:52:27 -04:00
Christopher Hartl	57b3efa2e2	Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 21:06:38 -04:00
Christopher Hartl	939babc820	Updating formating for ValidationAmplicons GATK docs	2011-09-15 21:05:51 -04:00
Christopher Hartl	9fdf1f8eb6	Fix some doc formatting for Depth of Coverage	2011-09-15 21:05:22 -04:00
Menachem Fromer	e6e9b08c9a	Must provide alleles VCF to UGCallVariants	2011-09-15 18:51:09 -04:00
David Roazen	d78e00e5b2	Renaming VariantAnnotator SnpEff keys This is to head off potential confusion with the output from the SnpEff tool itself, which also uses a key named EFF.	2011-09-15 17:42:15 -04:00
Eric Banks	1971fb35d7	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 16:55:33 -04:00
Eric Banks	9dc6354130	Oops didn't mean to touch this test before	2011-09-15 16:55:24 -04:00
Ryan Poplin	2a8b8efd2f	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 16:26:35 -04:00
Ryan Poplin	2f58fdb369	Adding expected output doc to CountCovariates	2011-09-15 16:26:11 -04:00
Eric Banks	fd1831b4a5	Updating docs to include more details	2011-09-15 16:25:03 -04:00
Eric Banks	6d02a34bfb	Updating docs to include output	2011-09-15 16:17:54 -04:00
Eric Banks	4ef6a4598c	Updating docs to include output	2011-09-15 16:10:34 -04:00
Eric Banks	fe474b77f8	Updating docs so printing looks nicer	2011-09-15 16:05:39 -04:00
Eric Banks	f04e51c6c2	Adding docs from Andrey since his repo was all screwed up.	2011-09-15 15:38:56 -04:00
Guillermo del Angel	86480b2e13	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 15:31:07 -04:00
Eric Banks	d369d10593	Adding documentation before the release for GATK wiki page	2011-09-15 13:56:23 -04:00
Eric Banks	202405b1a1	Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations.	2011-09-15 13:52:31 -04:00
David Roazen	1e682deb26	Minor html-formatting-related documentation fix to the SnpEff class.	2011-09-15 13:07:50 -04:00
Guillermo del Angel	a942fa38ef	Refine the way we merge records in CombineVariants of different types. As of before, two records of different types were not combined and were kept separate. This is still the case, except when the alleles of one record are a strict subset of alleles of another record. For example, a SNP with alleles {A,T} and a mixed record with alleles {A,T, AAT} are now combined when start position matches.	2011-09-15 10:22:28 -04:00
David Roazen	3db457ed01	Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames" After discussing this with Mark, it seems clear that the old version of the VariantEval FunctionalClass stratification is preferable to this version. By reverting, we maintain backwards compatibility with legacy output files from the old GenomicAnnotator, and can add SnpEff support later without breaking that backwards compatibility. This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.	2011-09-14 10:47:28 -04:00
David Roazen	e0c8c0ddcb	Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames This is a temporary and hopefully short-lived solution. I've modified the FunctionalClass stratification to stratify by effect impact as defined by SnpEff annotations (high, moderate, and low impact) rather than by the silent/missense/nonsense categories. If we want to bring back the silent/missense/nonsense stratification, we should probably take the approach of asking the SnpEff author to add it as a feature to SnpEff rather than coding it ourselves, since the whole point of moving to SnpEff was to outsource genomic annotation.	2011-09-14 07:09:47 -04:00
David Roazen	1213b2f8c6	SnpEff 2.0.2 support -Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2) -Removed support for SnpEff 1.9.6 (and associated tribble codec) -Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag) -Correctly matches ref/alt alleles before annotating a record, unlike the previous version -Correctly handles indels (again, unlike the previous version	2011-09-14 07:09:47 -04:00
Guillermo del Angel	5b1bf6e244	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-13 17:04:43 -04:00
Guillermo del Angel	c6672f2397	Intermediate (but necessary) fix for Beagle walkers: if a marker is absent in the Beagle output files, but present in the input vcf, there's no reason why it should be omitted in the output vcf. Rather, the vc is written as is from the input vcf	2011-09-13 16:57:37 -04:00
Mark DePristo	edf29d0616	Explicit info message about uploading S3 log	2011-09-12 22:16:52 -04:00
Mark DePristo	2316b6aad3	Trying to fix problems with S3 uploading behind firewalls -- Cannot reproduce the very long waits reported by some users. -- Fixed problem that exception might result in an undeleted file, which is now fixed with deleteOnExit()	2011-09-12 22:02:42 -04:00
Matt Hanna	64707c33bb	Merged bug fix from Stable into Unstable	2011-09-12 21:54:11 -04:00
Matt Hanna	e63d9d8f8e	Mauricio pointed out to me that dynamic merging the unmapped regions of multiple BAMs ('-L unmapped' with a BAM list) was completely broken. Sorry about this! Fixed.	2011-09-12 21:50:59 -04:00
Eric Banks	ec4b30de6d	Patch from Laurent: typo leads to bad error messages.	2011-09-12 14:45:53 -04:00
David Roazen	9d9d438bc4	New VariantAnnotatorEngine capability: an initialize() method for all annotation classes. All VariantAnnotator annotation classes may now have an (optional) initialize() method that gets called by the VariantAnnotatorEngine ONCE before annotation starts. As an example of how this can be used, the SnpEff annotation class will use the initialize() method to check whether the SnpEff version number stored in the vcf header is a supported version, and also to verify that its required RodBinding is present.	2011-09-12 13:00:53 -04:00
Ryan Poplin	981b78ea50	Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.	2011-09-12 12:17:43 -04:00
Ryan Poplin	60ebe68aff	Fixing issue in VariantEval in which insertion and deletion events weren't treated symmetrically. Added new option to require strict allele matching.	2011-09-12 09:43:23 -04:00
Guillermo del Angel	9344938360	Uncomment code to add deleted bases covering an indel to per-sample genotype reporting, update integration tests accordingly	2011-09-10 19:41:01 -04:00
Guillermo del Angel	b399424a9c	Fix integration test affected by non-calling all-zero PL samples, and add a more complicated multi-sample integration test from a phase 1 case, GBR with mixed technologies and complex input alleles	2011-09-09 20:44:47 -04:00
Guillermo del Angel	e95d484757	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-09 18:31:14 -04:00
Guillermo del Angel	a807205fc3	a) Minor optimization to softMax() computation to avoid redundant operations, results in about 5-10% increase in speed in indel calling. b) Added (but left commented out since it may affect integration tests and to isolate commits) fix to per-sample DP reporting, so that deletions are included in count. c) Bug fix to avoid having non-reference genotypes assigned to samples with PL=0,0,0. Correct behavior should be to no-call these samples, and to ignore these samples when computing AC distribution since their likelihoods are not informative.	2011-09-09 18:00:23 -04:00
Mauricio Carneiro	9e650dfc17	Fixing SelectVariants documentation getting rid of messages telling users to go for the YAML file. The idea is to not support these anymore.	2011-09-09 16:25:31 -04:00
Mark DePristo	72536e5d6d	Done	2011-09-09 15:44:47 -04:00
Mark DePristo	3c8445b934	Performance bugfix for GenomeLoc.hashcode -- old version overflowed so most GenomeLocs had 0 hashcode. Now uses or not plus to combine	2011-09-09 14:25:37 -04:00
Mark DePristo	c6436ee5f0	Whitespace cleanup	2011-09-09 14:24:29 -04:00
Mark DePristo	87dc5cfb24	Whitespace cleanup	2011-09-09 14:23:42 -04:00
Ryan Poplin	1953edcd2d	updating Validate Variants deletion integration test	2011-09-09 13:39:08 -04:00
Ryan Poplin	9ada9b3ed4	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-09 13:15:36 -04:00
Ryan Poplin	354529bff3	adding Validate Variants integration test with a deletion	2011-09-09 13:15:24 -04:00
Ryan Poplin	91c949db74	Fixing ValidateVariants so that it validates deletion records. Fixing GATKdocs.	2011-09-09 12:57:14 -04:00
Mark DePristo	06cb20f2a5	Intermediate commit cleaning up scatter intervals -- Adding unit tests to ensure uniformity of intervals	2011-09-09 12:56:45 -04:00
Eric Banks	51eb95d638	Missed these tests before	2011-09-09 11:46:37 -04:00
Eric Banks	6ad8943ca0	CompOverlap no longer keeps track of the number of comp sites since it wasn't (and cannot) keeping track of them correctly.	2011-09-09 09:45:24 -04:00
Mark DePristo	507574b1c8	Merge branch 'cancer'	2011-09-08 16:10:02 -04:00
Mark DePristo	48461b34af	Added TYPE argument to print out VariantType	2011-09-08 15:01:13 -04:00
Eric Banks	eaaba6eb51	Confirming that when stratifying by sample in VE the monomorphic sites for a given sample are not counted for the relevant metrics. Adding integration test to cover it.	2011-09-08 13:17:34 -04:00
Ryan Poplin	2636d216de	Adding indel vqsr integration test	2011-09-08 10:38:13 -04:00
Ryan Poplin	9cba1019c8	Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelics and indel records that overlap	2011-09-08 09:25:13 -04:00
Ryan Poplin	e0020b2b29	Fixing PrintRODs. Now has input and only prints out one copy of each record	2011-09-08 08:58:37 -04:00
Ryan Poplin	29c968ab60	clean up	2011-09-08 08:42:43 -04:00
Ryan Poplin	59841f8232	Fixing genotype given alleles for indels. Only take the records that start at this locus.	2011-09-08 08:41:16 -04:00
Mark DePristo	cd2c511c4a	GCF improvements -- Support for streaming VCF writing via the VCFWriter interface -- GCF now has a header and a footer. The header is minimal, and contains a forward pointer to the position of the footer in the file. -- Readers now read the header, and then jump to the footer to get the rest of the "header" information -- Version now a field in GCF	2011-09-07 23:28:46 -04:00
Mark DePristo	fe5724b6ea	Refactored indexing part of StandardVCFWriter into superclass -- Now other implementations of the VCFWriter can easily share common functions, such as writing an index on the fly	2011-09-07 23:27:08 -04:00
Mark DePristo	01b6177ce1	Renaming GVCF -> GCF	2011-09-07 17:10:56 -04:00
Mark DePristo	b220ed0d75	Merge branch 'master' into rodrewrite	2011-09-07 17:05:35 -04:00
Guillermo del Angel	45d54f6258	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 16:49:49 -04:00
Guillermo del Angel	9604fb2ba3	Necessary but not sufficient step to fix GenotypeGivenAlleles mode in UG which is now busted	2011-09-07 16:49:16 -04:00
Mark DePristo	2ded027762	Removed dysfunctional tranches support from VariantEval	2011-09-07 16:09:24 -04:00
Eric Banks	aa9e32f2f1	Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark.	2011-09-07 15:48:06 -04:00
Mark DePristo	d7e355b4b6	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 14:54:16 -04:00
Mark DePristo	9127849f5d	BugFix for unit test	2011-09-07 14:54:10 -04:00
Eric Banks	3a04955a30	We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files, sites with samples who were no-called). Fixing. Moving on to VE now.	2011-09-07 14:01:42 -04:00
Guillermo del Angel	743bf7784c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 13:21:26 -04:00
Guillermo del Angel	5f22ef9a8c	Added missing javadoc info to Beagle arguments	2011-09-07 13:21:11 -04:00
Mark DePristo	3bcbfa6e06	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 13:13:17 -04:00
Mark DePristo	430da23446	At least 2 minutes must pass before a status message is printed, further stabilizing time estimates	2011-09-07 13:13:07 -04:00
Mauricio Carneiro	6857d0324e	Merge branch 'master' into rr	2011-09-07 12:59:08 -04:00
Mark DePristo	7e9e20fed0	Forgot to delete previous call	2011-09-07 12:54:52 -04:00
Mark DePristo	d23d620494	Pushing traversal engine timer start to as close to actual start as possible -- Should make initial timings more accurate	2011-09-07 12:52:33 -04:00
Mark DePristo	6ff432e1f2	BugFix for TF argument to VariantEval, actually making it work properly	2011-09-07 12:50:17 -04:00
Mauricio Carneiro	131cb7effd	Bringing Reduce Reads bug fixes to the main repository	2011-09-07 12:25:53 -04:00
Mark DePristo	a1920397e8	Major bugfix for per sample VariantEval -- per sample stratification was not being calculated correctly. The alt allele was always remaining, even if the genotype of the sample was hom-ref. Although conceptually fine, this breaks the assumptions of all of the eval modules, so per sample stratifications actually included all variants for everything. Eric is going to fix the system in general, so this commit may break the build.	2011-09-07 12:18:11 -04:00
Mark DePristo	a02636a1ac	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/ebanks/Sting_rodrefactor into rodrewrite	2011-09-07 10:50:00 -04:00
Mark DePristo	d5641cfac5	Merge branch 'variantEvalST'	2011-09-07 10:44:23 -04:00
Mark DePristo	2f4cf82e3b	VariantEval cleanup. Added VariantType Stratification -- ArrayList are List where possible -- states refactored into VariantStratifier base class (reduces many lines of duplicate code) -- Added VariantType stratification that partitions report by VariantContext.Type	2011-09-07 10:43:53 -04:00
Christopher Hartl	436f6eb52b	Reverting Eric's change and pushing in some command-line-option documentation.	2011-09-07 08:53:30 -04:00
Eric Banks	1ef8a1750a	I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon.	2011-09-06 21:07:49 -04:00
Eric Banks	da9c8ab386	Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.	2011-09-06 20:39:42 -04:00
Mark DePristo	3db7ecb920	ReducedRead flag cached in GATKSAMRecord. 20% performance improvement	2011-09-06 15:11:38 -04:00
Roger Zurawicki	47607a7eff	Fixed bug where deletions messed up interval clipping - Instead of using readLength, the ReadUtil function are used to get a proper read coordinate - Added debug info in interval clipping ( with -dl) NOTE: method might not be safe for production and checks need to be added to the ClippingOp code	2011-09-06 14:25:57 -04:00
Khalid Shakir	0adb388dee	Fixed bug in SelectVariants that was annotating sample_file / exclude_sample_file as @Argument instead of @Input meaning they weren't tracked in Queue. Updates for HybridSelectionPipeline: - Use VQSR on SNPs for projects using bait set whole_exome_agilent_1 and applying cut at 98.5. - If a whole_exome_agilent_1 project has less than 50 samples also mixing in 1000G samples to reach VQSR thresholds. - Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches. - Removed GSA_PRODUCTION_ONLY flag from indel caller. - Updated indel hard filters based on delangel's analysis. - Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.	2011-09-06 12:41:46 -04:00
Mark DePristo	d471617c65	GATK binary VCF (gvcf) prototype format for efficiency testing -- Very minimal working version that can read / write binary VCFs with genotypes -- Already 10x faster for sites, 5x for fully parsed genotypes, and 1000x for skipping genotypes when reading	2011-09-02 21:15:19 -04:00
Mark DePristo	048202d18e	Bugfix for cached quals	2011-09-02 21:13:28 -04:00
Mark DePristo	03aa04e37c	Simple refactoring to make formating functions public	2011-09-02 21:13:08 -04:00
Mark DePristo	124ef6c483	MISSING_VALUE now gets defaultValue in getAttribute functions	2011-09-02 21:12:28 -04:00
Mark DePristo	82f2131777	Simplied getAttributeAsX interfaces -- Removed versions getAttribriteAsX(key) that except on not having the value. -- Removed version that getAttributeAsXNoException(key) -- The only available assessors are now getAttributeAsX(key, default). -- This single accessors properly handle their argument types, so if the value is a double it is returned directly for getAttributeAsDouble(), or if it's a string it's converted to a double. If the key isn't found, default is returned.	2011-09-02 12:27:11 -04:00
Mauricio Carneiro	08ae6c0c61	ReadClipper is now handling unmapped reads	2011-09-02 11:32:30 -04:00
Mark DePristo	c57198a1b9	Optimizations in VCFCodec -- Don't create an empty LinkedHashSet() for PASS fields. Just return Collections.emptySet() instead. -- For filter fields with actual values, returns an unmodifiableSet instead of one that can be changed	2011-09-02 08:46:17 -04:00
Mark DePristo	c3ea96d856	Removing many unused functions of unquestionable purpose	2011-09-02 08:42:01 -04:00
Eric Banks	d241f0e903	Adding docs for the pcr error rate argument.	2011-09-01 21:57:02 -04:00
Eric Banks	827fe6130c	Adding hidden printing option. Also, always run UG in mode GENOTYPE_GIVEN_ALLELES given that we don't actually test for the correct alleles (otherwise UG may choose a different allele and we may falsely validate the wrong one).	2011-09-01 11:40:35 -04:00
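
The accessor behavior described in commit 82f2131777 (together with the MISSING_VALUE handling from 124ef6c483) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the GATK implementation: the class name AttributeSketch, its put helper, and the use of "." as the MISSING_VALUE marker are hypothetical and chosen only for this example. Only the getAttributeAsDouble(key, default) call shape and the described fallback rules come from the commit messages.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeSketch {
    // Assumption: "." stands in for the MISSING_VALUE marker named in the commit message.
    static final String MISSING_VALUE = ".";

    private final Map<String, Object> attributes = new HashMap<>();

    // Hypothetical helper so the example can build a record fluently.
    public AttributeSketch put(String key, Object value) {
        attributes.put(key, value);
        return this;
    }

    // Single accessor taking a default, as the commit message describes:
    // - key absent or value is MISSING_VALUE -> return the default
    // - value already a Double               -> return it directly
    // - anything else (e.g. a String)        -> convert its text form to a double
    public double getAttributeAsDouble(String key, double defaultValue) {
        Object value = attributes.get(key);
        if (value == null || MISSING_VALUE.equals(value)) {
            return defaultValue;
        }
        if (value instanceof Double) {
            return (Double) value;
        }
        return Double.parseDouble(String.valueOf(value));
    }

    public static void main(String[] args) {
        AttributeSketch record = new AttributeSketch()
                .put("QD", 12.5)     // stored as a Double
                .put("FS", "3.25")   // stored as a String
                .put("MQ", ".");     // stored as the missing marker

        System.out.println(record.getAttributeAsDouble("QD", -1.0)); // 12.5
        System.out.println(record.getAttributeAsDouble("FS", -1.0)); // 3.25
        System.out.println(record.getAttributeAsDouble("MQ", -1.0)); // -1.0
        System.out.println(record.getAttributeAsDouble("DP", 0.0));  // 0.0
    }
}
```

With a single accessor of this shape, callers never need a separate "no exception" variant: the default they pass states what should happen when the key is absent or missing.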

736 Commits (68da555932761567f79f0b3bfdbc808dcf72ca5e)