707 Commits (48c413fee899082d3edf05ecc9c3381e4ab52a9e)

Author SHA1 Message Date
David Roazen b04d8eab55 Merged bug fix from Stable into Unstable 2011-09-20 17:24:14 -04:00
Mauricio Carneiro 758ecf2d43 Bringing latest updates of ReduceReads to the master repository 2011-09-20 16:35:09 -04:00
David Roazen d9ea764611 SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file.
This change is urgently required for production, which is why it's going into Stable+Unstable
instead of just Unstable.

The keys for the SnpEff version and command header lines in the VCF file output by
VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally
different from the keys for those same lines in the SnpEff output file (SnpEffVersion
and SnpEffCmd), so that output files from VariantAnnotator won't be confused
with output files from SnpEff itself.
2011-09-20 16:30:55 -04:00
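The key renaming described in the commit above can be pictured as follows. This is a minimal Python sketch, not the GATK implementation (which is Java). The function name `rename_snpeff_header_keys` and the sample header values are hypothetical, and the sketch assumes the SnpEff lines are ordinary `##key=value` VCF meta-information lines.

```python
# Hypothetical illustration of the header-key renaming; not GATK code.
# Only the four key names below come from the commit message.
KEY_MAP = {
    "SnpEffVersion": "OriginalSnpEffVersion",
    "SnpEffCmd": "OriginalSnpEffCmd",
}


def rename_snpeff_header_keys(header_lines):
    """Return a copy of header_lines with the SnpEff keys renamed.

    Lines that are not `##key=value` meta lines, or whose key is not
    in KEY_MAP, are passed through unchanged.
    """
    renamed = []
    for line in header_lines:
        if line.startswith("##") and "=" in line:
            # Split on the first '=' only, so values containing '=' survive.
            key, value = line[2:].split("=", 1)
            if key in KEY_MAP:
                line = "##" + KEY_MAP[key] + "=" + value
        renamed.append(line)
    return renamed
```

Because the renamed keys differ from the ones SnpEff writes, a downstream check for `##SnpEffVersion` would not match a VariantAnnotator output file, which is the separation the commit message asks for.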
Mark DePristo bffd3cca6f Bug fix for reduced read; only adds regular bases for calculation
-- No longer passes on deletions for genotyping
2011-09-20 15:07:06 -04:00
Mark DePristo a1b4cafe7a Bug fix for NPE when timer wasn't initialized 2011-09-20 13:59:59 -04:00
Mark DePristo b7511c5ff3 Fixed long-standing bug in tribble index creation
-- Previously, indices created on the fly were written without a sequence dictionary, so the GATK would read the index, add the dictionary, and rewrite it.  This is now fixed: the on-the-fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary.  This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
2011-09-20 10:53:18 -04:00
Mark DePristo 230e16d7c0 Merge branch 'master' into rodrewrite 2011-09-20 06:54:18 -04:00
Mark DePristo aa8afa3899 Merge 2011-09-19 21:16:47 -04:00
Mauricio Carneiro 56106d54ed Changing ReadUtils behavior to comply with GenomeLocParser
Now the functions getRefCoordSoftUnclippedStart and getRefCoordSoftUnclippedEnd return getUnclippedStart if the read is entirely contained within an insertion. Updated the contracts accordingly. This should now match the behavior of the GenomeLocParser.
2011-09-19 14:00:00 -04:00
Mauricio Carneiro 080c957547 Fixing contracts for SoftUnclippedEnd utils
Now accepts reads that are entirely contained inside an insertion.
2011-09-19 13:53:53 -04:00
Mauricio Carneiro 5e832254a4 Fixing ReadAndInterval overlap comments. 2011-09-19 13:28:41 -04:00
Christopher Hartl ecb8466662 Merged bug fix from Stable into Unstable 2011-09-19 12:32:08 -04:00
Christopher Hartl 8143def292 Fix the -T argument in the DepthOfCoverage docs
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 12:31:47 -04:00
Christopher Hartl 034b868588 Revert "Fix the -T argument in the DepthOfCoverage docs"
This reverts commit 0994efda998cf3a41b1a43696dbc852a441d5316.
2011-09-19 12:16:07 -04:00
Mark DePristo cfde0e674b Merge branch 'sgintervals' 2011-09-19 12:02:41 -04:00
Mark DePristo 3e93f246f7 Support for sample sets in AssignSomaticStatus
-- Also cleaned up SampleUtils.getSamplesFromCommandLine() to return a set, not a list, and trim the sample names.
2011-09-19 11:40:45 -04:00
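The effect of returning a set of trimmed names, as the commit above describes for SampleUtils.getSamplesFromCommandLine(), can be sketched in a few lines. This is an illustrative Python stand-in, not the Java method itself. The name `samples_from_command_line` is hypothetical, and dropping names that are empty after trimming is an assumption of this sketch rather than something the commit states.

```python
# Hypothetical stand-in for the cleaned-up behavior; not GATK code.
def samples_from_command_line(raw_names):
    """Trim whitespace from each sample name and return the unique names.

    Assumption of this sketch: names that are empty after trimming
    are dropped.
    """
    trimmed = (name.strip() for name in raw_names)
    return {name for name in trimmed if name}
```

With a set, `" NA12878"` and `"NA12878 "` collapse into a single sample instead of appearing twice, as they would in an untrimmed list.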
Mark DePristo 41ffb25b74 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-19 10:55:18 -04:00
Christopher Hartl ca1b30e4a4 Fix the -T argument in the DepthOfCoverage docs
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 10:29:06 -04:00
Mark DePristo 4ad330008d Final intervals cleanup
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to develop a new algorithm)
-- Unit tests for the efficiency of interval partitioning
2011-09-19 10:19:10 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Mark DePristo 6bd42c053d Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-18 20:18:39 -04:00
Roger Zurawicki 091c7197cd Fixed memory leak and bug with deletions in clipping
The ClippingOp clip cigar function would enter an endless loop if the parameters were outside the read's range; this is now fixed.
* There is still no check to make sure the read coordinates are covered by the read, though
Added a check for deletions when hard clipping to an interval.
NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized
2011-09-18 19:21:51 -04:00
Guillermo del Angel 7fa1e237d9 Forgot to git stash pop new MD5's for CombineVariants integration test 2011-09-16 12:53:54 -04:00
Guillermo del Angel e7b9a009b7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-16 12:48:30 -04:00
Menachem Fromer b2e8e11128 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-16 00:52:27 -04:00
Christopher Hartl 57b3efa2e2 Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 21:06:38 -04:00
Christopher Hartl 939babc820 Updating formatting for ValidationAmplicons GATK docs 2011-09-15 21:05:51 -04:00
Christopher Hartl 9fdf1f8eb6 Fix some doc formatting for Depth of Coverage 2011-09-15 21:05:22 -04:00
Menachem Fromer e6e9b08c9a Must provide alleles VCF to UGCallVariants 2011-09-15 18:51:09 -04:00
David Roazen d78e00e5b2 Renaming VariantAnnotator SnpEff keys
This is to head off potential confusion with the output from the SnpEff tool itself,
which also uses a key named EFF.
2011-09-15 17:42:15 -04:00
Eric Banks 1971fb35d7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 16:55:33 -04:00
Eric Banks 9dc6354130 Oops didn't mean to touch this test before 2011-09-15 16:55:24 -04:00
Ryan Poplin 2a8b8efd2f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 16:26:35 -04:00
Ryan Poplin 2f58fdb369 Adding expected output doc to CountCovariates 2011-09-15 16:26:11 -04:00
Eric Banks fd1831b4a5 Updating docs to include more details 2011-09-15 16:25:03 -04:00
Eric Banks 6d02a34bfb Updating docs to include output 2011-09-15 16:17:54 -04:00
Eric Banks 4ef6a4598c Updating docs to include output 2011-09-15 16:10:34 -04:00
Eric Banks fe474b77f8 Updating docs so printing looks nicer 2011-09-15 16:05:39 -04:00
Eric Banks f04e51c6c2 Adding docs from Andrey since his repo was all screwed up. 2011-09-15 15:38:56 -04:00
Guillermo del Angel 86480b2e13 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 15:31:07 -04:00
Eric Banks d369d10593 Adding documentation before the release for GATK wiki page 2011-09-15 13:56:23 -04:00
Eric Banks 202405b1a1 Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations. 2011-09-15 13:52:31 -04:00
David Roazen 1e682deb26 Minor html-formatting-related documentation fix to the SnpEff class. 2011-09-15 13:07:50 -04:00
Guillermo del Angel a942fa38ef Refine the way we merge records of different types in CombineVariants. Previously, two records of different types were never combined and were kept separate. This is still the case, except when the alleles of one record are a strict subset of the alleles of another record. For example, a SNP with alleles {A*, T} and a mixed record with alleles {A*, T, AAT} are now combined when their start positions match. 2011-09-15 10:22:28 -04:00
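The merge rule in the commit above comes down to a strict-subset test on allele sets plus a start-position match. The following Python sketch shows only that condition. It is not the CombineVariants code, the function name `should_combine` is hypothetical, and it covers only the case of two records of different types.

```python
# Hypothetical illustration of the merge condition; not GATK code.
def should_combine(start_a, alleles_a, start_b, alleles_b):
    """Return True if two records of different types would be combined.

    The records must share a start position, and the alleles of one
    must be a strict (proper) subset of the alleles of the other.
    """
    if start_a != start_b:
        return False
    a, b = set(alleles_a), set(alleles_b)
    # '<' on Python sets is the proper-subset test.
    return a < b or b < a
```

Using the example from the commit message, `should_combine(100, ["A*", "T"], 100, ["A*", "T", "AAT"])` evaluates to true, while the same allele lists at different start positions do not qualify.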
David Roazen 3db457ed01 Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames"
After discussing this with Mark, it seems clear that the old version of the
VariantEval FunctionalClass stratification is preferable to this version.
By reverting, we maintain backwards compatibility with legacy output files
from the old GenomicAnnotator, and can add SnpEff support later without
breaking that backwards compatibility.

This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.
2011-09-14 10:47:28 -04:00
David Roazen e0c8c0ddcb Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames
This is a temporary and hopefully short-lived solution. I've modified
the FunctionalClass stratification to stratify by effect impact as
defined by SnpEff annotations (high, moderate, and low impact) rather
than by the silent/missense/nonsense categories.

If we want to bring back the silent/missense/nonsense stratification,
we should probably take the approach of asking the SnpEff author
to add it as a feature to SnpEff rather than coding it ourselves,
since the whole point of moving to SnpEff was to outsource genomic
annotation.
2011-09-14 07:09:47 -04:00
David Roazen 1213b2f8c6 SnpEff 2.0.2 support
-Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2)
-Removed support for SnpEff 1.9.6 (and associated tribble codec)
-Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag)
-Correctly matches ref/alt alleles before annotating a record, unlike the previous version
-Correctly handles indels (again, unlike the previous version)
2011-09-14 07:09:47 -04:00
Guillermo del Angel 5b1bf6e244 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-13 17:04:43 -04:00
Guillermo del Angel c6672f2397 Intermediate (but necessary) fix for Beagle walkers: if a marker is absent from the Beagle output files but present in the input VCF, there's no reason why it should be omitted from the output VCF. Instead, the record is written unchanged from the input VCF 2011-09-13 16:57:37 -04:00
Mark DePristo edf29d0616 Explicit info message about uploading S3 log 2011-09-12 22:16:52 -04:00