Commit Graph

822 Commits (a5e75cd14cf7a8b6d3c4c8271302468fc5f256b4)

Author SHA1 Message Date
Mark DePristo c0bb0cb465 Make DiploidGenotype enum private to walkers.genotyper 2011-09-24 08:48:33 -04:00
Guillermo del Angel 3a4469a236 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-23 21:58:34 -04:00
Guillermo del Angel 0e74cc3c74 a) Treat SNP genotype likelihoods just as indels, in the sense that they're always normalized as PL's so one of them will always be zero. This creates minor numerical differences in Qual and annotations due to numerical approximations in AF computation.
b) Intermediate CombineVariants fixes, not ready yet
2011-09-23 21:58:20 -04:00
Khalid Shakir 1803bd6ae2 Merged bug fix from Stable into Unstable 2011-09-23 21:05:00 -04:00
Khalid Shakir 8ceb93b8ac Fixed an integration test which crashed on the out of date LSF DRMAA library when run against the obsolete LSF dotkit instead of .combined_LSF_SGE 2011-09-23 21:03:22 -04:00
Mauricio Carneiro 7cac75ae1d Merged bug fix from Stable into Unstable 2011-09-23 19:00:43 -04:00
Mauricio Carneiro fbe3c1e0b3 Adding warning on HardClipping
Hard Clipping is still under heavy development and should not be used by anyone less prepared than MacGyver.
2011-09-23 19:00:19 -04:00
Mark DePristo b66841f179 Static cache for binomial probability
-- Very low level performance optimization
2011-09-23 17:29:34 -04:00
Mauricio Carneiro 1a45c331b2 bringing the latest bug fixes to Reduce Reads 2011-09-23 16:40:06 -04:00
Mauricio Carneiro 9ea40f2e41 Deletions/Insertions in hard clip and bug fixes
* Deletions now count as hard clipped bases in order to recover the original alignment start of a clipped read.
* Insertions do not  count as hard clipped bases for the same reason.
* This created a bug in the previous cigar cleaning function. Fixed.
2011-09-23 16:37:08 -04:00
David Roazen 40202c85e0 Merged bug fix from Stable into Unstable 2011-09-23 16:35:55 -04:00
David Roazen e1cb5f6459 SnpEff annotator now assigns a functional class to each effect and distinguishes between actual effects and mere modifiers.
-We now assign a functional class (nonsense, missense, silent, or none) to each SnpEff effect, and add a
 SNPEFF_FUNCTIONAL_CLASS annotation to the INFO field of the output VCF.
-Effects are now prioritized according to both biological impact and functional class, instead of impact only.
-Many of SnpEff's "low-impact" effects are now classified as "modifiers" with lower priority than every
 other effect. This includes such "effects" as DOWNSTREAM, UPSTREAM, INTRON, GENE, EXON, and others that
 really describe the location of the variant rather than its biological effect.

This code will be short-lived (likely 1.2-only), as the next version of SnpEff will include most of these
features directly.

Checking this change into Stable+Unstable instead of Unstable because the current functional class stratification
in VariantEval is basically broken and urgently needs to be fixed for production purposes.
2011-09-23 16:06:52 -04:00
Matt Hanna e388c357ca Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-23 14:53:28 -04:00
Matt Hanna cc23b0b8a9 Fix for recent change modelling unmapped shards: don't invoke optimization to combine mapped and unmapped shards. 2011-09-23 14:52:31 -04:00
Mark DePristo e3d4efb283 Remove N2 EXACT model code, which should never be used 2011-09-23 11:55:21 -04:00
Mark DePristo 27ce3c822e Merge branch 'stable' 2011-09-23 09:04:52 -04:00
Mark DePristo 2bb77a7978 Docs for all VariantAnnotator annotations 2011-09-23 09:04:16 -04:00
Mark DePristo dd65ba5bae @Hidden for DocumentationTest and GATKDocsExample 2011-09-23 09:03:37 -04:00
Mark DePristo dfce301beb Looks for @Hidden annotation on all classes and excludes them from the docs 2011-09-23 09:03:04 -04:00
Mark DePristo 106a26c42d Minor file cleanup 2011-09-23 08:25:20 -04:00
Mark DePristo a9f073fa68 Genotype merging unit tests for simpleMerge
-- Remaining TODOs are all for GdA
2011-09-23 08:24:49 -04:00
Mark DePristo 4397ce8653 Moved removePLs to VariantContextUtils 2011-09-23 08:24:20 -04:00
Eric Banks a8e0fb26ea Updating md5 because the file changed 2011-09-23 07:33:20 -04:00
Mark DePristo c49cc623de Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-22 17:26:21 -04:00
Mark DePristo dab7232e9a simpleMerge UnitTest for not annotating and annotating to different info key 2011-09-22 17:26:11 -04:00
Mark DePristo 30ab3af0c8 A few more simpleMerge UnitTest tests for filtered vcs 2011-09-22 17:14:59 -04:00
Mark DePristo 5cf82f9236 simpleMerge UnitTest tests filtered VC merging 2011-09-22 17:05:12 -04:00
Mark DePristo 46ca33dc04 TestDataProvider now can be named 2011-09-22 17:04:32 -04:00
Mauricio Carneiro 96c875399c Merging many bug fixes to reduce reads 2011-09-22 17:04:11 -04:00
Mauricio Carneiro 39b54211d0 Fixed hard clipping soft clipped bases after hard clips
if soft clipped bases were after a hard clipped section of the read, the hard clip was clipping the left soft clip tail as if it were a right tail. Mayhem.
2011-09-22 15:46:55 -04:00
Mark DePristo 68da555932 UnitTest for simpleMerge for alleles 2011-09-22 15:16:37 -04:00
Mauricio Carneiro 1acf7945c5 Fixed hard clipped cigar and alignment start
* Hard clipped Cigar now includes all insertions that were hard clipped and not the deletions.
* The alignment start is now recalculated according to the new hard clipped cigar representation
2011-09-22 14:51:14 -04:00
Eric Banks 80d7300de4 Unit test was passing in FORMAT as one of the sample names. There used to be a hack in the VCFHeader to check for this and remove it and I couldn't figure out why, but now I know. Hack was removed and now the unit test passes in only the sample names as per the contract. 2011-09-22 13:28:42 -04:00
Mauricio Carneiro 4e9020c9f7 Fixed alignment start for hard clipping insertions 2011-09-22 13:28:25 -04:00
Eric Banks 9c1728416c Revert "Updating md5 for fixed file" because this was fixed properly in unstable (but will break SnpEff if put into Stable).
This reverts commit 6b4182c6ab3e214da4c73bc6f3687ac6d1c0b72c.
2011-09-22 13:16:42 -04:00
Eric Banks 888d8697b1 Merged bug fix from Stable into Unstable 2011-09-22 13:16:31 -04:00
Eric Banks 15a410b24b Updating md5 for fixed file 2011-09-22 13:15:41 -04:00
Mark DePristo ba5f83fee2 start of VariantContextUtils UnitTest
-- tests rsID merging
2011-09-22 12:10:39 -04:00
Mark DePristo 93dd1faa5f Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-22 11:20:10 -04:00
Mark DePristo a05c959e5a Empty unit tests for VariantContextUtils
-- will be expanded over the day
2011-09-22 11:20:07 -04:00
Mark DePristo 3fdee2b9ed Merge from stable into unstable 2011-09-22 11:19:43 -04:00
Christopher Hartl 4f4a0fc38a Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git 2011-09-22 11:01:58 -04:00
Christopher Hartl 982c47bfa7 Remove duplicate effort in ReadUtils (with apologies to Mauricio)
Big (but not major) cleanup of code in ILG - mostly excising the old likelihood model
Activated the early-abort check for ILG. I think it should be better this way.
2011-09-22 10:58:26 -04:00
Mark DePristo c514df6d18 Merge of stable into unstable 2011-09-22 10:34:27 -04:00
Mark DePristo f81a41b889 Updating MD5s for CombineVariants
-- Old version had broken RSIDs, new version is fixed.  No longer see rs1234,. as it is now just rs1234
2011-09-22 10:30:25 -04:00
Eric Banks b8ea9ceb68 Adding integration test that uses the -V:dbsnp binding to make sure it won't fail later on if someone messes with Tribble. 2011-09-21 22:43:31 -04:00
Eric Banks 8f8b59a932 My interpretation of the VCF spec is that the FORMAT field should only be present if there is genotype/sample data. So the VCFCodec now throws an exception when it encounters such a case. I had to fix one of the integration test VCFs. 2011-09-21 22:23:28 -04:00
Christopher Hartl dc96f6da79 Merge branch 'master' of ssh://chartl@gsa2/humgen/gsa-scr1/chartl/dev/git 2011-09-21 18:18:41 -04:00
Christopher Hartl f9cdc119af Added a method to ReadUtils that converts reads of the form 10S20M10S to 40M (just unclips the soft-clips).
Be careful when using this - if you're writing a bam file it will be potentially written out of order (since the previous alignment start was at the M, not the S).
2011-09-21 18:16:42 -04:00
Christopher Hartl faff6e4019 Failed to commit changes to the GATKReport required for more easy access when using the files as data sources (read: histograms) for walkers 2011-09-21 18:15:23 -04:00
Mauricio Carneiro 96768c8a18 Sending latest bug fixes to Reduce Reads to the main repository 2011-09-21 17:43:11 -04:00
Mauricio Carneiro 70335b2b0a Hard clipping soft clipped reads to fix misalignments.
Pre-softclipped reads (with high qual) are a complicated event to deal with in the Reduced Reads environment. I chose to hard clip them out for now and added a todo item to bring them back on in the future, perhaps as a variant region.
2011-09-21 17:12:01 -04:00
Christopher Hartl ef05827c7b Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-21 16:40:47 -04:00
Christopher Hartl 3b51d9106a Adding in likelihood calculations for mendelian violations. Also fixing a minor and rare bug in SelectVariants when specifying family structure on the command line. 2011-09-21 16:40:29 -04:00
Mark DePristo 04968c88b3 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-21 15:43:25 -04:00
Mark DePristo 6bcfce225f Fix for dynamic type determination for bgzip files
-- GZipInputStream handles bgzip files under linux, but not mac
-- Added BlockCompressedInputStream test as well, which works properly on bgzip files
2011-09-21 15:39:19 -04:00
Mark DePristo 9f6f0c443c Marginally cleaner isVCFStream() function
-- cleanup trying to debug minor bug.  Failed to fix the bug, but the code is nicer now
2011-09-21 15:25:01 -04:00
Ryan Poplin 5fef6dc5d0 Merged bug fix from Stable into Unstable 2011-09-21 15:23:06 -04:00
Ryan Poplin 2585fc3d6c Updating Rscript path doc text for Broad users 2011-09-21 15:22:26 -04:00
Mark DePristo 74f9ccf6dd Merge 2011-09-21 11:30:11 -04:00
Mark DePristo 6592972f82 Putative fix for BAQ array out of bounds
-- Old code required qual to be <64, which isn't strictly necessary.  Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant
-- Unittest to enforce this behavior
2011-09-21 11:25:08 -04:00
Eric Banks 174859fc68 Don't allow whitespace in the INFO field 2011-09-21 11:14:54 -04:00
Mark DePristo ecc7f34774 Putative fix for BAQ problem. 2011-09-21 11:09:54 -04:00
Mark DePristo 7d11f93b82 Final bugfix for CombineVariants
-- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp
-- Proper handling of ids.  If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list
2011-09-21 10:58:32 -04:00
Mark DePristo a91ac0c5db Intermediate commit of bugfixes to CombineVariants 2011-09-21 10:15:05 -04:00
David Roazen b04d8eab55 Merged bug fix from Stable into Unstable 2011-09-20 17:24:14 -04:00
Mauricio Carneiro 758ecf2d43 Bringing latest updates of ReduceReads to the master repository 2011-09-20 16:35:09 -04:00
David Roazen d9ea764611 SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file.
This change is urgently required for production, which is why it's going into Stable+Unstable
instead of just Unstable.

The keys for the SnpEff version and command header lines in the VCF file output by
VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally
different from the keys for those same lines in the SnpEff output file (SnpEffVersion
and SnpEffCmd), so that output files from VariantAnnotator won't be confused
with output files from SnpEff itself.
2011-09-20 16:30:55 -04:00
Mark DePristo bffd3cca6f Bug fix for reduced read; only adds regular bases for calculation
-- No longer passes on deletions for genotyping
2011-09-20 15:07:06 -04:00
Mark DePristo a1b4cafe7a Bug fix for NPE when timer wasn't initialized 2011-09-20 13:59:59 -04:00
Mark DePristo b7511c5ff3 Fixed long-standing bug in tribble index creation
-- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index.  This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary.  This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
2011-09-20 10:53:18 -04:00
Mark DePristo 230e16d7c0 Merge branch 'master' into rodrewrite 2011-09-20 06:54:18 -04:00
Mark DePristo aa8afa3899 Merge 2011-09-19 21:16:47 -04:00
Mauricio Carneiro 56106d54ed Changing ReadUtils behavior to comply with GenomeLocParser
Now the functions getRefCoordSoftUnclippedStart and getRefCoordSoftUnclippedEnd will return getUnclippedStart if the read is all contained within an insertion. Updated the contracts accordingly. This should give the same behavior as the GenomeLocParser now.
2011-09-19 14:00:00 -04:00
Mauricio Carneiro 080c957547 Fixing contracts for SoftUnclippedEnd utils
Now accepts reads that are entirely contained inside an insertion.
2011-09-19 13:53:53 -04:00
Mauricio Carneiro 5e832254a4 Fixing ReadAndInterval overlap comments. 2011-09-19 13:28:41 -04:00
Christopher Hartl ecb8466662 Merged bug fix from Stable into Unstable 2011-09-19 12:32:08 -04:00
Christopher Hartl 8143def292 Fix the -T argument in the DepthOfCoverage docs
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 12:31:47 -04:00
Christopher Hartl 034b868588 Revert "Fix the -T argument in the DepthOfCoverage docs"
This reverts commit 0994efda998cf3a41b1a43696dbc852a441d5316.
2011-09-19 12:16:07 -04:00
Mark DePristo cfde0e674b Merge branch 'sgintervals' 2011-09-19 12:02:41 -04:00
Mark DePristo 3e93f246f7 Support for sample sets in AssignSomaticStatus
-- Also cleaned up SampleUtils.getSamplesFromCommandLine() to return a set, not a list, and trim the sample names.
2011-09-19 11:40:45 -04:00
Mark DePristo 41ffb25b74 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-19 10:55:18 -04:00
Christopher Hartl ca1b30e4a4 Fix the -T argument in the DepthOfCoverage docs
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 10:29:06 -04:00
Mark DePristo 4ad330008d Final intervals cleanup
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to development new algorithm)
-- Unit tests for the efficiency of interval partitioning
2011-09-19 10:19:10 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Mark DePristo 6bd42c053d Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-18 20:18:39 -04:00
Roger Zurawicki 091c7197cd Fixed memory leak and bug with deletions in clipping
The ClippingOp clip cigar function would run into a endless loop if the parameter were out of the reads range, I stopped the bug.
* There is no check to make sure the read coordinate are covered by the read though
When Hard clipping to interval, I added a check for deletions.
NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized
2011-09-18 19:21:51 -04:00
Guillermo del Angel 7fa1e237d9 Forgot to git stash pop new MD5's for CombineVariants integration test 2011-09-16 12:53:54 -04:00
Guillermo del Angel e7b9a009b7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-16 12:48:30 -04:00
Menachem Fromer b2e8e11128 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-16 00:52:27 -04:00
Christopher Hartl 57b3efa2e2 Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 21:06:38 -04:00
Christopher Hartl 939babc820 Updating formating for ValidationAmplicons GATK docs 2011-09-15 21:05:51 -04:00
Christopher Hartl 9fdf1f8eb6 Fix some doc formatting for Depth of Coverage 2011-09-15 21:05:22 -04:00
Menachem Fromer e6e9b08c9a Must provide alleles VCF to UGCallVariants 2011-09-15 18:51:09 -04:00
David Roazen d78e00e5b2 Renaming VariantAnnotator SnpEff keys
This is to head off potential confusion with the output from the SnpEff tool itself,
which also uses a key named EFF.
2011-09-15 17:42:15 -04:00
Eric Banks 1971fb35d7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 16:55:33 -04:00
Eric Banks 9dc6354130 Oops didn't mean to touch this test before 2011-09-15 16:55:24 -04:00
Ryan Poplin 2a8b8efd2f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-15 16:26:35 -04:00
Ryan Poplin 2f58fdb369 Adding expected output doc to CountCovariates 2011-09-15 16:26:11 -04:00
Eric Banks fd1831b4a5 Updating docs to include more details 2011-09-15 16:25:03 -04:00