Commit Graph

7680 Commits (ffdfdcde3ff3340d822693cb27efc8f7b6aaeeb4)

Author SHA1 Message Date
Mark DePristo 9f6f0c443c Marginally cleaner isVCFStream() function
-- cleanup trying to debug minor bug.  Failed to fix the bug, but the code is nicer now
2011-09-21 15:25:01 -04:00
Ryan Poplin 5fef6dc5d0 Merged bug fix from Stable into Unstable 2011-09-21 15:23:06 -04:00
Ryan Poplin 2585fc3d6c Updating Rscript path doc text for Broad users 2011-09-21 15:22:26 -04:00
Mark DePristo 74f9ccf6dd Merge 2011-09-21 11:30:11 -04:00
Mark DePristo 6592972f82 Putative fix for BAQ array out of bounds
-- Old code required qual to be <64, which isn't strictly necessary.  Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant
-- Unittest to enforce this behavior
2011-09-21 11:25:08 -04:00
Eric Banks 174859fc68 Don't allow whitespace in the INFO field 2011-09-21 11:14:54 -04:00
Mark DePristo ecc7f34774 Putative fix for BAQ problem. 2011-09-21 11:09:54 -04:00
Mark DePristo 7d11f93b82 Final bugfix for CombineVariants
-- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp
-- Proper handling of ids.  If you are merging files with multiple ids for the same record, the ids are merged into a comma-separated list
2011-09-21 10:58:32 -04:00
Mark DePristo b36d396c16 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-21 10:16:24 -04:00
Mark DePristo 34f435565c Accidentally committed unclean tribble jar to repo 2011-09-21 10:16:17 -04:00
Mark DePristo a91ac0c5db Intermediate commit of bugfixes to CombineVariants 2011-09-21 10:15:05 -04:00
Mauricio Carneiro ac4f2d6d34 Fixing choppy consensus reads
When the consensus read had holes in the middle, the consensus was being finalized but not properly reinitialized. It was restarting with the old coordinates of the finalized consensus, misaligning following bases.
2011-09-21 00:49:50 -04:00
Mark DePristo 48c413fee8 Now throws an error when the mismatch fraction is too high 2011-09-20 21:28:31 -04:00
Mark DePristo 3b9314aecf Max fraction of mismatch test for debugging
-- Useful example for individuals who want to compute mismatches between a read and the reference.
2011-09-20 20:42:18 -04:00
David Roazen b04d8eab55 Merged bug fix from Stable into Unstable 2011-09-20 17:24:14 -04:00
Mauricio Carneiro 758ecf2d43 Bringing latest updates of ReduceReads to the master repository 2011-09-20 16:35:09 -04:00
David Roazen d9ea764611 SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file.
This change is urgently required for production, which is why it's going into Stable+Unstable
instead of just Unstable.

The keys for the SnpEff version and command header lines in the VCF file output by
VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally
different from the keys for those same lines in the SnpEff output file (SnpEffVersion
and SnpEffCmd), so that output files from VariantAnnotator won't be confused
with output files from SnpEff itself.
2011-09-20 16:30:55 -04:00
Mark DePristo bffd3cca6f Bug fix for reduced read; only adds regular bases for calculation
-- No longer passes on deletions for genotyping
2011-09-20 15:07:06 -04:00
Mark DePristo 83bb91020f Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-20 14:52:54 -04:00
Menachem Fromer a97e039a62 Thanks to Chris for instructing me to use VCFExtractIntervals to get proper scattering of Variant Annotation 2011-09-20 14:29:39 -04:00
Mark DePristo 827c942c80 Rev tribble 2011-09-20 14:01:14 -04:00
Mark DePristo a1b4cafe7a Bug fix for NPE when timer wasn't initialized 2011-09-20 13:59:59 -04:00
Mauricio Carneiro 08ffb18b96 Renaming datasets in the MDCP
Making dataset names and files generated by the MDCP more uniform.
2011-09-20 11:02:51 -04:00
Mark DePristo b7511c5ff3 Fixed long-standing bug in tribble index creation
-- Previously, on-the-fly indices didn't have the sequence dictionary set when created, so the GATK would read the index, add the dictionary, and rewrite it.  This is now fixed: the on-the-fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary.  This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
2011-09-20 10:53:18 -04:00
Ryan Poplin efce2ece9d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-20 09:08:38 -04:00
Ryan Poplin 5d0705acd6 Adding quality scores to the VCF records created by the Haplotype Caller 2011-09-20 09:07:28 -04:00
Mark DePristo 230e16d7c0 Merge branch 'master' into rodrewrite 2011-09-20 06:54:18 -04:00
Khalid Shakir b507bd946c Merged bug fix from Stable into Unstable 2011-09-20 00:18:16 -04:00
Khalid Shakir 61b89e236a To work around potential problem with invalid javax.mail 1.4.1 in ivy cache, added explicit javax.mail 1.4.4 along with build.xml code to remove 1.4.1. 2011-09-20 00:14:35 -04:00
Mark DePristo aa8afa3899 Merge 2011-09-19 21:16:47 -04:00
Mauricio Carneiro 56106d54ed Changing ReadUtils behavior to comply with GenomeLocParser
Now the functions getRefCoordSoftUnclippedStart and getRefCoordSoftUnclippedEnd will return getUnclippedStart if the read is all contained within an insertion. Updated the contracts accordingly. This should give the same behavior as the GenomeLocParser now.
2011-09-19 14:00:00 -04:00
Mauricio Carneiro 080c957547 Fixing contracts for SoftUnclippedEnd utils
Now accepts reads that are entirely contained inside an insertion.
2011-09-19 13:53:53 -04:00
Eric Banks ba150570f3 Updating to use new rod system syntax plus name change for CountRODs 2011-09-19 13:30:32 -04:00
Mauricio Carneiro 5e832254a4 Fixing ReadAndInterval overlap comments. 2011-09-19 13:28:41 -04:00
Christopher Hartl ecb8466662 Merged bug fix from Stable into Unstable 2011-09-19 12:32:08 -04:00
Christopher Hartl 8143def292 Fix the -T argument in the DepthOfCoverage docs
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 12:31:47 -04:00
Eric Banks 095f75ff7d Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-19 12:24:12 -04:00
Eric Banks 85626e7a5d We no longer want people to use the August 2010 Dindel calls for indel realignment but instead Guillermo's new whole genome bi-allelic indel calls; updating the bundle accordingly. Also, there was some confusion among the 1000G data processing folks as to exactly what these indel files are, so I've renamed them so that it's clear. Wiki updated too. 2011-09-19 12:24:05 -04:00
Christopher Hartl 034b868588 Revert "Fix the -T argument in the DepthOfCoverage docs"
This reverts commit 0994efda998cf3a41b1a43696dbc852a441d5316.
2011-09-19 12:16:07 -04:00
Mark DePristo cfde0e674b Merge branch 'sgintervals' 2011-09-19 12:02:41 -04:00
Mark DePristo 3e93f246f7 Support for sample sets in AssignSomaticStatus
-- Also cleaned up SampleUtils.getSamplesFromCommandLine() to return a set, not a list, and trim the sample names.
2011-09-19 11:40:45 -04:00
Mark DePristo 41ffb25b74 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-19 10:55:18 -04:00
Christopher Hartl ca1b30e4a4 Fix the -T argument in the DepthOfCoverage docs
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 10:29:06 -04:00
Mark DePristo 4ad330008d Final intervals cleanup
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to develop a new algorithm)
-- Unit tests for the efficiency of interval partitioning
2011-09-19 10:19:10 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Mark DePristo 6bd42c053d Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-18 20:18:39 -04:00
Mark DePristo bed78b47e0 Marginally better formatting, with hours as the default time 2011-09-18 20:18:18 -04:00
Roger Zurawicki 091c7197cd Fixed memory leak and bug with deletions in clipping
The ClippingOp clip cigar function would run into an endless loop if the parameters were out of the read's range; I fixed the bug.
* There is no check to make sure the read coordinates are covered by the read, though
When hard clipping to an interval, I added a check for deletions.
NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized
2011-09-18 19:21:51 -04:00
Ryan Poplin 67cca5196c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-18 12:13:01 -04:00
Ryan Poplin cb4a50b147 Adding ability to try both small and large kmer lengths. Highest likelihood wins. 2011-09-17 16:42:49 -04:00