8071 Commits (fee9b367e49e8fdc798c9d511ecb1f7b9eed9efd)

Author SHA1 Message Date
Mark DePristo fee9b367e4 VariantContext genotypes are now stored as GenotypeMap objects
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subsetting to samples
-- All instances in the GATK that used Map<String, Genotype> now use the GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> was used, so the order of the genotypes was technically undefined and could be dangerous. Now everything uses GenotypeMap with a specific ordering of samples (by name)
-- Integration tests updated and all pass
2011-11-11 15:00:35 -05:00
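The ordering hazard described in this commit can be illustrated with a minimal sketch. It is not the GATK GenotypeMap class: `SampleOrderSketch`, its methods, and the use of plain strings as stand-ins for Genotype objects are all assumptions made for illustration. A HashMap gives no guarantee about iteration order, while a TreeMap keyed by sample name always iterates by name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not the GATK GenotypeMap class.
// Genotypes are represented as plain strings to keep the sketch self-contained.
final class SampleOrderSketch {
    // Returns the sample names in whatever order the map iterates them.
    static List<String> sampleOrder(Map<String, String> genotypes) {
        return new ArrayList<>(genotypes.keySet());
    }

    // Copies any map into one whose iteration order is sorted by sample name.
    static Map<String, String> sortedBySample(Map<String, String> genotypes) {
        return new TreeMap<>(genotypes);
    }

    public static void main(String[] args) {
        Map<String, String> unordered = new HashMap<>();
        unordered.put("NA12892", "0/1");
        unordered.put("NA12878", "1/1");
        unordered.put("NA12891", "0/0");

        // HashMap iteration order is unspecified and may differ between JVMs.
        System.out.println(sampleOrder(unordered));

        // TreeMap iterates keys in natural (lexicographic) order.
        System.out.println(sampleOrder(sortedBySample(unordered)));
        // prints [NA12878, NA12891, NA12892]
    }
}
```

Any code that writes genotype columns by iterating such a map would see a stable column order only in the sorted case.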
Mark DePristo 4938569b3a More general handling of parameters for VariantContextBenchmark 2011-11-11 10:22:19 -05:00
Mark DePristo ef9f8b5d46 Added subContextOfSamples to VariantContext
-- This is a more convenient accessor than subContextOfGenotypes, covers nearly all of the use cases of the former function, and can potentially be implemented more efficiently.
2011-11-11 10:07:11 -05:00
Mark DePristo e216e85465 First working version of VariantContextBenchmark 2011-11-11 09:56:00 -05:00
Mark DePristo ee40791776 Attributes are now Map<String,Object> not Map<String,?>
-- Allows us to avoid an unnecessary copy when creating InferredGeneticContext (whose name really needs to change).
2011-11-11 09:55:42 -05:00
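One plausible reading of the copy this commit avoids, sketched under assumptions: a Map<String,?> cannot be assigned to a Map<String,Object> field without either an unchecked cast or a copy, whereas a Map<String,Object> argument can be stored directly. The class and method names below are hypothetical and are not taken from the GATK source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of wildcard vs. concrete value types.
final class AttributeMapSketch {
    // With a wildcard parameter, storing the attributes as Map<String, Object>
    // requires building a new map (or an unchecked cast).
    static Map<String, Object> storeFromWildcard(Map<String, ?> attributes) {
        return new HashMap<String, Object>(attributes);
    }

    // With a concrete parameter type, the same map instance can be kept as-is.
    static Map<String, Object> storeDirect(Map<String, Object> attributes) {
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("DP", 42);
        attrs.put("FILTER", "PASS");

        System.out.println(storeFromWildcard(attrs) == attrs); // false: a copy was made
        System.out.println(storeDirect(attrs) == attrs);       // true: no copy
    }
}
```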
Eric Banks 59945a41e8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-10 23:05:20 -05:00
Eric Banks 0c32281484 Adding a benchmarking class for parsing VCF files. Not complete. 2011-11-10 23:05:13 -05:00
Mauricio Carneiro 9c013374fd A walker to calculate the coverage of a target
In targeted sequencing projects, we pay a penalty to get to a minimum coverage in 80% of the targets. This walker will help us understand the ratio between the coverage of the targeted site (usually in the middle of the interval) and that of the targeted region.
2011-11-10 17:16:51 -05:00
Mauricio Carneiro ffa6bc66ec Eliminating excessive debug tests 2011-11-10 17:16:51 -05:00
Mauricio Carneiro 5a1170078a Using centralized reduce read facilities 2011-11-10 17:16:51 -05:00
Mark DePristo dc9b351b5e Meaningful error message when an IntervalArg file fails to parse correctly 2011-11-10 17:10:26 -05:00
Mark DePristo bb7bf74aa8 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-10 16:05:43 -05:00
Mark DePristo 153e52ffed VariantEvalIntegrationTest for IntervalStratification 2011-11-10 14:10:39 -05:00
Mauricio Carneiro 060c7ce8ae It wouldn't harm integrationtests if we had our logic right... :-) 2011-11-10 14:03:22 -05:00
Mauricio Carneiro bb4cd59475 Filtered and consensus reads will now use the same tag 2011-11-10 13:58:31 -05:00
Mauricio Carneiro 7a46273d75 Consensus reads had filtered data read names
fixed.
2011-11-10 13:58:31 -05:00
Mauricio Carneiro c14b182501 Add reads in the recursive call
We were missing consensus reads that got added from the recursive call. This was a side effect of the filtered data implementation. Fixed.
2011-11-10 13:58:31 -05:00
Ryan Poplin 07dbf0bd40 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-10 13:39:24 -05:00
Ryan Poplin 26762d6c6f Folding recent HMM changes into Haplotype Caller. Misc bug fixes throughout HC. 2011-11-10 13:36:03 -05:00
Eric Banks 39678b6a20 Check for reads with missing read groups and throw a UserException when encountered. Mauricio said this wouldn't break integration tests. 2011-11-10 13:34:45 -05:00
Mark DePristo 18f829f76b Towards a full G1KPhaseI table creation script 2011-11-10 13:27:54 -05:00
Mark DePristo dd1810140f -stratIntervals is optional 2011-11-10 13:27:32 -05:00
Mark DePristo 67b022c34b Cleanup for new SampleUtils function
-- getVCFHeadersFromRods(rods) is now available so that you don't have getVCFHeadersFromRods(rods, null) throughout the codebase
2011-11-10 13:27:13 -05:00
Ryan Poplin 9490d71bc8 Folding recent HMM changes into Haplotype Caller. Misc bug fixes throughout HC. 2011-11-10 13:26:29 -05:00
Mark DePristo 35fe9c8a06 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-10 11:11:33 -05:00
Mark DePristo 714cac21c9 Testdata for IntervalStratification 2011-11-10 11:08:34 -05:00
Mark DePristo dc4932f93d VariantEval module to stratify the variants by whether they overlap an interval set
The primary use of this stratification is to provide a mechanism to divide up the assessment of a call set by whether a variant overlaps an interval or not. I use this to differentiate between variants occurring in CCDS exons vs. those in non-coding regions, in the 1000G call set, using a command line that looks like:

-T VariantEval -R human_g1k_v37.fasta -eval 1000G.vcf -stratIntervals:BED ccds.bed -ST IntervalStratification

Note that the overlap algorithm properly handles symbolic alleles with an INFO field END value. To use this module safely you should provide entire contigs' worth of variants and let the interval stratification decide overlap, as opposed to using -L, which will not work properly with symbolic variants.

Minor improvements to create() interval in GenomeLocParser.
2011-11-10 10:58:40 -05:00
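The overlap handling for symbolic alleles described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the IntervalStratification code: the class, the method names, and the coordinates are hypothetical. The idea is that a record's stop position comes from the INFO END value when one is present, and a closed-interval test then decides overlap.

```java
// Hypothetical illustration; not the GATK IntervalStratification implementation.
final class IntervalOverlapSketch {
    // Closed, 1-based intervals on the same contig overlap when each one
    // starts no later than the other ends.
    static boolean overlaps(int aStart, int aEnd, int bStart, int bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    // Stop position of a variant: the INFO END value when provided,
    // otherwise the start plus the reference allele length minus one.
    static int variantStop(int start, int refAlleleLength, Integer infoEnd) {
        return infoEnd != null ? infoEnd : start + refAlleleLength - 1;
    }

    public static void main(String[] args) {
        // A symbolic deletion starting at 100 with END=500, tested against 300-400.
        int stopWithEnd = variantStop(100, 1, 500);
        System.out.println(overlaps(100, stopWithEnd, 300, 400)); // true

        // The same record judged by its reference allele alone misses the interval.
        int stopWithoutEnd = variantStop(100, 1, null);
        System.out.println(overlaps(100, stopWithoutEnd, 300, 400)); // false
    }
}
```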
Mauricio Carneiro 0d8983feee outputting the RG information
setReadGroup now sets the read group attribute for the GATKSAMRecord
2011-11-09 23:35:00 -05:00
Eric Banks 315ac68b0b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-09 22:37:36 -05:00
Eric Banks 6313aae2c4 Adding checks for hasBasePileup() before calling getBasePileup() as per GS thread 2011-11-09 22:37:26 -05:00
Ryan Poplin 74a18d3de8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-09 22:29:40 -05:00
Ryan Poplin 24712c0221 Merged bug fix from Stable into Unstable 2011-11-09 22:28:27 -05:00
Ryan Poplin 8942406aa2 Use MathUtils to compare doubles instead of testing for equality 2011-11-09 22:05:21 -05:00
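A minimal sketch of why doubles are compared with a tolerance rather than with ==. The helper below is hypothetical and its signature and epsilon are assumptions; it is not presented as the MathUtils API.

```java
// Hypothetical helper; the name, signature, and tolerance are assumptions.
final class DoubleCompareSketch {
    static final double EPSILON = 1e-6; // assumed tolerance for this sketch

    // Returns 0 when a and b differ by less than epsilon,
    // -1 when a is smaller, and 1 when a is larger.
    static int compareDoubles(double a, double b, double epsilon) {
        if (Math.abs(a - b) < epsilon) {
            return 0;
        }
        return a < b ? -1 : 1;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        // Exact equality fails because of binary floating-point rounding.
        System.out.println(sum == 0.3); // false
        // A tolerance-based comparison treats the two values as equal.
        System.out.println(compareDoubles(sum, 0.3, EPSILON) == 0); // true
    }
}
```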
Ryan Poplin 348f2db7fd Fix for HMM optimization. If the two penalty arrays match exactly the function should return the end of the array instead of 0. 2011-11-09 22:00:52 -05:00
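One reading of this fix, sketched under assumptions: a helper finds the first position at which two penalty arrays differ, and when no difference exists it reports the end of the array rather than 0. The class name, method name, and the use of byte arrays are hypothetical; the actual HMM code is not shown in this log.

```java
// Hypothetical illustration of the "first differing position" idea.
final class FirstDifferenceSketch {
    // Returns the first index at which the arrays differ. When they match
    // over their shared length, returns that length (the end), not 0.
    static int firstDifferingIndex(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] previous = {10, 20, 30, 40};
        byte[] same = {10, 20, 30, 40};
        byte[] changed = {10, 20, 35, 40};

        System.out.println(firstDifferingIndex(previous, same));    // 4
        System.out.println(firstDifferingIndex(previous, changed)); // 2
    }
}
```

Returning 0 for identical arrays would be indistinguishable from a difference at the first element, which is presumably the ambiguity the commit removes.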
Mauricio Carneiro 9a4486a9e6 BaseCounts now include N's
Fixing unit tests accordingly.
2011-11-09 21:29:33 -05:00
Eric Banks 82bf09edf3 Mark Standard Annotations with an asterisk 2011-11-09 20:42:31 -05:00
Eric Banks 04b122be29 Fix for bug reported on GetSatisfaction 2011-11-09 20:33:36 -05:00
Mauricio Carneiro d00b2c6599 Adding a synthetic read for filtered data
* Generalized the concept of a synthetic read to create both the running consensus and synthetic reads of filtered data.
* Synthetic reads can now have deletions (but not insertions)
* New reduced read tag for filtered data synthetic reads *(RF)*
* Sliding window header now keeps information of consensus and filtered data
* Synthetic reads are created simultaneously, new functionality is controlled internally by addToSyntheticReads
2011-11-09 20:16:22 -05:00
Mauricio Carneiro 3afbd0e526 Sliding Window Header now includes filtered data information
This is a necessary framework for the filtered data consensus reads to be produced.
2011-11-09 20:16:22 -05:00
Mauricio Carneiro 6ee90ada14 Quick optimization to the SlidingWindow builder
from O(n^2) to O(n). Not bad.
2011-11-09 20:16:21 -05:00
Eric Banks 21bf43f3bb Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-09 15:34:40 -05:00
Eric Banks 02d5e3025e Added integration test for intervals from bed file 2011-11-09 15:34:19 -05:00
Christopher Hartl 85bffe1dca Merged bug fix from Stable into Unstable 2011-11-09 15:29:14 -05:00
Christopher Hartl d828eba7f4 Allow comments in a table-formatted file to precede the header line. 2011-11-09 15:27:38 -05:00
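A rough sketch of the behavior this commit describes, locating the header line when comments come before it. The comment prefix "#", the skipping of blank lines, and all names here are assumptions for illustration; the actual table reader is not shown in this log.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; the comment prefix and names are assumptions.
final class TableHeaderSketch {
    // Returns the index of the header line: the first line that is neither
    // blank nor a comment. Returns -1 when no such line exists.
    static int headerIndex(List<String> lines, String commentPrefix) {
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty() || line.startsWith(commentPrefix)) {
                continue; // skip comments and blank lines before the header
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# generated by a hypothetical tool",
                "# second comment line",
                "sample\tcoverage",
                "NA12878\t30");
        System.out.println(headerIndex(lines, "#")); // 2
    }
}
```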
Eric Banks 8205efbb29 Merge branch 'master' into intervals 2011-11-09 15:27:15 -05:00
Eric Banks d64f8a89a9 Instead of the SelfScopingFeatureCodec interface, pushed this functionality into Tribble itself. Now we can e.g. determine that a file can be parsed by the BedCodec on the fly. 2011-11-09 15:24:29 -05:00
Mark DePristo 0111e58d4e Don't generate PDF unless you have -run specified 2011-11-09 14:45:40 -05:00
Mark DePristo 29df96a77b First working version of G1KPhase1SummaryTable.scala 2011-11-09 14:43:53 -05:00
Mauricio Carneiro f080f64f99 Preserve RG information on new GATKSAMRecord from SAMRecord 2011-11-09 14:39:20 -05:00
Mauricio Carneiro f9530e0768 Clean unnecessary attributes from the read
This gives a 40% file size reduction on average.
2011-11-09 14:39:20 -05:00