Commit Graph

8106 Commits (101ffc4dfd5ce83f7d6bf0b5de4a99ebfac447de)

Author SHA1 Message Date
Mauricio Carneiro 7a46273d75 Consensus reads had filtered data read names
fixed.
2011-11-10 13:58:31 -05:00
Mauricio Carneiro c14b182501 Add reads in the recursive call
was missing consensus reads that got added from the recursive call. This was a side effect of the filtered data implementation. Fixed.
2011-11-10 13:58:31 -05:00
Ryan Poplin 07dbf0bd40 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-10 13:39:24 -05:00
Ryan Poplin 26762d6c6f Folding recent HMM changes into Haplotype Caller. Misc bug fixes throughout HC. 2011-11-10 13:36:03 -05:00
Eric Banks 39678b6a20 Check for reads with missing read groups and throw a UserException when encountered. Mauricio said this wouldn't break integration tests. 2011-11-10 13:34:45 -05:00
Mark DePristo 18f829f76b Towards a full G1KPhaseI table creation script 2011-11-10 13:27:54 -05:00
Mark DePristo dd1810140f -stratIntervals is optional 2011-11-10 13:27:32 -05:00
Mark DePristo 67b022c34b Cleanup for new SampleUtils function
-- getVCFHeadersFromRods(rods) is now available so that you don't have to write getVCFHeadersFromRods(rods, null) throughout the codebase
2011-11-10 13:27:13 -05:00
Ryan Poplin 9490d71bc8 Folding recent HMM changes into Haplotype Caller. Misc bug fixes throughout HC. 2011-11-10 13:26:29 -05:00
Mark DePristo 35fe9c8a06 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-10 11:11:33 -05:00
Mark DePristo 714cac21c9 Testdata for IntervalStratification 2011-11-10 11:08:34 -05:00
Mark DePristo dc4932f93d VariantEval module to stratify the variants by whether they overlap an interval set
The primary use of this stratification is to provide a mechanism for dividing the assessment of a call set by whether each variant overlaps an interval or not.  I use this to differentiate between variants occurring in CCDS exons vs. those in non-coding regions in the 1000G call set, using a command line that looks like:

-T VariantEval -R human_g1k_v37.fasta -eval 1000G.vcf -stratIntervals:BED ccds.bed -ST IntervalStratification

Note that the overlap algorithm properly handles symbolic alleles with an INFO field END value.  To use this module safely, you should provide entire contigs' worth of variants and let the interval stratification decide overlap, rather than using -L, which does not work properly with symbolic variants.

Minor improvements to create() interval in GenomeLocParser.
2011-11-10 10:58:40 -05:00
Mauricio Carneiro 0d8983feee outputting the RG information
setReadGroup now sets the read group attribute for the GATKSAMRecord
2011-11-09 23:35:00 -05:00
Eric Banks 315ac68b0b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-09 22:37:36 -05:00
Eric Banks 6313aae2c4 Adding checks for hasBasePileup() before calling getBasePileup() as per GS thread 2011-11-09 22:37:26 -05:00
Ryan Poplin 74a18d3de8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-09 22:29:40 -05:00
Ryan Poplin 24712c0221 Merged bug fix from Stable into Unstable 2011-11-09 22:28:27 -05:00
Ryan Poplin 8942406aa2 Use MathUtils to compare doubles instead of testing for equality 2011-11-09 22:05:21 -05:00
Ryan Poplin 348f2db7fd Fix for HMM optimization. If the two penalty arrays match exactly the function should return the end of the array instead of 0. 2011-11-09 22:00:52 -05:00
Mauricio Carneiro 9a4486a9e6 BaseCounts now include N's
Fixing unit tests accordingly.
2011-11-09 21:29:33 -05:00
Eric Banks 82bf09edf3 Mark Standard Annotations with an asterisk 2011-11-09 20:42:31 -05:00
Eric Banks 04b122be29 Fix for bug reported on GetSatisfaction 2011-11-09 20:33:36 -05:00
Mauricio Carneiro d00b2c6599 Adding a synthetic read for filtered data
* Generalized the concept of a synthetic read to create both running consensus reads and synthetic reads of filtered data.
* Synthetic reads can now have deletions (but not insertions)
* New reduced read tag for filtered data synthetic reads *(RF)*
* Sliding window header now keeps information of consensus and filtered data
* Synthetic reads are created simultaneously; the new functionality is controlled internally by addToSyntheticReads
2011-11-09 20:16:22 -05:00
Mauricio Carneiro 3afbd0e526 Sliding Window Header now includes filtered data information
This is a necessary framework for the filtered data consensus reads to be produced.
2011-11-09 20:16:22 -05:00
Mauricio Carneiro 6ee90ada14 Quick optimization to the SlidingWindow builder
from O(n^2) to O(n). Not bad.
2011-11-09 20:16:21 -05:00
Eric Banks 21bf43f3bb Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-09 15:34:40 -05:00
Eric Banks 02d5e3025e Added integration test for intervals from bed file 2011-11-09 15:34:19 -05:00
Christopher Hartl 85bffe1dca Merged bug fix from Stable into Unstable 2011-11-09 15:29:14 -05:00
Christopher Hartl d828eba7f4 Allow comments in a table-formatted file to precede the header line. 2011-11-09 15:27:38 -05:00
Eric Banks 8205efbb29 Merge branch 'master' into intervals 2011-11-09 15:27:15 -05:00
Eric Banks d64f8a89a9 Instead of the SelfScopingFeatureCodec interface, pushed this functionality into Tribble itself. Now we can e.g. determine that a file can be parsed by the BedCodec on the fly. 2011-11-09 15:24:29 -05:00
Mark DePristo 0111e58d4e Don't generate PDF unless you have -run specified 2011-11-09 14:45:40 -05:00
Mark DePristo 29df96a77b First working version of G1KPhase1SummaryTable.scala 2011-11-09 14:43:53 -05:00
Mauricio Carneiro f080f64f99 Preserve RG information on new GATKSAMRecord from SAMRecord 2011-11-09 14:39:20 -05:00
Mauricio Carneiro f9530e0768 Clean unnecessary attributes from the read
this gives a 40% file size reduction on average.
2011-11-09 14:39:20 -05:00
Mauricio Carneiro 9427ada498 Fixing no cigar bug
empty GATKSAMRecords will have a null cigar. Treat them accordingly.
2011-11-09 14:39:20 -05:00
Mark DePristo e639f0798e mergeEvals allows you to treat -eval 1.vcf -eval 2.vcf as a single call set
-- A bit of code cleanup in VCFUtils
-- VariantEval table to create 1000G Phase I variant summary table
-- First version of 1000G Phase I summary table Qscript
2011-11-09 14:35:50 -05:00
Christopher Hartl 149b79eaad Merged bug fix from Stable into Unstable 2011-11-09 11:26:30 -05:00
Christopher Hartl 11abb4f9d1 Better error message. 2011-11-09 11:25:28 -05:00
Christopher Hartl d3a533b82e Revert "a"
This reverts commit 1175f50ddbf389f5da74d27dc725596582ae15af.
2011-11-09 11:22:26 -05:00
Christopher Hartl 5eaf800281 a 2011-11-09 11:22:20 -05:00
Christopher Hartl 5451fbc2b2 Merged bug fix from Stable into Unstable 2011-11-09 11:06:15 -05:00
Christopher Hartl 091229e4db MVLikelihoodRatio now checks if the family string is provided before attempting to instantiate. Also check that variant contexts have both genotypes and genotype likelihoods.
Table codec now yells at users for not providing a HEADER with the table - parsing tables without a header line was causing the first line of the file to be eaten.
Table feature now has a toString method.

These are minor bug fixes.
2011-11-09 11:03:29 -05:00
Mauricio Carneiro e1b4c3968f Fixing GATKSAMRecord bug
when constructing a GATKSAMRecord from scratch, we should set "mRestOfBinaryData" to null so the BAMRecord doesn't try to retrieve missing information from the non-existent bam file.
2011-11-08 16:50:36 -05:00
Guillermo del Angel 8e519e7d5d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-08 16:07:12 -05:00
Guillermo del Angel c49efcf631 Two small improvements to ValidationSiteSelector: a) add argument to ignore genotypes, and make sure selection works in sites-only VCF (in which case we can't select for polymorphism in samples), b) add ability to include filtered sites if requested 2011-11-08 16:06:18 -05:00
Ryan Poplin e973ca2010 fixing merge conflict. 2011-11-08 14:55:05 -05:00
Ryan Poplin b0e6afec48 Bug fix for HMM optimization. Need to also check the gap continuation penalty array for the index with the first discrepancy. 2011-11-08 14:51:25 -05:00
Ryan Poplin 94dc447a70 Merged bug fix from Stable into Unstable 2011-11-07 15:26:35 -05:00
Ryan Poplin 0b181be61f Bug fix in SelectVariants when using a discordance track but no sample specifications. Added integration test to test this. 2011-11-07 15:25:16 -05:00