
1257 Commits (28b286ad3967b966d4a1f3e95f36cbea482528e7)

Author SHA1 Message Date
Mark DePristo 28b286ad39 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-30 09:11:53 -05:00
Ryan Poplin 91413cf0d9 Merged bug fix from Stable into Unstable 2011-11-29 14:01:23 -05:00
Ryan Poplin cb284eebde Further updating VQSR tutorial wiki docs to reflect the bundle 2011-11-29 14:00:57 -05:00
Ryan Poplin dcb889665d Merged bug fix from Stable into Unstable 2011-11-29 09:58:49 -05:00
Ryan Poplin 447e9bff9e Updating VQSR tutorial wiki docs to reflect the bundle 2011-11-29 09:57:45 -05:00
Ryan Poplin 110298322c Adding Transmission Disequilibrium Test annotation to VariantAnnotator and integration test to test it. 2011-11-29 09:29:18 -05:00
Eric Banks d7d8b8e380 Tribble v42 changes the Codec.canDecode method to take in a String instead of a File; this is something that Jim was adamant about (because Tribble can handle streams other than files). I didn't want the next person who needed to rev Tribble to deal with this change additionally, so I took care of updating the GATK now. 2011-11-28 14:18:28 -05:00
Mark DePristo 3c36428a20 Bug fix for TiTv calculation -- shouldn't be rounding 2011-11-28 10:20:34 -05:00
Eric Banks 436b4dc855 Updated docs 2011-11-28 08:59:48 -05:00
Mark DePristo e60272975a Fix for changed MD5 in streaming VCF test 2011-11-23 19:01:33 -05:00
Mark DePristo 12f09d88f9 Removing references to SimpleMetricsByAC 2011-11-23 16:08:18 -05:00
Mark DePristo e319079c32 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-23 13:02:11 -05:00
Mark DePristo 4107636144 VariantEval updates
-- Performance optimizations
-- Tables now are cleanly formatted (floats are %.2f printed)
-- VariantSummary is a standard report now
-- Removed CompEvalGenotypes (it didn't do anything)
-- Deleted unused classes in GenotypeConcordance
-- Updates integration tests as appropriate
2011-11-23 13:02:07 -05:00
David Roazen e5b85f0a78 A toString() method for IntervalBindings
Necessary since we're currently writing things like this to our VCF headers:
intervals=[org.broadinstitute.sting.commandline.IntervalBinding@4ce66f56]
2011-11-23 11:56:12 -05:00
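The header value quoted in the commit above is the default Object.toString() output (class name, "@", hex hash code), which says nothing about the intervals themselves. A minimal sketch of the kind of override that avoids this follows. It assumes a hypothetical IntervalLabel class standing in for the real IntervalBinding, whose fields are not shown in this log.

```java
import java.util.Arrays;
import java.util.List;

public class ToStringSketch {
    // Hypothetical stand-in for a binding object; NOT the GATK IntervalBinding class.
    static final class IntervalLabel {
        private final String source;

        IntervalLabel(String source) {
            this.source = source;
        }

        // Without this override, printing the object yields ClassName@hexHash.
        @Override
        public String toString() {
            return source;
        }
    }

    // Builds a header-style line; List.toString() calls toString() on each element.
    static String headerLine(List<IntervalLabel> intervals) {
        return "intervals=" + intervals;
    }

    public static void main(String[] args) {
        // Prints: intervals=[chr1:1-100]
        System.out.println(headerLine(Arrays.asList(new IntervalLabel("chr1:1-100"))));
    }
}
```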
Mark DePristo 5a4856b82e GATKReports now support a format field per column
-- You can tell the table to format your object with "%.2f" for example.
2011-11-23 11:31:04 -05:00
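As an illustration of a per-column format field, the sketch below attaches a printf-style format string to a column and applies it when rendering a value. The Column class and its render method are hypothetical names chosen for this example and are not the GATKReport API.

```java
import java.util.Locale;

public class ColumnFormatSketch {
    // Hypothetical column holding a name and a printf-style format; not the GATKReport API.
    static final class Column {
        final String name;
        final String format;

        Column(String name, String format) {
            this.name = name;
            this.format = format;
        }

        // Locale.US keeps the decimal separator stable across machines.
        String render(Object value) {
            return String.format(Locale.US, format, value);
        }
    }

    public static void main(String[] args) {
        Column ratio = new Column("tiTvRatio", "%.2f");
        // Prints: 2.06
        System.out.println(ratio.render(2.05678));
    }
}
```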
Mark DePristo c8bf7d2099 Check for null comment 2011-11-23 10:47:21 -05:00
Mark DePristo 6c2555885c Caching getSimpleName() in VariantEval is a big performance improvement
-- Removed the SimpleMetricsByAC table, as one should just use the AlleleCount Stratification and the upcoming VariantSummary table
2011-11-23 08:34:05 -05:00
Guillermo del Angel 32adbd614f Solve merge conflict 2011-11-22 22:48:46 -05:00
Guillermo del Angel 941f3784dc Solve merge conflict 2011-11-22 22:48:03 -05:00
Guillermo del Angel 75d93e6335 Another corner condition fix: skip likelihood computation in case we cut so many bases there's no haplotype or read left 2011-11-22 22:46:12 -05:00
Mark DePristo a3aef8fa53 Final performance optimization for GenotypesContext 2011-11-22 17:19:30 -05:00
Mark DePristo 990c02e4de Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-22 17:19:11 -05:00
Guillermo del Angel 38a90da92c Fixed merge conflict to Unstable 2011-11-22 14:39:45 -05:00
Guillermo del Angel 32a77a8a56 Prevent out of bound error in case read span > reference context + indel length. Can happen in RNAseq reads with long N CIGAR operators in the middle. 2011-11-22 13:57:24 -05:00
Eric Banks 5821c11fad For BAM and Reviewed errors we now check the error message to see if it's actually a 'too many open files' problem and, if so, we generate a User Error instead. 2011-11-22 10:50:22 -05:00
Mark DePristo 7087310373 Embarrassing bug fixed 2011-11-22 10:16:36 -05:00
Mark DePristo e484625594 GenotypesContext now updates cached data for add, set, replace operations when possible
-- Involved separately managing the sample -> offset and sample sorted list operations.  This should improve performance throughout the system
2011-11-22 08:40:48 -05:00
Mark DePristo 29ca24694a UG now encoding NO_CALLs as ./. not ./.:.:4:0,0,0
A few updated UGs integration tests
2011-11-22 08:22:32 -05:00
Mark DePristo 2b51c01df4 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-21 19:16:06 -05:00
Mark DePristo 5443d3634a Again, fixing the add call when we really mean replace
-- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./.  Eric will fix long-standing bug in QD observed from this change
-- VFW MD5s restored to their old correct values.  There was a bug in my implementation that caused the genotypes not to be parsed from the lazy output even though the header was incorrect.
2011-11-21 19:15:56 -05:00
Mauricio Carneiro 5ad3dfcd62 BugFix: byte overflow in SyntheticRead compressed base counts
* fixed and added unit test
2011-11-21 17:11:50 -05:00
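For context on this class of bug: a Java byte is a signed 8-bit value, so a counter stored in a byte wraps from 127 to -128 on the next increment. The sketch below shows the wraparound and one possible guard (saturating at Byte.MAX_VALUE). It is an assumption-laden illustration only; this log does not say how SyntheticRead was actually fixed, and both method names are hypothetical.

```java
public class ByteCounterSketch {
    // Naive increment: the narrowing cast silently wraps 127 to -128.
    static byte naiveIncrement(byte count) {
        return (byte) (count + 1);
    }

    // One possible guard (hypothetical, not the SyntheticRead fix): stop at the maximum.
    static byte saturatingIncrement(byte count) {
        return count == Byte.MAX_VALUE ? count : (byte) (count + 1);
    }

    public static void main(String[] args) {
        System.out.println(naiveIncrement(Byte.MAX_VALUE));      // -128
        System.out.println(saturatingIncrement(Byte.MAX_VALUE)); // 127
    }
}
```

Widening the counter to an int is the other obvious option when memory allows.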
Mark DePristo 9ea7b70a02 Added decode method to LazyGenotypesContext
-- AbstractVCFCodec calls this if the samples are not sorted.  Previously called getGenotypes() which didn't actually trigger the decode
2011-11-21 16:21:23 -05:00
Mark DePristo ab2efe3bd3 Reverting bad exact model changes 2011-11-21 16:14:40 -05:00
Eric Banks 44554b2bfd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-21 15:01:45 -05:00
Eric Banks 022832bd74 Very bad use of the == operator with Strings was ensuring that validating GenomeLocs was very inefficient. This fix resulted in a significant speedup for a simple RodWalker. 2011-11-21 14:49:47 -05:00
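For readers unfamiliar with the pitfall: in Java, == on Strings compares object references, not characters, so two equal contig names held in different objects compare as unequal. The sketch below demonstrates the difference. The method names are hypothetical, and how the reference comparison slowed GenomeLoc validation is not described in this log; a fast path that never fires is one plausible reading, stated here as an assumption.

```java
public class ContigCompareSketch {
    // Reference comparison: true only if both variables point at the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Content comparison: true if the character sequences match.
    static boolean sameContig(String a, String b) {
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        String fromDictionary = "chr1";
        String parsedFromFile = new String("chr1"); // distinct object, same characters

        System.out.println(sameReference(fromDictionary, parsedFromFile)); // false
        System.out.println(sameContig(fromDictionary, parsedFromFile));    // true
    }
}
```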
Mark DePristo 1561af22af Exact model code cleanup
-- Fixed up code when fixing a bug detected by aggressive contracts in GenotypesContext.
2011-11-21 14:35:15 -05:00
Mark DePristo 2c501364b8 GenotypesContext no longer has immutability in constructor
-- additional bug fixes throughout VariantContext and GenotypesContext objects
2011-11-21 14:34:31 -05:00
David Roazen 1296dd41be Removing the legacy -L "interval1;interval2" syntax
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.

A UserException is now thrown if someone tries to use this syntax.
2011-11-21 13:18:53 -05:00
Mark DePristo e467b8e1ae More contracts on LazyGenotypesContext 2011-11-21 09:34:57 -05:00
Mark DePristo 2e9ecf639e Generalized interface to LazyGenotypesContext
-- Now you provide a LazyParsing object
-- LazyGenotypesContext now knows nothing about the VCF parser itself.  The parser holds all of the necessary data to parse the VCF genotypes when necessary, and the LGC only has a pointer to this object
-- Using the new interface, added LazyGenotypesContext to the unit tests with a simple lazy version
-- Deleted VCFParser interface, as it was no longer necessary
2011-11-21 09:30:40 -05:00
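The design described in this commit (a container that holds only a pointer to a parsing object and decodes on demand) can be sketched roughly as follows. All names here (LazyList, decodeCount) are hypothetical, and a java.util.function.Supplier stands in for the LazyParsing object; the real LazyGenotypesContext interface is not shown in this log.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class LazyDecodeSketch {
    // Hypothetical lazy container: it knows nothing about the parser beyond "call it once".
    static final class LazyList {
        private final Supplier<List<String>> parser;
        private List<String> decoded; // stays null until first access
        private int decodes = 0;

        LazyList(Supplier<List<String>> parser) {
            this.parser = parser;
        }

        // Decodes on first call, then returns the cached result.
        List<String> get() {
            if (decoded == null) {
                decoded = parser.get();
                decodes++;
            }
            return decoded;
        }

        int decodeCount() {
            return decodes;
        }
    }

    public static void main(String[] args) {
        String rawGenotypes = "0/1,1/1,./.";
        LazyList lazy = new LazyList(() -> Arrays.asList(rawGenotypes.split(",")));

        System.out.println(lazy.decodeCount()); // 0: nothing parsed yet
        System.out.println(lazy.get().size());  // 3: parsed on demand
        lazy.get();
        System.out.println(lazy.decodeCount()); // 1: second access used the cache
    }
}
```

Keeping the parsing logic behind a supplier-style object is what lets a unit test plug in a trivial lazy source, as the commit notes.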
Mark DePristo f0ac588d32 Extensive unit test for GenotypeContextUnitTest
-- Currently only tests base class.  Adding subclass testing in a bit
2011-11-20 18:28:01 -05:00
Mark DePristo bc44f6fd9e Utility function Collection<Genotype> -> Collection<String> 2011-11-20 18:26:56 -05:00
Mark DePristo 9445326c6c Genotype is Comparable via sampleName 2011-11-20 18:26:27 -05:00
Mark DePristo f9e25081ab Completely documented LazyGenotypesContext 2011-11-20 08:35:52 -05:00
Mark DePristo 9cb3fe3a59 Vastly better way of doing on-demand genotype loading
-- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading.
-- This new class has replaced all of the old, complex functionality
-- Better still, there were many cases where the genotypes were being loaded unnecessarily, resulting in inefficiency.  This was detected because some of the integration tests changed as the genotypes were no longer being parsed unnecessarily
-- Misc. bug fixes throughout the system
-- Bug fixes for PhaseByTransmission with new GenotypesContext
2011-11-20 08:23:09 -05:00
Mark DePristo f392d330c3 Proper use of builder. Previous conversion attempt was flawed 2011-11-19 22:09:56 -05:00
Mark DePristo 7d09c0064b Bug fixes and code cleanup throughout
-- chromosomeCounts now takes builder as well, cleaning up a lot of code throughout the codebase.
2011-11-19 18:40:15 -05:00
Mark DePristo 707bd30b3f Should have been @BeforeMethod 2011-11-19 16:10:09 -05:00
Mark DePristo 8f7eebbaaf Bugfix for pError not being checked correctly in CommonInfo
-- UnitTests to ensure correct behavior
-- UnitTests to ensure correct behavior for pass filters vs. failed filters vs. unfiltered
2011-11-19 15:58:59 -05:00
Mark DePristo b7b57ef39a Updating MD5 to reflect canonical ordering of calculation
-- We should no longer have md5s changing because of hashmaps changing their sort order on us
-- Added GenotypeLikelihoodsUnitTests
-- Refactored ExactAFCalculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.
2011-11-19 15:57:33 -05:00