Mark DePristo
28b286ad39
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-30 09:11:53 -05:00
Ryan Poplin
91413cf0d9
Merged bug fix from Stable into Unstable
2011-11-29 14:01:23 -05:00
Ryan Poplin
cb284eebde
Further updating VQSR tutorial wiki docs to reflect the bundle
2011-11-29 14:00:57 -05:00
Ryan Poplin
dcb889665d
Merged bug fix from Stable into Unstable
2011-11-29 09:58:49 -05:00
Ryan Poplin
447e9bff9e
Updating VQSR tutorial wiki docs to reflect the bundle
2011-11-29 09:57:45 -05:00
Ryan Poplin
110298322c
Adding Transmission Disequilibrium Test annotation to VariantAnnotator, plus an integration test for it.
2011-11-29 09:29:18 -05:00
Eric Banks
d7d8b8e380
Tribble v42 changes the Codec.canDecode method to take in a String instead of a File; this is something that Jim was adamant about (because Tribble can handle streams other than files). I didn't want the next person who needed to rev Tribble to deal with this change additionally, so I took care of updating the GATK now.
2011-11-28 14:18:28 -05:00
Mark DePristo
3c36428a20
Bug fix for TiTv calculation -- shouldn't be rounding
2011-11-28 10:20:34 -05:00
Eric Banks
436b4dc855
Updated docs
2011-11-28 08:59:48 -05:00
Mark DePristo
e60272975a
Fix for changed MD5 in streaming VCF test
2011-11-23 19:01:33 -05:00
Mark DePristo
12f09d88f9
Removing references to SimpleMetricsByAC
2011-11-23 16:08:18 -05:00
Mark DePristo
e319079c32
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-23 13:02:11 -05:00
Mark DePristo
4107636144
VariantEval updates
...
-- Performance optimizations
-- Tables now are cleanly formatted (floats are %.2f printed)
-- VariantSummary is a standard report now
-- Removed CompEvalGenotypes (it didn't do anything)
-- Deleted unused classes in GenotypeConcordance
-- Updates integration tests as appropriate
2011-11-23 13:02:07 -05:00
David Roazen
e5b85f0a78
A toString() method for IntervalBindings
...
Necessary since we're currently writing things like this to our VCF headers:
intervals=[org.broadinstitute.sting.commandline.IntervalBinding@4ce66f56]
2011-11-23 11:56:12 -05:00
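A minimal sketch of why the header showed `IntervalBinding@4ce66f56`: a Java class that does not override toString() falls back to Object.toString(), which prints the class name followed by a hex hash code. The RawBinding and NamedBinding classes below are hypothetical stand-ins for illustration, not the GATK IntervalBinding class.

```java
final class ToStringSketch {
    // Hypothetical stand-in with no toString() override.
    static final class RawBinding {
        private final String source;
        RawBinding(String source) { this.source = source; }
    }

    // Hypothetical stand-in that reports the interval string it was built from.
    static final class NamedBinding {
        private final String source;
        NamedBinding(String source) { this.source = source; }
        @Override
        public String toString() { return source; }
    }

    public static void main(String[] args) {
        // Prints something like intervals=[ToStringSketch$RawBinding@1b6d3586]
        System.out.println("intervals=[" + new RawBinding("chr1:1-100") + "]");
        // Prints intervals=[chr1:1-100]
        System.out.println("intervals=[" + new NamedBinding("chr1:1-100") + "]");
    }
}
```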
Mark DePristo
5a4856b82e
GATKReports now support a format field per column
...
-- You can tell the table to format your object with "%.2f" for example.
2011-11-23 11:31:04 -05:00
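A rough illustration of a per-column format field, assuming the format is a standard java.util.Formatter pattern such as "%.2f". The Column class and its render method are hypothetical and are not the GATKReport API.

```java
import java.util.Locale;

final class ColumnFormatSketch {
    // Hypothetical column: a name plus an optional format pattern.
    static final class Column {
        final String name;
        final String format; // e.g. "%.2f"; null means fall back to String.valueOf

        Column(String name, String format) {
            this.name = name;
            this.format = format;
        }

        String render(Object value) {
            // Locale.ROOT keeps the decimal separator stable across machines.
            return format == null
                    ? String.valueOf(value)
                    : String.format(Locale.ROOT, format, value);
        }
    }

    public static void main(String[] args) {
        Column tiTv = new Column("tiTvRatio", "%.2f");
        System.out.println(tiTv.name + " = " + tiTv.render(2.0 / 3.0)); // tiTvRatio = 0.67
    }
}
```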
Mark DePristo
c8bf7d2099
Check for null comment
2011-11-23 10:47:21 -05:00
Mark DePristo
6c2555885c
Caching getSimpleName() in VariantEval is a big performance improvement
...
-- Removed the SimpleMetricsByAC table, as one should just use the AlleleCount Stratification and the upcoming VariantSummary table
2011-11-23 08:34:05 -05:00
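One way to cache getSimpleName(), sketched under the assumption that the name is looked up repeatedly for the same small set of classes. The SimpleNameCacheSketch helper is hypothetical; the commit does not say how VariantEval stores the cached value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SimpleNameCacheSketch {
    // Each class's simple name is computed once and then served from the map.
    private static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

    static String simpleName(Class<?> type) {
        return CACHE.computeIfAbsent(type, Class::getSimpleName);
    }

    public static void main(String[] args) {
        System.out.println(simpleName(StringBuilder.class)); // computed on first call
        System.out.println(simpleName(StringBuilder.class)); // served from the cache
    }
}
```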
Guillermo del Angel
32adbd614f
Solve merge conflict
2011-11-22 22:48:46 -05:00
Guillermo del Angel
941f3784dc
Solve merge conflict
2011-11-22 22:48:03 -05:00
Guillermo del Angel
75d93e6335
Another corner condition fix: skip likelihood computation in case we cut so many bases there's no haplotype or read left
2011-11-22 22:46:12 -05:00
Mark DePristo
a3aef8fa53
Final performance optimization for GenotypesContext
2011-11-22 17:19:30 -05:00
Mark DePristo
990c02e4de
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-22 17:19:11 -05:00
Guillermo del Angel
38a90da92c
Fixed merge conflict to Unstable
2011-11-22 14:39:45 -05:00
Guillermo del Angel
32a77a8a56
Prevent out-of-bounds error in case read span > reference context + indel length. Can happen in RNAseq reads with long N CIGAR operators in the middle.
2011-11-22 13:57:24 -05:00
Eric Banks
5821c11fad
For BAM and Reviewed errors we now check the error message to see if it's actually a 'too many open files' problem and, if so, we generate a User Error instead.
2011-11-22 10:50:22 -05:00
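A hedged sketch of the message check described above. The helper name and the exact substring match are assumptions; the commit does not show how the GATK inspects the message or which exception types it wraps.

```java
import java.io.IOException;
import java.util.Locale;

final class OpenFilesCheckSketch {
    // Hypothetical helper: true when the exception message looks like a
    // file-descriptor exhaustion error rather than a genuine file problem.
    static boolean isTooManyOpenFiles(Throwable error) {
        String message = error.getMessage();
        return message != null
                && message.toLowerCase(Locale.ROOT).contains("too many open files");
    }

    public static void main(String[] args) {
        Throwable error = new IOException("/data/sample.bam (Too many open files)");
        if (isTooManyOpenFiles(error)) {
            System.out.println("User error: raise the open-file limit or reduce the number of inputs");
        } else {
            System.out.println("Report as a BAM or reviewed error");
        }
    }
}
```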
Mark DePristo
7087310373
Embarrassing bug fixed
2011-11-22 10:16:36 -05:00
Mark DePristo
e484625594
GenotypesContext now updates cached data for add, set, replace operations when possible
...
-- Involved separately managing the sample -> offset and sample sorted list operations. This should improve performance throughout the system
2011-11-22 08:40:48 -05:00
Mark DePristo
29ca24694a
UG now encoding NO_CALLs as ./. not ./.:.:4:0,0,0
...
Updated a few UG integration tests
2011-11-22 08:22:32 -05:00
Mark DePristo
2b51c01df4
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-21 19:16:06 -05:00
Mark DePristo
5443d3634a
Again, fixing the add call when we really mean replace
...
-- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./. Eric will fix long-standing bug in QD observed from this change
-- VFW MD5s restored to their old correct values. There was a bug in my implementation that caused the genotypes to not be parsed from the lazy output even though the header was incorrect.
2011-11-21 19:15:56 -05:00
Mauricio Carneiro
5ad3dfcd62
BugFix: byte overflow in SyntheticRead compressed base counts
...
* fixed and added unit test
2011-11-21 17:11:50 -05:00
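For context, a minimal sketch of how a count stored in a Java byte overflows. The method names are hypothetical, and the saturating variant is only one possible remedy; the commit does not say which fix SyntheticRead uses.

```java
final class ByteOverflowSketch {
    // A Java byte holds -128..127, so incrementing 127 wraps to -128.
    static byte incrementUnchecked(byte count) {
        return (byte) (count + 1);
    }

    // Assumed remedy for illustration: stop at Byte.MAX_VALUE instead of wrapping.
    static byte incrementSaturating(byte count) {
        return count == Byte.MAX_VALUE ? Byte.MAX_VALUE : (byte) (count + 1);
    }

    public static void main(String[] args) {
        byte full = Byte.MAX_VALUE;
        System.out.println(incrementUnchecked(full));  // -128
        System.out.println(incrementSaturating(full)); // 127
    }
}
```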
Mark DePristo
9ea7b70a02
Added decode method to LazyGenotypesContext
...
-- AbstractVCFCodec calls this if the samples are not sorted. Previously called getGenotypes() which didn't actually trigger the decode
2011-11-21 16:21:23 -05:00
Mark DePristo
ab2efe3bd3
Reverting bad exact model changes
2011-11-21 16:14:40 -05:00
Eric Banks
44554b2bfd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-21 15:01:45 -05:00
Eric Banks
022832bd74
Very bad use of the == operator with Strings was ensuring that validating GenomeLocs was very inefficient. This fix resulted in a significant speedup for a simple RodWalker.
2011-11-21 14:49:47 -05:00
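A small illustration of the pitfall: in Java, == on Strings compares object references, so two strings with identical characters can still compare unequal. The sameContig helper and the contig name are hypothetical; how the failed comparison slowed GenomeLoc validation is not shown in the commit.

```java
final class StringCompareSketch {
    // Hypothetical helper: compares contig names by value, not by reference.
    static boolean sameContig(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String fromHeader = "chr1";
        String fromRecord = new String("chr1"); // distinct object, same characters

        System.out.println(fromHeader == fromRecord);           // false: different references
        System.out.println(sameContig(fromHeader, fromRecord)); // true: same contents
    }
}
```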
Mark DePristo
1561af22af
Exact model code cleanup
...
-- Fixed up code when fixing a bug detected by aggressive contracts in GenotypesContext.
2011-11-21 14:35:15 -05:00
Mark DePristo
2c501364b8
GenotypesContext no longer has immutability in constructor
...
-- additional bug fixes throughout VariantContext and GenotypesContext objects
2011-11-21 14:34:31 -05:00
David Roazen
1296dd41be
Removing the legacy -L "interval1;interval2" syntax
...
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.
A UserException is now thrown if someone tries to use this syntax.
2011-11-21 13:18:53 -05:00
Mark DePristo
e467b8e1ae
More contracts on LazyGenotypesContext
2011-11-21 09:34:57 -05:00
Mark DePristo
2e9ecf639e
Generalized interface to LazyGenotypesContext
...
-- Now you provide a LazyParsing object
-- LazyGenotypesContext now knows nothing about the VCF parser itself. The parser holds all of the necessary data to parse the VCF genotypes when necessary, and the LGC only has a pointer to this object
-- Using the new interface, added LazyGenotypesContext to the unit tests with a simple lazy version
-- Deleted VCFParser interface, as it was no longer necessary
2011-11-21 09:30:40 -05:00
Mark DePristo
f0ac588d32
Extensive unit test for GenotypeContextUnitTest
...
-- Currently only tests base class. Adding subclass testing in a bit
2011-11-20 18:28:01 -05:00
Mark DePristo
bc44f6fd9e
Utility function Collection<Genotype> -> Collection<String>
2011-11-20 18:26:56 -05:00
Mark DePristo
9445326c6c
Genotype is Comparable via sampleName
2011-11-20 18:26:27 -05:00
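A sketch of what ordering by sample name could look like, together with a genotype-to-name helper in the spirit of the utility function added a minute later. This Genotype class is a hypothetical one-field stand-in, not the GATK class, and sortedSampleNames is an invented name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ComparableGenotypeSketch {
    // Hypothetical stand-in: only the sample name is modelled.
    static final class Genotype implements Comparable<Genotype> {
        final String sampleName;

        Genotype(String sampleName) { this.sampleName = sampleName; }

        @Override
        public int compareTo(Genotype other) {
            return sampleName.compareTo(other.sampleName);
        }
    }

    // Sorts a copy of the genotypes and returns their sample names in order.
    static List<String> sortedSampleNames(List<Genotype> genotypes) {
        List<Genotype> sorted = new ArrayList<>(genotypes);
        Collections.sort(sorted);
        List<String> names = new ArrayList<>();
        for (Genotype g : sorted) {
            names.add(g.sampleName);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Genotype> genotypes = new ArrayList<>();
        genotypes.add(new Genotype("NA12892"));
        genotypes.add(new Genotype("NA12878"));
        System.out.println(sortedSampleNames(genotypes)); // [NA12878, NA12892]
    }
}
```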
Mark DePristo
f9e25081ab
Completely documented LazyGenotypesContext
2011-11-20 08:35:52 -05:00
Mark DePristo
9cb3fe3a59
Vastly better way of doing on-demand genotype loading
...
-- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading.
-- This new class has replaced all of the old, complex functionality
-- Better still, there were many cases where the genotypes were being loaded unnecessarily, resulting in inefficiency. This was detected because some of the integration tests changed once the genotypes were no longer being parsed unnecessarily
-- Misc. bug fixes throughout the system
-- Bug fixes for PhaseByTransmission with new GenotypesContext
2011-11-20 08:23:09 -05:00
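The on-demand idea can be sketched as a container that holds a parser callback and decodes only on first access. LazyGenotypes, fromTabSeparated, and the tab-separated input are all assumptions made for illustration; this is not the LazyGenotypesContext implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

final class LazyLoadSketch {
    // Hypothetical lazy container: the parser runs at most once, on first get().
    static final class LazyGenotypes {
        private final Supplier<List<String>> parser;
        private List<String> decoded; // null until the first access
        int decodeCount = 0;          // exposed only to make the laziness visible

        LazyGenotypes(Supplier<List<String>> parser) { this.parser = parser; }

        List<String> get() {
            if (decoded == null) {
                decoded = parser.get();
                decodeCount++;
            }
            return decoded;
        }
    }

    // Defers splitting the raw genotype columns until someone asks for them.
    static LazyGenotypes fromTabSeparated(String rawColumns) {
        return new LazyGenotypes(() -> Arrays.asList(rawColumns.split("\t")));
    }

    public static void main(String[] args) {
        LazyGenotypes genotypes = fromTabSeparated("0/1\t./.\t1/1");
        System.out.println(genotypes.decodeCount); // 0: nothing parsed yet
        System.out.println(genotypes.get().size()); // 3: parsed on demand
        genotypes.get();
        System.out.println(genotypes.decodeCount); // 1: second access reused the result
    }
}
```

A walker that never touches the genotypes never pays the parsing cost, which is consistent with the efficiency gain the commit describes.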
Mark DePristo
f392d330c3
Proper use of builder. Previous conversion attempt was flawed
2011-11-19 22:09:56 -05:00
Mark DePristo
7d09c0064b
Bug fixes and code cleanup throughout
...
-- chromosomeCounts now takes builder as well, cleaning up a lot of code throughout the codebase.
2011-11-19 18:40:15 -05:00
Mark DePristo
707bd30b3f
Should have been @BeforeMethod
2011-11-19 16:10:09 -05:00
Mark DePristo
8f7eebbaaf
Bugfix for pError not being checked correctly in CommonInfo
...
-- UnitTests to ensure correct behavior
-- UnitTests to ensure correct behavior for pass filters vs. failed filters vs. unfiltered
2011-11-19 15:58:59 -05:00
Mark DePristo
b7b57ef39a
Updating MD5 to reflect canonical ordering of calculation
...
-- We should no longer have md5s changing because of hashmaps changing their sort order on us
-- Added GenotypeLikelihoodsUnitTests
-- Refactored ExactAFCaclculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.
2011-11-19 15:57:33 -05:00
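One common way to keep output independent of HashMap iteration order is to walk the entries in sorted key order before computing or printing anything that feeds an MD5. The sketch below assumes string keys and uses a hypothetical canonical helper; the commit does not say which structure the GATK code canonicalizes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class CanonicalOrderSketch {
    // Serializes the map in sorted key order, regardless of insertion or hash order.
    static String canonical(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("NA12892", 2);
        counts.put("NA12878", 1);
        System.out.println(canonical(counts)); // NA12878=1,NA12892=2
    }
}
```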