Mark DePristo
101ffc4dfd
Expanded, contrastive VariantContextBenchmark
...
-- Compares the performance of a range of common operations between the GATK 1.3 and GATK 1.4 versions of VariantContext
-- 1.3 VC and associated utilities copied wholesale into test directory under v13
2011-11-16 13:35:16 -05:00
Mark DePristo
e56d52006a
Continuing bugfixes to get new VC working
2011-11-16 10:39:17 -05:00
Mark DePristo
df415da4ab
More bug fixes on the way to passing all tests
2011-11-15 17:38:12 -05:00
Mark DePristo
0be23aae4e
Bugfixes on way to a working refactored VariantContext
2011-11-15 17:20:14 -05:00
Mark DePristo
231c47c039
Bugfixes on way to a working refactored VariantContext
2011-11-15 16:42:50 -05:00
Mark DePristo
2b2514dad2
Moved many unused phasing walkers and utilities to archive
2011-11-15 16:14:50 -05:00
Mark DePristo
460a51f473
ID field now stored in the VariantContext itself, not the attributes
2011-11-15 14:56:33 -05:00
Mark DePristo
233e581828
Merging in Master
2011-11-15 09:28:24 -05:00
Mark DePristo
6e1a86bc3e
Bug fixes to VariantContext and GenotypeCollection
2011-11-15 09:21:30 -05:00
Mauricio Carneiro
cde829899d
compress Reduce Read counts bytes by offset
...
Compressing the representation of the reduced read counts by offset gives 17% average compression of the final BAM file size.
Example compression:
from: 10, 10, 11, 11, 12, 12, 12, 11, 10
to:   10, 0, 1, 1, 2, 2, 2, 1, 0
2011-11-14 18:30:24 -05:00
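Note on cde829899d: the from/to example in this commit suggests that the first count is kept as-is and every later count is stored as its difference from that first value. The following is a minimal sketch of that reading, not the actual GATK code; the class and method names (`OffsetCounts`, `encode`, `decode`) are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating offset encoding; not part of the GATK API. */
final class OffsetCounts {
    /** Keeps the first count and stores each later count as (count - first). */
    static byte[] encode(byte[] counts) {
        byte[] out = new byte[counts.length];
        if (counts.length == 0) {
            return out;
        }
        out[0] = counts[0];
        for (int i = 1; i < counts.length; i++) {
            out[i] = (byte) (counts[i] - counts[0]);
        }
        return out;
    }

    /** Reverses encode(): adds the first value back to every later offset. */
    static byte[] decode(byte[] encoded) {
        byte[] out = new byte[encoded.length];
        if (encoded.length == 0) {
            return out;
        }
        out[0] = encoded[0];
        for (int i = 1; i < encoded.length; i++) {
            out[i] = (byte) (encoded[i] + encoded[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {10, 10, 11, 11, 12, 12, 12, 11, 10};
        byte[] enc = encode(raw);
        System.out.println(Arrays.toString(enc));
        System.out.println(Arrays.equals(decode(enc), raw));
    }
}
```

The offsets are small values clustered around zero, which is presumably what lets the compressed BAM shrink; the 17% figure comes from the commit message, not from this sketch.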
Mauricio Carneiro
a1ce3d8141
Not reporting counts to reduced deletions (temporary patch)
...
Deletions will not have counts represented in the reduced form. This may change in the future with a ReadBackedPileup refactor.
2011-11-14 18:30:24 -05:00
Mark DePristo
4ff8225d78
GenotypeMap -> GenotypeCollection part 3
...
-- Test code actually builds
2011-11-14 17:51:41 -05:00
Mark DePristo
f0234ab67f
GenotypeMap -> GenotypeCollection part 2
...
-- Code actually builds
2011-11-14 17:42:55 -05:00
David Roazen
ab0ee9b847
Perform only necessary validation in VariantContext modify methods
2011-11-14 16:49:59 -05:00
Mark DePristo
2e9d5363e7
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-14 15:32:06 -05:00
Mark DePristo
1fbdcb4f43
GenotypeMap -> GenotypeCollection
2011-11-14 15:32:03 -05:00
Guillermo del Angel
5c38a9cfd6
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-14 15:00:03 -05:00
Guillermo del Angel
f1db31f072
Attempt to reduce memory footprint of ValidationSiteSelector (if this doesn't work then a radical rewrite of the walker to make it two-pass will be necessary): don't log any attributes of original VCF, if we need chr counts later we can reannotate from original inputs. As things stand, we can't select SNPs genomewide due to memory usage.
2011-11-14 14:56:09 -05:00
Eric Banks
4dc9dbe890
One quick fix to previous commit
2011-11-14 14:42:12 -05:00
Eric Banks
b3313e1445
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-14 14:31:38 -05:00
Eric Banks
7b2a7cfbe7
Transfer headers from the resource VCF when possible when using expressions. While there, VA was modified so that it didn't assume that the ID field was present in the VC's info map in preparation for Mark's upcoming changes.
2011-11-14 14:31:27 -05:00
Mark DePristo
9b5c79b49d
Renamed InferredGeneticContext to CommonInfo
...
-- I have no idea why I named this InferredGeneticContext, a totally meaningless term
-- Renamed to CommonInfo.
-- Made package protected, as no one should use this outside of VariantContext and Genotype
-- UGEngine was using IGC constant, but it's now using the public one in VariantContext.
2011-11-14 14:28:52 -05:00
Mark DePristo
077397cb4b
Deleted MutableVariantContext
...
-- All methods that used this capability now use VariantContext directly instead
2011-11-14 14:19:06 -05:00
Mark DePristo
b11c535527
Deleted MutableGenotype
...
-- This class wasn't really used anywhere, and so removed to control code bloat.
2011-11-14 13:16:36 -05:00
Guillermo del Angel
509ecc62cc
Another bug fix for when no samples are specified in ValidationSiteSelectionWalker
2011-11-14 13:02:51 -05:00
Mark DePristo
79987d685c
GenotypeMap contains a Map, not extends it
...
-- On path to replacing it with GenotypeCollection
2011-11-14 12:55:03 -05:00
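Note on 79987d685c: "contains a Map, not extends it" describes a move from inheritance to composition. The sketch below shows the general pattern only; `GenotypeMapSketch` and its methods are hypothetical, and a plain `String` stands in for the real Genotype type.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical wrapper that holds a Map instead of extending one.
 * Callers see only the methods declared here, so the backing storage
 * can later be swapped without touching them.
 */
final class GenotypeMapSketch {
    // Backing storage is private; a String stands in for a Genotype.
    private final Map<String, String> bySample = new HashMap<String, String>();

    void add(String sampleName, String genotype) {
        bySample.put(sampleName, genotype);
    }

    String get(String sampleName) {
        return bySample.get(sampleName);
    }

    int size() {
        return bySample.size();
    }

    /** Read-only view, so callers cannot mutate the storage behind our back. */
    Set<String> sampleNames() {
        return Collections.unmodifiableSet(bySample.keySet());
    }
}
```

Because the class exposes a narrow API rather than the whole `Map` interface, replacing it with a different collection type (as the commit says is planned) only requires changing the callers of these few methods.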
Eric Banks
7aee80cd3b
Fix to deal with reduced reads containing a deletion
2011-11-14 12:23:46 -05:00
Eric Banks
3d2970453b
Misc minor cleanup
2011-11-14 09:41:54 -05:00
Eric Banks
b7c33116af
Minor docs update
2011-11-12 23:21:07 -05:00
Eric Banks
76d357be40
Updating docs example to use -L since that's best practice
2011-11-12 23:20:05 -05:00
Guillermo del Angel
af8e39c04d
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-12 08:42:24 -05:00
Guillermo del Angel
c95f015d77
a) Bug fix in validation site selector, b) Initial qscript for selection of random snps and indels for validation experiment
2011-11-12 08:41:53 -05:00
Mauricio Carneiro
8cd077f009
Writing a GATKReport table as output.
...
just to standardize the output.
2011-11-11 18:52:58 -05:00
Mark DePristo
fee9b367e4
VariantContext genotypes are now stored as GenotypeMap objects
...
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples
-- All instances in the GATK that used Map<String, Genotype> now use the GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> was used, so that the order of the genotypes was technically undefined and could be dangerous. Now everything uses GenotypeMap with a specific ordering of samples (by name)
-- Integration tests updated and all pass
2011-11-11 15:00:35 -05:00
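Note on fee9b367e4: the ordering concern is that `HashMap` makes no guarantee about iteration order, while a sorted map iterates by key. A small sketch of name-ordered iteration, assuming a `TreeMap` as the sorted container (the commit does not say how GenotypeMap implements its ordering); the class name, method name, and sample names in the usage note are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; shows one way to get a stable, by-name sample order. */
final class SampleOrderSketch {
    /** Returns the sample names in natural String order, whatever map type came in. */
    static List<String> sampleOrder(Map<String, String> genotypesBySample) {
        // TreeMap sorts its keys, so keySet() iterates in name order.
        TreeMap<String, String> sorted = new TreeMap<String, String>(genotypesBySample);
        return new ArrayList<String>(sorted.keySet());
    }
}
```

Passing a `HashMap` populated in the order NA12892, NA12878, NA12891 returns the list NA12878, NA12891, NA12892 regardless of how the `HashMap` itself would iterate.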
Guillermo del Angel
cd3146f4cf
Add hidden option to ValidationAmplicons to output slightly modified format to make file work with downstream SQNM tools more seamlessly at request of GAP: one line per record, keep probe identifier to 20 characters, no * in ref allele.
2011-11-11 14:07:07 -05:00
Ryan Poplin
40fbeafa37
VQSR will now detect if the negative model failed to converge properly because of having too few data points and automatically retry with more appropriate clustering parameters.
2011-11-11 11:52:30 -05:00
Mark DePristo
4938569b3a
More general handling of parameters for VariantContextBenchmark
2011-11-11 10:22:19 -05:00
Mark DePristo
ef9f8b5d46
Added subContextOfSamples to VariantContext
...
-- This is a more convenient accessor than subContextOfGenotypes, represents nearly all of the use cases of the former function, and potentially can be implemented more efficiently.
2011-11-11 10:07:11 -05:00
Mark DePristo
e216e85465
First working version of VariantContextBenchmark
2011-11-11 09:56:00 -05:00
Mark DePristo
ee40791776
Attributes are now Map<String,Object> not Map<String,?>
...
-- Allows us to avoid an unnecessary copy when creating InferredGeneticContext (whose name really needs to change).
2011-11-11 09:55:42 -05:00
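Note on ee40791776: in Java a `Map<String, ?>` cannot be assigned to a `Map<String, Object>` field without a copy or an unchecked cast, which is one plausible reading of the "unnecessary copy" this commit removes. The sketch below illustrates that language rule only; `AttributeHolderSketch` and its factory methods are hypothetical and do not reproduce the real InferredGeneticContext.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder with a Map<String, Object> field. */
final class AttributeHolderSketch {
    private final Map<String, Object> attributes;

    private AttributeHolderSketch(Map<String, Object> attributes) {
        this.attributes = attributes;
    }

    /** A wildcard-typed map is not assignable to the field, so it must be copied. */
    static AttributeHolderSketch fromWildcard(Map<String, ?> attrs) {
        return new AttributeHolderSketch(new HashMap<String, Object>(attrs));
    }

    /** With the exact type, the reference can be stored directly, with no copy. */
    static AttributeHolderSketch fromObjectMap(Map<String, Object> attrs) {
        return new AttributeHolderSketch(attrs);
    }

    Map<String, Object> attributes() {
        return attributes;
    }
}
```

Storing the caller's map directly means later changes to that map are visible through the holder, so this trade-off only makes sense when callers are expected not to mutate the map afterwards.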
Eric Banks
59945a41e8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-10 23:05:20 -05:00
Eric Banks
0c32281484
Adding a benchmarking class for parsing VCF files. Not complete.
2011-11-10 23:05:13 -05:00
Mauricio Carneiro
9c013374fd
A walker to calculate the coverage of a target
...
In targeted sequencing projects, we pay a penalty to get to a minimum coverage in 80% of the targets. This walker will help us understand the ratio between the targeted site (usually in the middle of the interval) and the targeted region.
2011-11-10 17:16:51 -05:00
Mauricio Carneiro
ffa6bc66ec
Eliminating excessive debug tests
2011-11-10 17:16:51 -05:00
Mauricio Carneiro
5a1170078a
Using centralized reduce read facilities
2011-11-10 17:16:51 -05:00
Mark DePristo
dc9b351b5e
Meaningful error message when an IntervalArg file fails to parse correctly
2011-11-10 17:10:26 -05:00
Mark DePristo
bb7bf74aa8
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-10 16:05:43 -05:00
Mark DePristo
153e52ffed
VariantEvalIntegrationTest for IntervalStratification
2011-11-10 14:10:39 -05:00
Mauricio Carneiro
060c7ce8ae
It wouldn't harm integrationtests if we had our logic right... :-)
2011-11-10 14:03:22 -05:00
Mauricio Carneiro
bb4cd59475
Filtered and consensus reads will now use the same tag
2011-11-10 13:58:31 -05:00