David Roazen
e5b85f0a78
A toString() method for IntervalBindings
...
Necessary since we're currently writing things like this to our VCF headers:
intervals=[org.broadinstitute.sting.commandline.IntervalBinding@4ce66f56]
2011-11-23 11:56:12 -05:00
Mark DePristo
5a4856b82e
GATKReports now support a format field per column
...
-- You can tell the table to format your object with "%.2f" for example.
2011-11-23 11:31:04 -05:00
Mark DePristo
c8bf7d2099
Check for null comment
2011-11-23 10:47:21 -05:00
Mark DePristo
6c2555885c
Caching getSimpleName() in VariantEval is a big performance improvement
...
-- Removed the SimpleMetricsByAC table, as one should just use the AlleleCount Stratification and the upcoming VariantSummary table
2011-11-23 08:34:05 -05:00
Guillermo del Angel
32adbd614f
Solve merge conflict
2011-11-22 22:48:46 -05:00
Guillermo del Angel
941f3784dc
Solve merge conflict
2011-11-22 22:48:03 -05:00
Guillermo del Angel
75d93e6335
Another corner condition fix: skip likelihood computation in case we cut so many bases there's no haplotype or read left
2011-11-22 22:46:12 -05:00
Mark DePristo
a3aef8fa53
Final performance optimization for GenotypesContext
2011-11-22 17:19:30 -05:00
Mark DePristo
990c02e4de
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-22 17:19:11 -05:00
Guillermo del Angel
38a90da92c
Fixed merge conflict to Unstable
2011-11-22 14:39:45 -05:00
Guillermo del Angel
32a77a8a56
Prevent out of bound error in case read span > reference context + indel length. Can happen in RNAseq reads with long N CIGAR operators in the middle.
2011-11-22 13:57:24 -05:00
Eric Banks
5821c11fad
For BAM and Reviewed errors we now check the error message to see if it's actually a 'too many open files' problem and, if so, we generate a User Error instead.
2011-11-22 10:50:22 -05:00
Mark DePristo
7087310373
Embarrassing bug fixed
2011-11-22 10:16:36 -05:00
Mark DePristo
e484625594
GenotypesContext now updates cached data for add, set, replace operations when possible
...
-- Involved separately managing the sample -> offset and sample sorted list operations. This should improve performance throughout the system
2011-11-22 08:40:48 -05:00
Mark DePristo
2b51c01df4
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-21 19:16:06 -05:00
Mark DePristo
5443d3634a
Again, fixing the add call when we really mean replace
...
-- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./. Eric will fix long-standing bug in QD observed from this change
-- VFW MD5s restored to their old correct values. There was a bug in my implementation that caused the genotypes to not be parsed from the lazy output even though the header was incorrect.
2011-11-21 19:15:56 -05:00
Mauricio Carneiro
5ad3dfcd62
BugFix: byte overflow in SyntheticRead compressed base counts
...
* fixed and added unit test
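For illustration only, a minimal Java sketch of the kind of byte overflow described above. The commit does not say how SyntheticRead was fixed; the class name, method names, and the clamping strategy here are assumptions, not the actual change.

```java
public class ByteCountSketch {
    // Naive narrowing: any count above 127 wraps around to a negative byte.
    static byte naiveNarrow(int count) {
        return (byte) count;
    }

    // Assumed remedy for this sketch: saturate at the byte range limits.
    static byte saturatingNarrow(int count) {
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, count));
    }

    public static void main(String[] args) {
        System.out.println(naiveNarrow(200));      // prints -56
        System.out.println(saturatingNarrow(200)); // prints 127
    }
}
```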
2011-11-21 17:11:50 -05:00
Mark DePristo
9ea7b70a02
Added decode method to LazyGenotypesContext
...
-- AbstractVCFCodec calls this if the samples are not sorted. Previously called getGenotypes() which didn't actually trigger the decode
2011-11-21 16:21:23 -05:00
Mark DePristo
ab2efe3bd3
Reverting bad exact model changes
2011-11-21 16:14:40 -05:00
Eric Banks
44554b2bfd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-21 15:01:45 -05:00
Eric Banks
022832bd74
Very bad use of the == operator with Strings was ensuring that validating GenomeLocs was very inefficient. This fix resulted in a significant speedup for a simple RodWalker.
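A small Java sketch of the general pitfall, not the actual GenomeLoc validation code: == on Strings compares object identity, so two contig names with the same characters can fail an identity check and miss whatever fast path depends on it. The class and method names are hypothetical.

```java
public class ContigNameSketch {
    // Identity comparison: true only if both references point to the same object.
    static boolean sameByIdentity(String a, String b) {
        return a == b;
    }

    // Value comparison: true whenever the characters match.
    static boolean sameByValue(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String cached = "chr1";
        String parsed = new String("chr1"); // distinct object, same characters
        System.out.println(sameByIdentity(cached, parsed)); // prints false
        System.out.println(sameByValue(cached, parsed));    // prints true
    }
}
```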
2011-11-21 14:49:47 -05:00
Mark DePristo
1561af22af
Exact model code cleanup
...
-- Fixed up code when fixing a bug detected by aggressive contracts in GenotypesContext.
2011-11-21 14:35:15 -05:00
Mark DePristo
2c501364b8
GenotypesContext no longer has immutability in constructor
...
-- additional bug fixes throughout VariantContext and GenotypesContext objects
2011-11-21 14:34:31 -05:00
David Roazen
1296dd41be
Removing the legacy -L "interval1;interval2" syntax
...
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.
A UserException is now thrown if someone tries to use this syntax.
2011-11-21 13:18:53 -05:00
Mark DePristo
e467b8e1ae
More contracts on LazyGenotypesContext
2011-11-21 09:34:57 -05:00
Mark DePristo
2e9ecf639e
Generalized interface to LazyGenotypesContext
...
-- Now you provide a LazyParsing object
-- LazyGenotypesContext now knows nothing about the VCF parser itself. The parser holds all of the necessary data to parse the VCF genotypes when necessary, and the LGC only has a pointer to this object
-- Using new interface added LazyGenotypesContext to unit tests with a simple lazy version
-- Deleted VCFParser interface, as it was no longer necessary
2011-11-21 09:30:40 -05:00
Mark DePristo
bc44f6fd9e
Utility function Collection<Genotype> -> Collection<String>
2011-11-20 18:26:56 -05:00
Mark DePristo
9445326c6c
Genotype is Comparable via sampleName
2011-11-20 18:26:27 -05:00
Mark DePristo
f9e25081ab
Completely documented LazyGenotypesContext
2011-11-20 08:35:52 -05:00
Mark DePristo
9cb3fe3a59
Vastly better way of doing on-demand genotype loading
...
-- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading.
-- This new class replaced all of the old, complex functionality
-- Better still, there were many cases where the genotypes were being loaded unnecessarily, resulting in inefficiency. This was detected because some of the integration tests changed as the genotypes were no longer being parsed unnecessarily
-- Misc. bug fixes throughout the system
-- Bug fixes for PhaseByTransmission with new GenotypesContext
2011-11-20 08:23:09 -05:00
Mark DePristo
f392d330c3
Proper use of builder. Previous conversion attempt was flawed
2011-11-19 22:09:56 -05:00
Mark DePristo
7d09c0064b
Bug fixes and code cleanup throughout
...
-- chromosomeCounts now takes builder as well, cleaning up a lot of code throughout the codebase.
2011-11-19 18:40:15 -05:00
Mark DePristo
8f7eebbaaf
Bugfix for pError not being checked correctly in CommonInfo
...
-- UnitTests to ensure correct behavior
-- UnitTests to ensure correct behavior for pass filters vs. failed filters vs. unfiltered
2011-11-19 15:58:59 -05:00
Mark DePristo
b7b57ef39a
Updating MD5 to reflect canonical ordering of calculation
...
-- We should no longer have md5s changing because of hashmaps changing their sort order on us
-- Added GenotypeLikelihoodsUnitTests
-- Refactored ExactAFCalculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.
2011-11-19 15:57:33 -05:00
Mark DePristo
73119c8e3c
Merge with master
...
-- A few bug fixes
2011-11-19 09:56:06 -05:00
Mark DePristo
f685fff79b
Killing the final versions of old new VariantContext interface
2011-11-18 21:32:43 -05:00
Mark DePristo
6cf315e17b
Change interface to getNegLog10PError to getLog10PError
2011-11-18 21:07:30 -05:00
Mark DePristo
c7f2d5c7c7
Final minor fix to contract
2011-11-18 19:40:05 -05:00
Mauricio Carneiro
b5de182014
isEmpty now checks if mReadBases is null
...
Since newly created reads have mReadBases == null. This is an effort to centralize the place to check for empty GATKSAMRecords.
2011-11-18 18:34:05 -05:00
Mauricio Carneiro
8ab3ee9c65
Merge remote-tracking branch 'unstable/master' into rr
2011-11-18 16:50:25 -05:00
Mauricio Carneiro
333e5de812
returning read instead of GATKSAMRecord
...
Do not create new GATKSAMRecord when read has been fully clipped, because it is essentially the same as returning the currently fully clipped read.
2011-11-18 16:49:59 -05:00
Matt Hanna
8bb4d4dca3
First pass of the asynchronous block loader.
...
Block loads are only triggered on queue empty at this point. Disabled by
default (enable with nt:io=?).
2011-11-18 15:02:59 -05:00
Mark DePristo
a2e79fbe8a
Fixes to contracts
2011-11-18 14:18:53 -05:00
Mark DePristo
660d6009a2
Documentation and contracts for GenotypesContext and VariantContextBuilder
2011-11-18 13:59:30 -05:00
Mark DePristo
f54afc19b4
VariantContextBuilder
...
-- New approach to making VariantContexts modeled on StringBuilder
-- No more modify routines -- use VariantContextBuilder
-- Renamed isPolymorphic to isPolymorphicInSamples. Same for mono
-- getChromosomeCount -> getCalledChrCount
-- Walkers changed to use new VariantContext. Some deprecated new VariantContext calls remain
-- VCFCodec now uses optimized cached information to create GenotypesContext.
2011-11-18 12:39:10 -05:00
Eric Banks
6459784351
Merged bug fix from Stable into Unstable
2011-11-18 12:34:57 -05:00
Eric Banks
c62082ba1b
Making this class public again as per request from Cancer folks
2011-11-18 12:34:27 -05:00
Eric Banks
8710673a97
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-18 12:29:33 -05:00
Eric Banks
768b27322b
I figured out why we were getting tons of hom var genotype calls with Mauricio's low quality (synthetic) reduced reads: the RR implementation in the UG was not capping the base quality by the mapping quality, so all the low quality reads were used to generate GLs. Fixed.
2011-11-18 12:29:15 -05:00
Mark DePristo
7490dbb6eb
First version of VariantContextBuilder
2011-11-18 11:06:15 -05:00
Roger Zurawicki
f48d4cfa79
Bug fix: fully clipping GATKSAMRecords and flushing ops
...
Reads that are emptied after clipping become new GATKSAMRecords.
When applying ClippingOps, the ops are cleared after the clipping
2011-11-18 00:24:39 -05:00
Mark DePristo
fa454c88bb
UnitTests for VariantContext for chrCount, getSampleNames, Order function
...
-- Major change to how chromosomeCounts is computed. Now NO_CALL alleles are always excluded. So ChromosomeCounts(A/.) is 1, the previous result would have been 2.
-- Naming changes for getSamplesNameInOrder()
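The new counting rule can be sketched in Java as follows. This is a toy model over genotype strings such as "A/.", not the VariantContext API; the class name, method name, and string representation are assumptions made for the example.

```java
public class CalledChrCountSketch {
    static final String NO_CALL = ".";

    // Count alleles across genotypes, skipping NO_CALL alleles.
    // Genotypes are written like "A/C" or "A|." in this toy model.
    static int calledChrCount(String... genotypes) {
        int count = 0;
        for (String genotype : genotypes) {
            for (String allele : genotype.split("[/|]")) {
                if (!allele.equals(NO_CALL)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(calledChrCount("A/."));        // prints 1 (previously 2)
        System.out.println(calledChrCount("A/C", "./.")); // prints 2
    }
}
```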
2011-11-17 20:37:22 -05:00
Mark DePristo
23359d1c6c
Bugfix for pruneVariantContext, which was dropping the ref base for padding
2011-11-17 15:32:52 -05:00
Mark DePristo
473b860312
Major determinism fix for UG and RankSumTest
...
-- Now these routines all iterate in sample name order (genotypes.iterateInSampleNameOrder) so that the results of UG and the annotator do not depend on the particular order of samples we see for the exact model and the RankSumTest
2011-11-17 15:31:45 -05:00
Khalid Shakir
c50274e02e
During flanking interval creation, merge overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
...
Moved flanking interval utilities to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
Eric Banks
16a021992b
Updated header description for the INFO and FORMAT DP fields to be more accurate.
2011-11-17 13:17:53 -05:00
Eric Banks
e7d41d8d33
Minor cleanup
2011-11-17 12:00:28 -05:00
Mark DePristo
7e66677769
Expanded UnitTests for VariantContext
...
Tests for
-- getGenotype and getGenotypes
-- subContextBySample
-- modify routines
2011-11-16 20:45:15 -05:00
Mark DePristo
aa0610ea92
GenotypeCollection renamed to GenotypesContext
2011-11-16 16:24:05 -05:00
Mark DePristo
caf6080402
Better algorithm for merging genotypes in CombineVariants
2011-11-16 15:17:33 -05:00
Mark DePristo
e56d52006a
Continuing bugfixes to get new VC working
2011-11-16 10:39:17 -05:00
Matt Hanna
eb8e031f75
Merged bug fix from Stable into Unstable
2011-11-16 09:57:37 -05:00
Matt Hanna
6a5d5e7ac9
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable
2011-11-16 09:57:13 -05:00
Matt Hanna
7ac5cf8430
Getting rid of unsupported CountReadPairs walker in stable. Removal of
...
remainder of pairs processing framework to follow in unstable.
2011-11-16 09:53:59 -05:00
Eric Banks
c2ebe58712
Merge remote-tracking branch 'Laurent/master'
2011-11-16 09:34:47 -05:00
Laurent Francioli
0dc3d20d58
Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type
2011-11-16 09:33:13 +01:00
Laurent Francioli
7d77fc51f5
Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type
2011-11-16 03:32:43 -05:00
David Roazen
0d163e3f52
SnpEff 2.0.4 support
...
-Modified the SnpEff parser to work with the SnpEff 2.0.4 VCF output format
-Assigning functional classes and effect impacts now handled directly
by SnpEff rather than the GATK
-Removed support for SnpEff 2.0.2, as we no longer trust the output of that
version since it doesn't exclude effects associated with certain nonsensical
transcripts. These effects are excluded as of 2.0.4.
-Updated unit and integration tests
This support is based on a *release-candidate* of SnpEff 2.0.4, and so is subject
to change between now and the next GATK release.
2011-11-15 18:36:22 -05:00
Mark DePristo
df415da4ab
More bug fixes on the way to passing all tests
2011-11-15 17:38:12 -05:00
Mark DePristo
0be23aae4e
Bugfixes on way to a working refactored VariantContext
2011-11-15 17:20:14 -05:00
Mark DePristo
231c47c039
Bugfixes on way to a working refactored VariantContext
2011-11-15 16:42:50 -05:00
Laurent Francioli
fb685f88ec
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-15 16:23:53 -05:00
Mark DePristo
2b2514dad2
Moved many unused phasing walkers and utilities to archive
2011-11-15 16:14:50 -05:00
Mark DePristo
460a51f473
ID field now stored in the VariantContext itself, not the attributes
2011-11-15 14:56:33 -05:00
Eric Banks
b45d10e6f1
The DP in the FORMAT field (per sample) must also use the representative count or else it's always 1 for reduced reads.
2011-11-15 10:23:59 -05:00
Mark DePristo
233e581828
Merging in Master
2011-11-15 09:28:24 -05:00
Eric Banks
b66556f4a0
Update error message so that it's clear ReadPair Walkers are exceptions
2011-11-15 09:22:57 -05:00
Mark DePristo
6e1a86bc3e
Bug fixes to VariantContext and GenotypeCollection
2011-11-15 09:21:30 -05:00
Mauricio Carneiro
cde829899d
compress Reduce Read counts bytes by offset
...
Compressing the representation of the reduced read counts by offset results in a 17% average reduction in final BAM file size.
Example compression -->
from: 10, 10, 11, 11, 12, 12, 12, 11, 10
to: 10, 0, 1, 1, 2, 2, 2, 1, 0
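The example above is consistent with keeping the first count and storing every later count as its offset from that first value. A Java sketch under that assumption (the class and method names are hypothetical, and this is not the ReduceReads implementation):

```java
import java.util.Arrays;

public class OffsetCountsSketch {
    // Keep counts[0] as-is; store each later count as (count - counts[0]).
    static byte[] compress(byte[] counts) {
        byte[] out = new byte[counts.length];
        if (counts.length == 0) return out;
        out[0] = counts[0];
        for (int i = 1; i < counts.length; i++) {
            out[i] = (byte) (counts[i] - counts[0]);
        }
        return out;
    }

    // Inverse: add the first value back to every later offset.
    static byte[] decompress(byte[] offsets) {
        byte[] out = new byte[offsets.length];
        if (offsets.length == 0) return out;
        out[0] = offsets[0];
        for (int i = 1; i < offsets.length; i++) {
            out[i] = (byte) (offsets[0] + offsets[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] counts = {10, 10, 11, 11, 12, 12, 12, 11, 10};
        // prints [10, 0, 1, 1, 2, 2, 2, 1, 0]
        System.out.println(Arrays.toString(compress(counts)));
    }
}
```

Offsets from a common base tend to be small, repetitive values, which is presumably why the encoded tag compresses better.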
2011-11-14 18:30:24 -05:00
Mark DePristo
f0234ab67f
GenotypeMap -> GenotypeCollection part 2
...
-- Code actually builds
2011-11-14 17:42:55 -05:00
David Roazen
ab0ee9b847
Perform only necessary validation in VariantContext modify methods
2011-11-14 16:49:59 -05:00
Mark DePristo
2e9d5363e7
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-14 15:32:06 -05:00
Mark DePristo
1fbdcb4f43
GenotypeMap -> GenotypeCollection
2011-11-14 15:32:03 -05:00
Eric Banks
4dc9dbe890
One quick fix to previous commit
2011-11-14 14:42:12 -05:00
Eric Banks
7b2a7cfbe7
Transfer headers from the resource VCF when possible when using expressions. While there, VA was modified so that it didn't assume that the ID field was present in the VC's info map in preparation for Mark's upcoming changes.
2011-11-14 14:31:27 -05:00
Mark DePristo
9b5c79b49d
Renamed InferredGeneticContext to CommonInfo
...
-- I have no idea why I named this InferredGeneticContext, a totally meaningless term
-- Renamed to CommonInfo.
-- Made package protected, as no one should use this outside of VariantContext and Genotype
-- UGEngine was using IGC constant, but it's now using the public one in VariantContext.
2011-11-14 14:28:52 -05:00
Mark DePristo
077397cb4b
Deleted MutableVariantContext
...
-- All methods that used this capability now use VariantContext directly instead
2011-11-14 14:19:06 -05:00
Mark DePristo
b11c535527
Deleted MutableGenotype
...
-- This class wasn't really used anywhere, and so removed to control code bloat.
2011-11-14 13:16:36 -05:00
Mark DePristo
79987d685c
GenotypeMap contains a Map, not extends it
...
-- On path to replacing it with GenotypeCollection
2011-11-14 12:55:03 -05:00
Eric Banks
7aee80cd3b
Fix to deal with reduced reads containing a deletion
2011-11-14 12:23:46 -05:00
Eric Banks
3d2970453b
Misc minor cleanup
2011-11-14 09:41:54 -05:00
Laurent Francioli
1347beef40
Merge branch 'PhaseByTransmission'
2011-11-14 11:31:28 +01:00
Eric Banks
b7c33116af
Minor docs update
2011-11-12 23:21:07 -05:00
Eric Banks
76d357be40
Updating docs example to use -L since that's best practice
2011-11-12 23:20:05 -05:00
Mark DePristo
fee9b367e4
VariantContext genotypes are now stored as GenotypeMap objects
...
-- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples
-- All instances in the gatk that used Map<String, Genotype> now use GenotypeMap type.
-- Amazingly, there were many places where HashMap<String, Genotype> is used, so that the order of the genotypes is technically undefined and could be dangerous. Now everything uses GenotypeMap with a specific ordering of samples (by name)
-- Integrationtests updated and all pass
2011-11-11 15:00:35 -05:00
Guillermo del Angel
cd3146f4cf
Add hidden option to ValidationAmplicons to output slightly modified format to make file work with downstream SQNM tools more seamlessly at request of GAP: one line per record, keep probe identifier to 20 characters, no * in ref allele.
2011-11-11 14:07:07 -05:00
Ryan Poplin
40fbeafa37
VQSR will now detect if the negative model failed to converge properly because of having too few data points and automatically retry with more appropriate clustering parameters.
2011-11-11 11:52:30 -05:00
Mark DePristo
ef9f8b5d46
Added subContextOfSamples to VariantContext
...
-- This is a more convenient accessor than subContextOfGenotypes, represents nearly all of the use cases of the former function, and potentially can be implemented more efficiently.
2011-11-11 10:07:11 -05:00
Mark DePristo
ee40791776
Attributes are now Map<String,Object> not Map<String,?>
...
-- Allows us to avoid an unnecessary copy when creating InferredGeneticContext (whose name really needs to change).
2011-11-11 09:55:42 -05:00
Mark DePristo
dc9b351b5e
Meaningful error message when an IntervalArg file fails to parse correctly
2011-11-10 17:10:26 -05:00
Mark DePristo
bb7bf74aa8
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-10 16:05:43 -05:00
Mauricio Carneiro
060c7ce8ae
It wouldn't harm integrationtests if we had our logic right... :-)
2011-11-10 14:03:22 -05:00
Eric Banks
39678b6a20
Check for reads with missing read groups and throw a UserException when encountered. Mauricio said this wouldn't break integration tests.
2011-11-10 13:34:45 -05:00
Mark DePristo
dd1810140f
-stratIntervals is optional
2011-11-10 13:27:32 -05:00
Mark DePristo
67b022c34b
Cleanup for new SampleUtils function
...
-- getVCFHeadersFromRods(rods) is now available so that you don't have getVCFHeadersFromRods(rods, null) throughout the codebase
2011-11-10 13:27:13 -05:00
Mark DePristo
35fe9c8a06
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-10 11:11:33 -05:00
Mark DePristo
dc4932f93d
VariantEval module to stratify the variants by whether they overlap an interval set
...
The primary use of this stratification is to provide a mechanism to divide assessment of a call set up by whether a variant overlaps an interval or not. I use this to differentiate between variants occurring in CCDS exons vs. those in non-coding regions, in the 1000G call set, using a command line that looks like:
-T VariantEval -R human_g1k_v37.fasta -eval 1000G.vcf -stratIntervals:BED ccds.bed -ST IntervalStratification
Note that the overlap algorithm properly handles symbolic alleles with an INFO field END value. In order to safely use this module you should provide entire contigs worth of variants, and let the interval strat decide overlap, as opposed to using -L which will not properly work with symbolic variants.
Minor improvements to create() interval in GenomeLocParser.
2011-11-10 10:58:40 -05:00
Mauricio Carneiro
0d8983feee
outputting the RG information
...
setReadGroup now sets the read group attribute for the GATKSAMRecord
2011-11-09 23:35:00 -05:00
Eric Banks
315ac68b0b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-09 22:37:36 -05:00
Eric Banks
6313aae2c4
Adding checks for hasBasePileup() before calling getBasePileup() as per GS thread
2011-11-09 22:37:26 -05:00
Ryan Poplin
74a18d3de8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-09 22:29:40 -05:00
Ryan Poplin
24712c0221
Merged bug fix from Stable into Unstable
2011-11-09 22:28:27 -05:00
Ryan Poplin
8942406aa2
Use MathUtils to compare doubles instead of testing for equality
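A generic Java sketch of why tolerance-based comparison is preferred over ==. The helper below is hypothetical and is not the MathUtils method the commit refers to, whose name and signature are not given here.

```java
public class DoubleCompareSketch {
    // Hypothetical helper: treat two doubles as equal when they differ by at most epsilon.
    static boolean approxEquals(double a, double b, double epsilon) {
        return Math.abs(a - b) <= epsilon;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                   // prints false
        System.out.println(approxEquals(sum, 0.3, 1e-9)); // prints true
    }
}
```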
2011-11-09 22:05:21 -05:00
Ryan Poplin
348f2db7fd
Fix for HMM optimization. If the two penalty arrays match exactly the function should return the end of the array instead of 0.
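The behavior described can be sketched in Java as a first-difference search that returns the array length when no difference exists. The array type, class name, and method name are assumptions; the actual HMM code is not shown in this log.

```java
public class FirstDifferenceSketch {
    // Return the index of the first position where a and b differ.
    // If they match over their common length, return that length (not 0),
    // so a caller can tell that nothing before the end needs recomputing.
    static int firstDifference(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(firstDifference(new int[]{45, 45, 10}, new int[]{45, 45, 10})); // prints 3
        System.out.println(firstDifference(new int[]{45, 45, 10}, new int[]{45, 30, 10})); // prints 1
    }
}
```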
2011-11-09 22:00:52 -05:00
Eric Banks
82bf09edf3
Mark Standard Annotations with an asterisk
2011-11-09 20:42:31 -05:00
Eric Banks
04b122be29
Fix for bug reported on GetSatisfaction
2011-11-09 20:33:36 -05:00
Mauricio Carneiro
d00b2c6599
Adding a synthetic read for filtered data
...
* Generalized the concept of a synthetic read to create both a running consensus and synthetic reads of filtered data.
* Synthetic reads can now have deletions (but not insertions)
* New reduced read tag for filtered data synthetic reads *(RF)*
* Sliding window header now keeps information of consensus and filtered data
* Synthetic reads are created simultaneously, new functionality is controlled internally by addToSyntheticReads
2011-11-09 20:16:22 -05:00
Eric Banks
21bf43f3bb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-09 15:34:40 -05:00
Christopher Hartl
85bffe1dca
Merged bug fix from Stable into Unstable
2011-11-09 15:29:14 -05:00
Christopher Hartl
d828eba7f4
Allow comments in a table-formatted file to precede the header line.
2011-11-09 15:27:38 -05:00
Eric Banks
8205efbb29
Merge branch 'master' into intervals
2011-11-09 15:27:15 -05:00
Eric Banks
d64f8a89a9
Instead of the SelfScopingFeatureCodec interface, pushed this functionality into Tribble itself. Now we can e.g. determine that a file can be parsed by the BedCodec on the fly.
2011-11-09 15:24:29 -05:00
Mauricio Carneiro
f080f64f99
Preserve RG information on new GATKSAMRecord from SAMRecord
2011-11-09 14:39:20 -05:00
Mauricio Carneiro
f9530e0768
Clean unnecessary attributes from the read
...
this gives on average 40% file size reduction.
2011-11-09 14:39:20 -05:00
Mauricio Carneiro
9427ada498
Fixing no cigar bug
...
empty GATKSAMRecords will have a null cigar. Treat them accordingly.
2011-11-09 14:39:20 -05:00
Mark DePristo
e639f0798e
mergeEvals allows you to treat -eval 1.vcf -eval 2.vcf as a single call set
...
-- A bit of code cleanup in VCFUtils
-- VariantEval table to create 1000G Phase I variant summary table
-- First version of 1000G Phase I summary table Qscript
2011-11-09 14:35:50 -05:00
Christopher Hartl
149b79eaad
Merged bug fix from Stable into Unstable
2011-11-09 11:26:30 -05:00
Christopher Hartl
11abb4f9d1
Better error message.
2011-11-09 11:25:28 -05:00
Christopher Hartl
d3a533b82e
Revert "a"
...
This reverts commit 1175f50ddbf389f5da74d27dc725596582ae15af.
2011-11-09 11:22:26 -05:00
Christopher Hartl
5eaf800281
a
2011-11-09 11:22:20 -05:00
Christopher Hartl
5451fbc2b2
Merged bug fix from Stable into Unstable
2011-11-09 11:06:15 -05:00
Christopher Hartl
091229e4db
MVLikelihoodRatio now checks if the family string is provided before attempting to instantiate. Also check that variant contexts have both genotypes and genotype likelihoods.
...
Table codec now yells at users for not providing a HEADER with the table - parsing tables without a header line was causing the first line of the file to be eaten.
Table feature now has a toString method.
These are minor bug fixes.
2011-11-09 11:03:29 -05:00
Mauricio Carneiro
e1b4c3968f
Fixing GATKSAMRecord bug
...
when constructing a GATKSAMRecord from scratch, we should set "mRestOfBinaryData" to null so the BAMRecord doesn't try to retrieve missing information from the non-existent bam file.
2011-11-08 16:50:36 -05:00
Ryan Poplin
e973ca2010
fixing merge conflict.
2011-11-08 14:55:05 -05:00
Ryan Poplin
b0e6afec48
Bug fix for HMM optimization. Need to also check the gap continuation penalty array for the index with the first discrepancy.
2011-11-08 14:51:25 -05:00
Laurent Francioli
571c724cfd
Added reporting of the number of genotypes updated.
2011-11-08 15:15:51 +01:00
Ryan Poplin
94dc447a70
Merged bug fix from Stable into Unstable
2011-11-07 15:26:35 -05:00
Ryan Poplin
0b181be61f
Bug fix in SelectVariants when using a discordance track but no sample specifications. Added integration test to test this.
2011-11-07 15:25:16 -05:00
Ryan Poplin
0534149708
Merged bug fix from Stable into Unstable
2011-11-07 14:07:08 -05:00
Ryan Poplin
2d1e385ca4
Adding note to VQSR docs about Rscript being needed in the environment PATH.
2011-11-07 14:04:13 -05:00
Eric Banks
759f4fe6b8
Moving unclaimed walker with bad integration test to archive
2011-11-07 13:16:38 -05:00
Eric Banks
c1986b6335
Add notes to the GATKdocs as to when a particular annotation can/cannot be calculated.
2011-11-07 11:06:19 -05:00
Eric Banks
724e3f3b0d
Merged bug fix from Stable into Unstable
2011-11-06 22:23:22 -05:00
Eric Banks
cdd40d1222
Removing contracts for the SimpleTimer
2011-11-06 22:22:49 -05:00
Ryan Poplin
5c565d28b9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-06 10:26:19 -05:00
Eric Banks
1c4e429a1c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-06 00:05:56 -04:00
Eric Banks
a12bc63e5c
Get rid of support for bams without sample information in the read groups. This hidden option wasn't being used anyways because it wasn't hooked up properly in the AlignmentContext.
2011-11-05 23:54:28 -04:00
Eric Banks
90a053ea93
Don't change the mapping quality of MQ=255 reads in IR
2011-11-05 22:40:45 -04:00
Ryan Poplin
611a395783
Now properly extending candidate haplotypes with bases from the reference context instead of filling with padding bases. Functionality in the private Haplotype class is no longer necessary so removing it. No need to have four different Haplotype classes in the GATK.
2011-11-05 12:18:56 -04:00
Mark DePristo
e99871f587
Bug fix for decode loc
...
-- decodeLoc() wasn't skipping input header lines, so the system blew up when there was an = line being split.
2011-11-04 13:20:54 -04:00
Mark DePristo
a340a1aeac
Bug fix. decodeLoc() should update lineNo so you get meaningful line no when indexing
...
due to malformed VCF files.
2011-11-04 11:44:24 -04:00
Mark DePristo
9f260c0dc1
Zero byte index bug fix for RandomlySplitVariants + cleanup
...
-- vcfWriter2 was never being closed in onTraversalDone(), so the on-the-fly index file was being created but never actually properly written to the file.
-- This bug is ultimately due to the inability of the GATK to allow multiple VCF output writers as @Output arguments, though
-- Removed the unnecessary local variable iFraction, = 1000 * the input fraction argument. Now the system just uses a double random number and compares it to the input fraction directly. Is there some subtle reason I don't appreciate for this programming construct?
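A minimal Java sketch of the simplified split decision, assuming the fraction argument lies in [0, 1]. The class and method names are hypothetical and this is not the RandomlySplitVariants code.

```java
import java.util.Random;

public class RandomSplitSketch {
    // Send a record to the first output with probability `fraction`.
    // nextDouble() is uniform on [0, 1), so no scaling to integers is needed.
    static boolean goesToFirstOutput(Random rng, double fraction) {
        return rng.nextDouble() < fraction;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L); // fixed seed only to make the demo repeatable
        int first = 0;
        for (int i = 0; i < 10000; i++) {
            if (goesToFirstOutput(rng, 0.25)) first++;
        }
        System.out.println(first + " of 10000 records went to the first output");
    }
}
```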
2011-11-04 09:45:20 -04:00
Mauricio Carneiro
e89ff063fc
GATKSAMRecord refactor
...
The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...).
* No tools should create SAMRecord anymore, use GATKSAMRecord instead *
2011-11-03 15:43:26 -04:00
Laurent Francioli
385a6abec1
Fixed a bug that wrongly swapped the mother and father genotypes in case the child genotype is missing.
2011-11-03 13:04:53 +01:00
Laurent Francioli
893787de53
Functions getAsMap and getNegLog10GQ now handle missing genotype case.
2011-11-03 13:04:11 +01:00
Eric Banks
e8bceb1eaa
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-02 21:13:54 -04:00
Eric Banks
52b16bf739
Must check whether there's a normal vs. extended pileup before asking for it.
2011-11-02 20:45:24 -04:00
Eric Banks
e1edd6bd12
Removing the min mapping quality argument since it wasn't being used in the normal processing of the pileups in UG - only for indel pileups. Instead, we apply the min base quality to the reads in the pileup for indels and define it to be the min 'confidence' of the base. Docs are updated but I didn't rename the argument as I don't want people to complain.
2011-11-02 20:32:58 -04:00
Ryan Poplin
e94fcf537b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-02 16:29:19 -04:00
Ryan Poplin
4d35272916
Bug fixes with Mauricio to functions in ReadUtils used by reduced reads and the haplotype caller.
2011-11-02 16:29:10 -04:00
Mark DePristo
8a2929c1dd
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-02 16:21:00 -04:00
Laurent Francioli
19ad5b635a
- Calculation of parent/child pairs corrected
...
- Separated the reporting of single and double mendelian violations in trios
2011-11-02 18:35:31 +01:00
Eric Banks
967ff647b8
Reduced reads shouldn't contribute to Fisher Strand calculations
2011-11-02 13:07:20 -04:00
Eric Banks
cf0e699226
QualByDepth was inefficiently iterating over the pileup 2 times for some reason. Removed non-useful annotation classes.
2011-11-02 12:58:38 -04:00
Eric Banks
4501dce58d
Fixing merge conflict
2011-11-02 12:50:32 -04:00
Eric Banks
54331b44e9
New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths.
2011-11-02 12:47:30 -04:00
Mark DePristo
c2b97030a4
IntervalUtils for completely balanced locus-based scatter/gather
...
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
Laurent Francioli
119ca7d742
Fixed a bug in parent/child pairs reporting causing a crash in case the -mvf option was used and mother was not provided
2011-11-02 08:22:33 +01:00
Laurent Francioli
b91a9c4711
- Fixed parent/child pairs handling (was crashing before)
...
- Added parent/child pair reporting
2011-11-02 08:04:01 +01:00
Mark DePristo
5fc613f972
Better default partition types for walkers
...
-- Added PartitionType.READ, and associated ReadScatterFunction. ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Mauricio Carneiro
36600fd8e9
added MQ of low MQ/BQ to consensus RMS
...
Bases that were excluded for MQ and BQ filters are now contributing to the MQ RMS (but not to consensus base counts and variant/not variant region triggers).
2011-11-01 17:46:12 -04:00
Mauricio Carneiro
b004489c6d
Moving ReduceRead TAG to GATKSAMRecord
...
ReduceReads are now a feature of a GATKSAMRecord, so the tag and the special methods needed to use it will now be housed by the GATKSAMRecord.
2011-11-01 17:12:09 -04:00
Mauricio Carneiro
17cc484dbd
Revert "ReduceReads ref bases are now output as '='
...
Reducing the reference bases to '=' results in an extra compression of 13% on average. The GATK is not ready to handle files with '=' bases, and the decision was to implement this as engine support, not as part of ReduceReads.
2011-11-01 16:35:07 -04:00
Eric Banks
0839c75c8d
More minor fixes to docs
2011-10-31 21:49:27 -04:00
Eric Banks
74b018a1f3
Minor fixes to docs
2011-10-31 21:41:43 -04:00
Eric Banks
31ee5432c5
Merged bug fix from Stable into Unstable
2011-10-31 14:56:59 -04:00
David Roazen
cdde32acbd
Merged bug fix from Stable into Unstable
2011-10-31 14:21:15 -04:00
Eric Banks
f62af0291b
Check for invalid VCF records (not enough tokens) instead of assuming they are there.
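A sketch of this kind of check, in Python rather than the GATK's Java. The function name and returned dictionary are hypothetical; the only fact relied on is that a VCF data line has eight fixed tab-separated fields (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).

```python
VCF_FIXED_FIELDS = 8  # CHROM POS ID REF ALT QUAL FILTER INFO


def parse_vcf_record(line: str) -> dict:
    """Parse the fixed fields of one VCF data line, rejecting short records."""
    tokens = line.rstrip("\n").split("\t")
    if len(tokens) < VCF_FIXED_FIELDS:
        # Fail with a clear message instead of indexing past the end.
        raise ValueError(
            "invalid VCF record: expected at least %d tokens, found %d"
            % (VCF_FIXED_FIELDS, len(tokens))
        )
    chrom, pos, vid, ref, alt, qual, filt, info = tokens[:VCF_FIXED_FIELDS]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt,
        "qual": qual,
        "filter": filt,
        "info": info,
    }
```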
2011-10-31 14:09:51 -04:00
Andrey Sivachenko
bed0acaed4
nWayOut now adds PG tag to the header as it should. Also, additional hidden option added: keepPGTags. If invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added, instead of overriding them
2011-10-31 12:28:28 -04:00
Mauricio Carneiro
389380a590
ReduceReads ref bases are now output as '=' to save space
...
Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.
2011-10-30 12:04:39 -04:00
Eric Banks
0ca7428e76
Allow processing of empty intervals, but warn user when this case is encountered.
2011-10-28 12:12:14 -04:00
Eric Banks
649dfe98f0
Add VCF header for any expressions that are requested
2011-10-28 10:22:19 -04:00
Eric Banks
057a79f598
This argument should be annotated as @Input
2011-10-28 09:44:49 -04:00
Eric Banks
4ba7c0cecd
Moving to private
2011-10-28 09:29:28 -04:00
Eric Banks
1bdd76c2f2
These tools now use the IntervalBinding system to handle intervals instead of doing it all manually
2011-10-28 09:28:12 -04:00
Eric Banks
6ba08a103d
Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory.
2011-10-28 09:23:25 -04:00
Eric Banks
3d04bb5608
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-27 23:55:18 -04:00
Eric Banks
19e27d4568
Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative.
2011-10-27 23:55:11 -04:00
Eric Banks
cafc245a43
For some reason, a class of Codecs (including TableCodec) require that a GenomeLocParser be passed in to do the position processing. Why can't they just return a Feature with chr, start, stop? Isn't that the right thing?
2011-10-27 23:54:28 -04:00
Guillermo del Angel
cbc43683ee
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-27 20:54:18 -04:00
Guillermo del Angel
8907e42007
First fully functional implementation of ValidationSiteSelectorWalker. User gives a) a set of input variants, b) a desired number of output variants, c) optionally, a set of samples which will restrict sites to be polymorphic in those samples, d) a frequency selection mode: either uniform (no AF matching), or matching AF so that output sites mirror the input AF spectrum as closely as possible.
...
More testing is needed and docs need improving but so far all functionality seems up and running
2011-10-27 20:53:48 -04:00
Eric Banks
ccfd853b34
Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed.
2011-10-27 20:43:50 -04:00
Eric Banks
c2f343773e
Oops, working too quickly last time. This is the proper fix for the potential NPE in the equals() test.
2011-10-27 15:32:08 -04:00
Khalid Shakir
b80d407dc7
No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
...
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Eric Banks
8c4dbce6d8
Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing.
2011-10-27 13:58:19 -04:00
Eric Banks
4a7e6fee3f
Remove support for BED file interval parsing in the GATK; it should all go through Tribble now. IndelRealigner no longer supports unordered interval input (which shouldn't have been used anyways). Temporarily commenting out serialization of arguments so that tests pass; this whole piece will be deleted soon anyways.
2011-10-27 13:38:08 -04:00
Matt Hanna
f7df8bdecc
Merged bug fix from Stable into Unstable
2011-10-27 11:31:17 -04:00
Matt Hanna
41ddc7bce7
Make sure we output a full stack trace when we encounter Tribble error messages on VCF header merge.
2011-10-27 11:30:04 -04:00
Eric Banks
44f905b5e5
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-26 23:31:11 -04:00
Eric Banks
68283b1651
Fixing docs and adding GATKdocs for the new interval functionality
2011-10-26 22:14:43 -04:00