-- Not hooked up yet, so the output of VariantEval should be the same as before
-- Implemented a VariantEvalUnitTest that tests the low level strat / eval combinatorics and counting routines
-- Better docs throughout
-- Now properly includes both bi and multi-allelic variants. These are actually counted as well, and emitted as counts and % of sites with multiple alleles
-- Bug fix for gold standard rate
-- HMS no longer tries to grab and throw all exceptions. Exceptions are just thrown directly now.
-- Proper error handling is handled by functions in HMS, which are used by ShardTraverser and TreeReducer
-- Better printing of stack traces in WalkerTest
-- Unfortunately the result of the multi-threaded test is non-deterministic so run the test 10x times to see if the right expection is always thrown
-- Now prints the stack trace and exception message of the caught exception of the wrong type, if this occurs
Adding support for active-region-based annotation for most standard annotations. I need to discuss with Ryan what to do about tests that require offsets into the reads (since I don't have access to the offsets) like e.g. the ReadPosRankSumTest.
IMPORTANT NOTE: this is still very much a dev effort and can only be accessed through private walkers (i.e. the HaplotypeCaller). The interface is in flux and so we are making no attempt at all to make it clean or to merge this with the Locus-Traversal-based annotation system. When we are satisfied that it's working properly and have settled on the proper interface, we will clean it up then.
* Fixed output format to get a valid vcf
* Optimzed the per sample pileup routine O(n^2) => O(n) pileup for samples
* Added support to overlapping intervals
* Removed expand target functionality (for now)
* Removed total depth (pointless metric)
-- SamFileReader.java:525
-- BlockCompressedInputStream:376
These were both instances were we weren't catching and rethrowing picard exceptions as UserExceptions.
- refactored the statistics classes
- concurrent callable statuses by sample are now available.
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
* Added parameter -qq to quantize qualities using a recalibration report
* Added options to quantize using the recalibration report quantization levels, new nLevels and no quantization.
* Updated BQSR scripts to make use of the new parameters
-- Now calculates the number of Indels overlapping gold standard sites, as well as the percent of indels overlapping gold standard sites
-- Removed insertion : deletion ratio for 1 bp event, replaced it with 1 + 2 : 3 bp ratio for insertions and deletions separately. This is based on an old email from Mark Daly:
// - Since 1 & 2 bp insertions and 1 & 2 bp deletions are equally likely to cause a
// downstream frameshift, if we make the simplifying assumptions that 3 bp ins
// and 3bp del (adding/subtracting 1 AA in general) are roughly comparably
// selected against, we should see a consistent 1+2 : 3 bp ratio for insertions
// as for deletions, and certainly would expect consistency between in/dels that
// multiple methods find and in/dels that are unique to one method (since deletions
// are more common and the artifacts differ, it is probably worth looking at the totals,
// overlaps and ratios for insertions and deletions separately in the methods
// comparison and in this case don't even need to make the simplifying in = del functional assumption
-- Added a new VEW argument to bind a gold standard track
-- Added two new stratifications: OneBPIndel and TandemRepeat which do exactly what you imagine they do
-- Deleted random unused functions in IndelUtils
Returns true iff VC is an non-complex indel where every allele represents an expansion or
contraction of a series of identical bases in the reference.
The logic of this function is pretty simple. Take all of the non-null alleles in VC. For
each insertion allele of n bases, check if that allele matches the next n reference bases.
For each deletion allele of n bases, check if this matches the reference bases at n - 2 n,
as it must necessarily match the first n bases. If this test returns true for all
alleles you are a tandem repeat, otherwise you are not. Note that in this context n is the
base differences between the ref and alt alleles
* fixed the loading of the new reduced size reports
* reduced BQSR scala script memory to 2Gb
* removed dcov parameter from BQSR scala script
* fixed estimatedQReported calculation from -log10(pe) to -10*log10(pe).
* updated md5's with the proper PHRED scaled EstimatedQReported