* Re-wrote the sliding window approach to allow the variant region not to clip the reads that overlap it.
* Updated consensus to include only reads that were not passed on by the variant region, header counts are updated on the fly to avoid recompute
* Added soft clipped bases to ReduceReads analysis by unclipping high quality soft-clips then re-clipping after reduce reads
* Updated all integration tests
Instead of creating a supposed network temporary directory locally which then fails when remote nodes try to access the non-existant dir, now checking to see if they network directory is available and throwing a SkipException to bypass the test when it cannot be run.
TODO: Throw similar SkipExceptions when fastas are not available. Right now instead of skipping the test or failing fast the REQUIRE_NETWORK_CONNECTION=false means that the errors popup later when the networked fastas aren't found.
- Merged Roger's metrics with Mauricio's optimizations
- Added Stats for DiagnoseTargets
- now has functions to find the median depth, and upper/lower quartile
- the REF_N callable status is implemented
- The walker now runs efficiently
- Diagnose Targets accepts overlapping intervals
- Diagnose Targets now checks for bad mates
- The read mates are checked in a memory efficient manner
- The statistics thresholds have been consolidated and moved outside of the statistics classes and into the walker.
- Fixed some bugs
- Removed rod binding
Added more Unit tests
- Test callable statuses on the locus level
- Test bad mates
- Changed NO_COVERAGE -> COVERAGE_GAPS to avoid confusion
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
-- VCFWriter / codec now passes the same rigorous UnitTest as the BCF2 writer / codec. As part of this we now can only test doubles for equivalence in VCFs to 1e-2 (not exactly impressive)
-- This version of BCF should actually work properly for most files, assuming headers are properly defined.
-- Lots of bug fixes to BCF2 codec
-- Genotype getPhredScaledQual is now an int, returning -1 if there's no QUAL. NOTE THIS SEMANTICS change
-- Equals() method for GenotypeLikelihoods, using PLs.
-- VCFCodec now longer adds empty bindings to missing input field values. NOTE THIS CHANGE
-- VCs can be marked as fully decoded, so that when fullyDecode() is called it returns itself, instead of doing the decoding work. The BCF2 codec now makes VCs marked as fully decoded
-- stringToBytes returns empty list for null or "" string in BCF2Encoder
-- Proper handling of genotype ordering in BCF2 reader / writer
-- Removed the crazy slow noDups and sameSamples tests that were slowing down unit and integration tests totally unnecessarily
-- Many failing MD5s now due to double -> int change in GQ, will update later
-- Added a new parameter to control the maximum number of pairwise differences to generate, which previously could expand to a very large number when there were lots of differences among genotypes, resulting in a n^2 algorithm running with n > 1,000,000