-- Support for streaming VCF writing via the VCFWriter interface
-- GCF now has a header and a footer. The header is minimal, and contains a forward pointer to the position of the footer in the file.
-- Readers now read the header, and then jump to the footer to get the rest of the "header" information
-- Version is now a field in GCF
-- Per-sample stratification was not being calculated correctly. The alt allele always remained, even if the sample's genotype was hom-ref. Although conceptually fine, this breaks the assumptions of all of the eval modules, so per-sample stratifications actually included all variants for every sample. Eric is going to fix the system in general, so this commit may break the build.
-- ArrayList declarations are now List where possible
-- States refactored into the VariantStratifier base class (removes many lines of duplicate code)
-- Added VariantType stratification that partitions report by VariantContext.Type
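
Sketch of the GCF header/footer layout noted above (minimal header carrying a forward pointer to the footer), assuming a seekable output file. This is not the actual GCF format: the magic number, field order, class and method names are all hypothetical, and records are plain strings for illustration. The point is only that the writer can stream records first and patch the pointer once the footer position is known.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration; not the real GCF on-disk layout.
final class FooterPointerSketch {
    static final int MAGIC = 0x47434631; // assumed magic number
    static final int VERSION = 1;        // version stored as a header field

    // Write header with a placeholder pointer, stream the records,
    // append the footer, then go back and patch the pointer.
    static void write(String path, String[] records, String footer) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(path, "rw")) {
            out.setLength(0);
            out.writeInt(MAGIC);
            out.writeInt(VERSION);
            long pointerPos = out.getFilePointer();
            out.writeLong(0L);                  // placeholder for footer offset
            for (String record : records) {
                out.writeUTF(record);           // records are streamed as they arrive
            }
            long footerPos = out.getFilePointer();
            out.writeUTF(footer);               // rest of the "header" information
            out.seek(pointerPos);
            out.writeLong(footerPos);           // patch the forward pointer
        }
    }

    // Read the minimal header, then jump straight to the footer.
    static String readFooter(String path) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            if (in.readInt() != MAGIC) throw new IOException("unexpected magic number");
            if (in.readInt() != VERSION) throw new IOException("unsupported version");
            long footerPos = in.readLong();
            in.seek(footerPos);
            return in.readUTF();
        }
    }
}
```

Keeping the bulk metadata in the footer means the writer does not need to know it before the first record is emitted, which fits streaming writes.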
- Instead of using readLength, the ReadUtil functions are now used to get a proper read coordinate
- Added debug info in interval clipping (with -dl)
NOTE: method might not be safe for production and checks need to be added to the ClippingOp code
Updates for HybridSelectionPipeline:
- Use VQSR on SNPs for projects using bait set whole_exome_agilent_1, applying the cut at 98.5.
- If a whole_exome_agilent_1 project has fewer than 50 samples, also mix in 1000G samples to reach VQSR thresholds.
- Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches.
- Removed GSA_PRODUCTION_ONLY flag from indel caller.
- Updated indel hard filters based on delangel's analysis.
- Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.
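
Sketch of the SNP filtering choice described in the list above. This is not the pipeline's code: the enum, class and method names are hypothetical, and falling back to hard filters for other bait sets is an assumption not stated in these notes.

```java
// Hypothetical illustration of the bait-set / sample-count rule.
final class SnpFilterChoiceSketch {
    enum SnpFiltering { VQSR, VQSR_WITH_1000G_SAMPLES, HARD_FILTERS }

    static final String VQSR_BAIT_SET = "whole_exome_agilent_1";
    static final int MIN_SAMPLES_FOR_VQSR = 50;

    static SnpFiltering choose(String baitSet, int sampleCount) {
        if (!VQSR_BAIT_SET.equals(baitSet)) {
            return SnpFiltering.HARD_FILTERS;   // assumption: other bait sets use hard filters
        }
        // Small projects borrow 1000G samples to reach VQSR thresholds.
        return sampleCount < MIN_SAMPLES_FOR_VQSR
                ? SnpFiltering.VQSR_WITH_1000G_SAMPLES
                : SnpFiltering.VQSR;
    }
}
```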
-- Very minimal working version that can read / write binary VCFs with genotypes
-- Already 10x faster for sites, 5x for fully parsed genotypes, and 1000x for skipping genotypes when reading
-- Removed the getAttributeAsX(key) versions that throw an exception when the value is missing.
-- Removed the getAttributeAsXNoException(key) versions.
-- The only available accessors are now getAttributeAsX(key, default).
-- These accessors properly handle their argument types, so if the value is a double it is returned directly for getAttributeAsDouble(), or if it's a string it's converted to a double. If the key isn't found, default is returned.
-- Don't create an empty LinkedHashSet() for PASS fields. Just return Collections.emptySet() instead.
-- For filter fields with actual values, return an unmodifiableSet instead of one that can be changed
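
Sketch of the accessor and filter behaviour described above, assuming attributes live in a Map<String, Object>. The class name, put/addFilter helpers and getFilters are hypothetical; only the getAttributeAsX(key, default) shape, Collections.emptySet() and the unmodifiable set come from these notes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the actual VariantContext code.
final class AttributeSketch {
    private final Map<String, Object> attributes = new HashMap<>();
    private final Set<String> filters = new LinkedHashSet<>();

    void put(String key, Object value) { attributes.put(key, value); }
    void addFilter(String name) { filters.add(name); }

    // Single accessor: default when the key is missing, direct return
    // for a Double, otherwise parse the value's string form.
    double getAttributeAsDouble(String key, double defaultValue) {
        Object value = attributes.get(key);
        if (value == null) return defaultValue;
        if (value instanceof Double) return (Double) value;
        return Double.parseDouble(String.valueOf(value)); // throws NumberFormatException on bad input
    }

    // PASS (no filters): shared immutable empty set, no allocation.
    // Otherwise: a read-only view so callers cannot modify the filters.
    Set<String> getFilters() {
        return filters.isEmpty()
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(filters);
    }
}
```

Usage: getAttributeAsDouble("QD", -1.0) gives the stored value whether it was put as a Double or as a string such as "40", and -1.0 when the key is absent.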