Note that this patch involves ignoring supplementary alignments. Ideally we would want
to fix their mates properly but that would require a major refactoring of this soon-to-be
deprecated tool.
Find out about a dev-bug and added TODOs (reported in #1096).
Addresses issue #1095.
Conflicts:
protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java
Changed a division by -10.0 to a multiplication by -.1 in QualUtils (typically multiplication is faster than division).
Addresses performance issue #1081.
Now that Ron updated the GATK so that we use star to represent spanning
deletions, we need to catch those cases in the code that remaps alleles.
Otherwise, we try to pad the stars and that's just bad.
Added test from actual failing data.
When a sample has multiple spanning deletions and we are asked to assign
likelihoods to the spanning deletion allele, we currently choose the first
deletion. Valentin pointed out that this isn't desired behavior. I
promised Valentin that I would address this issue, so here it is.
I do not believe that the correct thing to do is to sum the likelihoods
over all spanning deletions (I came up with problematic cases where this
breaks down).
So instead I'm using a simple heuristic approach: using the hom alt PLs, find
the most likely spanning deletion for this position and use its likelihoods.
In the 10K-sample VCF from Monkol there were only 2 cases that this problem
popped up. In both cases the heuristic approach works well.
Add oxoG read count annotation and add as default annotation
Add ##SAMPLE VCF header line in accordance with TCGA VCF spec, specifying "File" line in sample header with BAM file name and "SampleName" with BAM sample name (Don't print sample file path if --no_cmdline_in_header is specified to help with test consistency)
Turn on active region assembly-based physical phasing for M2
Clean up M2-related annotations so UG doesn't crash if M2 annotations are called
-We now pull htsjdk and picard from maven central.
-Updated the GATK codebase as necessary to adapt to changes in the Feature
interface.
-Since VCFHeader now requires that all header lines have unique keys, uniquified
the keys of GVCFBlock header lines by including the min/max GQ in the key.
Updated MD5s accordingly.
-Other MD5s changed as a result of an htsjdk fix to eliminate "-0" in VCF output.
In the case where there's a low quality SNP under a spanning deletion in the gvcfs:
if the SNP is not genotyped by GenotypeGVCFs (because it's just noise) we were still
emitting a record with just the symbolic DEL allele (because that allele is high quality).
We no longer do that.