The check is performed by a Read Transformer that samples (currently set to once
every 1000 reads so that we don't hurt overall GATK performance) from the input
reads and checks to make sure that none of the base quals is too high (> Q60). If
we encounter such a base then we fail with a User Error.
* Can be over-ridden with --allow_potentially_misencoded_quality_scores.
* Also, the user can choose to fix his quals on the fly (presumably using PrintReads
to write out a fixed bam) with the --fix_misencoded_quality_scores argument.
Added unit tests.
-- Wasn't handling indel overlaps correctly in SiteIterator.getSitesBefore, causing it to incorrectly skip variants underlying indels (the getSitesBefore was considering both start and stop [not the correct behavior]) causing it to only get sites up to the first record whose stop overlapped the requested start.
-- Just separated infrastructure into core package, away from the walkers themselves.
-- Added na12878kb.jar target that builds a jar that can run a test main function (see testNA12878kbJar.csh)
-Allows packaged resource files to be accessed within tests
-Guards against packaging errors in dist/ jars by testing the
jars that actually get run rather than unpackaged class files.
Previously we were only protected against packaging errors in the
monolithic jars posted to our website, not the dist/ jars used in
everyday runs.
-"ant fasttest" still uses the unpackaged class files for speed
(don't want to have to rebuild the jars in fasttest). Relies on
dubious methods to get at the resource files that would end up
in the jars.
-Eliminated the stupid separate "test" ivy config. Now we only
invoke ivy ONCE during an ant build that includes tests.
-- Functions that depend on the value of variables that have GATK injection values must be initialized lazy, not at object creation time. Previous version broken dbToUse and useLocal arguments. Fixed
With the newly-added support for packaging arbitrary resources, the
resources were getting packaged in a normal build but not when
creating a standalone GATK jar. This corrects this oversight.
-Resources must be in a "resources" or "templates" subdirectory within the Java package hierarchy
-Remove direct inclusion of private resources from the main jar packaging target added in Jacob's
patch: this would break builds where the private directory was absent, and did not respect build
settings (include.private, etc.)
As reported by Menachem Fromer: a critical bug in AFCalcResult:
Specifically, the implementation:
public boolean isPolymorphic(final Allele allele, final double log10minPNonRef) {
return getLog10PosteriorOfAFGt0ForAllele(allele) >= log10minPNonRef;
}
seems incorrect and should probably be:
getLog10PosteriorOfAFEq0ForAllele(allele) <= log10minPNonRef
The issue here is that the 30 represents a Phred-scaled probability of *error* and it's currently being compared to a log probability of *non-error*.
Instead, we need to require that our probability of error be less than the error threshold.
This bug has only a minor impact on the calls -- hardly any sites change -- which is good. But the inverted logic effects multi-allelic sites significantly. Basically you only hit this logic with multiple alleles, and in that case it'\s including extra alt alleles incorrectly, and throwing out good ones.
Change was to create a new function that properly handles thresholds that are PhredScaled quality scores:
/**
* Same as #isPolymorphic but takes a phred-scaled quality score as input
*/
public boolean isPolymorphicPhredScaledQual(final Allele allele, final double minPNonRefPhredScaledQual) {
if ( minPNonRefPhredScaledQual < 0 ) throw new IllegalArgumentException("phredScaledQual " + minPNonRefPhredScaledQual + " < 0 ");
final double log10Threshold = Math.log10(QualityUtils.qualToProb(minPNonRefPhredScaledQual));
return isPolymorphic(allele, log10Threshold);
}
Create MongoDBManager, which keeps track of connections based on Locator class. Locators can be instantiated directly, or read from JSON files (NA12878DBArgumentCollection uses the GSon library)