-- Just separated infrastructure into core package, away from the walkers themselves.
-- Added na12878kb.jar target that builds a jar that can run a test main function (see testNA12878kbJar.csh)
-Allows packaged resource files to be accessed within tests
-Guards against packaging errors in dist/ jars by testing the
jars that actually get run rather than unpackaged class files.
Previously we were only protected against packaging errors in the
monolithic jars posted to our website, not the dist/ jars used in
everyday runs.
-"ant fasttest" still uses the unpackaged class files for speed
(don't want to have to rebuild the jars in fasttest). Relies on
dubious methods to get at the resource files that would end up
in the jars.
-Eliminated the stupid separate "test" ivy config. Now we only
invoke ivy ONCE during an ant build that includes tests.
-- Functions that depend on the value of variables populated by GATK argument injection must be initialized lazily, not at object creation time. The previous version broke the dbToUse and useLocal arguments. Fixed.
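A minimal sketch of the eager-versus-lazy problem described above. The class and field names are hypothetical stand-ins, not the actual walker code, and the "injection" is simulated by a plain field assignment after construction:

```java
/** Hypothetical illustration; none of these names come from the GATK code base. */
public class LazyInitExample {
    // Stands in for an argument whose value is injected after construction.
    public String dbToUse = "default";

    // Eager: computed at object creation, before any injection can happen.
    private final String eagerDescription = "db=" + dbToUse;

    // Lazy: computed on first access and then cached.
    private String lazyDescription = null;

    public String getEagerDescription() {
        return eagerDescription;
    }

    public synchronized String getLazyDescription() {
        if (lazyDescription == null) {
            lazyDescription = "db=" + dbToUse;
        }
        return lazyDescription;
    }

    public static void main(String[] args) {
        LazyInitExample walker = new LazyInitExample();
        walker.dbToUse = "production"; // simulates the engine injecting the argument
        System.out.println(walker.getEagerDescription()); // prints "db=default"
        System.out.println(walker.getLazyDescription());  // prints "db=production"
    }
}
```

The eager field silently captures the pre-injection default, which is the kind of failure that would make an argument appear to be ignored.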
With the newly-added support for packaging arbitrary resources, the
resources were getting packaged in a normal build but not when
creating a standalone GATK jar. This corrects this oversight.
-Resources must be in a "resources" or "templates" subdirectory within the Java package hierarchy
-Remove direct inclusion of private resources from the main jar packaging target added in Jacob's
patch: this would break builds where the private directory was absent, and did not respect build
settings (include.private, etc.)
As reported by Menachem Fromer: a critical bug in AFCalcResult:
Specifically, the implementation:
    public boolean isPolymorphic(final Allele allele, final double log10minPNonRef) {
        return getLog10PosteriorOfAFGt0ForAllele(allele) >= log10minPNonRef;
    }
seems incorrect and should probably be:
getLog10PosteriorOfAFEq0ForAllele(allele) <= log10minPNonRef
The issue here is that the 30 represents a Phred-scaled probability of *error* and it's currently being compared to a log probability of *non-error*.
Instead, we need to require that our probability of error be less than the error threshold.
This bug has only a minor impact on the calls -- hardly any sites change -- which is good. But the inverted logic affects multi-allelic sites significantly. Basically you only hit this logic with multiple alleles, and in that case it's including extra alt alleles incorrectly, and throwing out good ones.
The change creates a new function that properly handles thresholds that are Phred-scaled quality scores:
    /**
     * Same as #isPolymorphic but takes a phred-scaled quality score as input
     */
    public boolean isPolymorphicPhredScaledQual(final Allele allele, final double minPNonRefPhredScaledQual) {
        if ( minPNonRefPhredScaledQual < 0 ) throw new IllegalArgumentException("phredScaledQual " + minPNonRefPhredScaledQual + " < 0 ");
        final double log10Threshold = Math.log10(QualityUtils.qualToProb(minPNonRefPhredScaledQual));
        return isPolymorphic(allele, log10Threshold);
    }
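To make the unit conversion concrete, here is a self-contained sketch of the arithmetic. It assumes that QualityUtils.qualToProb returns the probability that a call is correct, i.e. 1 - 10^(-Q/10). The helpers below are local reimplementations under that assumption, not the GATK methods themselves:

```java
/** Standalone sketch of the Phred-to-log10 threshold conversion; names are illustrative. */
public class PhredThresholdExample {
    /** Probability that a call with this Phred-scaled quality is wrong: 10^(-Q/10). */
    public static double qualToErrorProb(double qual) {
        return Math.pow(10.0, -qual / 10.0);
    }

    /** Probability that the call is correct (assumed meaning of qualToProb). */
    public static double qualToProb(double qual) {
        return 1.0 - qualToErrorProb(qual);
    }

    /** Converts a Phred-scaled threshold into a log10 threshold on P(AF > 0). */
    public static double log10Threshold(double qual) {
        if (qual < 0) throw new IllegalArgumentException("phredScaledQual " + qual + " < 0");
        return Math.log10(qualToProb(qual));
    }

    /** Mirrors the shape of isPolymorphic: posterior of AF > 0 must reach the threshold. */
    public static boolean passes(double log10PosteriorAFGt0, double qual) {
        return log10PosteriorAFGt0 >= log10Threshold(qual);
    }

    public static void main(String[] args) {
        // Q30 means an error probability of 0.001, so the threshold is log10(0.999),
        // a small negative number of about -4.3e-4.
        System.out.println(log10Threshold(30.0));
        // A posterior of 0.9999 (error 1e-4) clears Q30; a posterior of 0.99 (error 1e-2) does not.
        System.out.println(passes(Math.log10(0.9999), 30.0)); // prints "true"
        System.out.println(passes(Math.log10(0.99), 30.0));   // prints "false"
    }
}
```

Both sides of the comparison are now log10 probabilities of non-error, so requiring the posterior to reach the threshold is the same as requiring the error probability to be no larger than the error threshold.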
Create MongoDBManager, which keeps track of connections based on the Locator class. Locators can be instantiated directly, or read from JSON files (NA12878DBArgumentCollection uses the Gson library)
0. Add additional create method to MongoVariantContext as convenience, if we want a custom TruthStatus (and change "type" to less ambiguous "truthStatus")
1. Have NA12878KnowledgeBase return WriteResults from insert methods, so caller can know if there's an error
2. Provide constructors for NA12878DBArgumentCollection, since we need to be able to create this class for NA12878DBKnowledgeBase
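One way to picture a manager that tracks connections by locator is a cache keyed on a value object. This is only a sketch: the Locator fields (host, port, database name) and the Connection placeholder are assumptions for illustration and do not reflect the real MongoDBManager API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of a connection cache keyed by a locator value object. */
public class ConnectionManagerExample {
    /** Identifies a database; the fields are assumed for illustration. */
    public static final class Locator {
        public final String host;
        public final int port;
        public final String dbName;

        public Locator(String host, int port, String dbName) {
            this.host = host;
            this.port = port;
            this.dbName = dbName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Locator)) return false;
            Locator other = (Locator) o;
            return port == other.port && host.equals(other.host) && dbName.equals(other.dbName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, dbName);
        }
    }

    /** Placeholder for a real database connection. */
    public static final class Connection {
        public final Locator locator;

        Connection(Locator locator) {
            this.locator = locator;
        }
    }

    private static final Map<Locator, Connection> CONNECTIONS = new HashMap<>();

    /** Returns the existing connection for this locator, creating one on first use. */
    public static synchronized Connection getConnection(Locator locator) {
        return CONNECTIONS.computeIfAbsent(locator, Connection::new);
    }
}
```

Because Locator defines equals and hashCode over its fields, two locators built separately (for example, one in code and one parsed from JSON) resolve to the same cached connection.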
-- Cleaned up code for SiteIterator.
-- Added a generic error handling system for the SiteIterator. Created approaches to simply throw errors when invalid records are found, to log them, and to remove them from the sites collection.
-- By default getCalls() produces a SiteIterator that removes incorrectly formatted records from the DB
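The three error handling approaches above (throw, log, remove) can be sketched as a policy passed to an iterator. Everything here is hypothetical: records are plain strings of the form "contig:position", validity is a regex check, and the LOG policy is assumed to record and skip the bad entry. The real SiteIterator works against the sites collection in the database:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of pluggable handling for invalid records during iteration. */
public class InvalidRecordHandlingExample {
    /** What to do when an invalid record is encountered. */
    public enum Policy { THROW, LOG, REMOVE }

    public static class FilteringIterator implements Iterator<String> {
        private final List<String> store;                       // stands in for the sites collection
        private final Policy policy;
        public final List<String> logged = new ArrayList<>();   // stands in for a logger
        private int index = 0;
        private String next = null;

        public FilteringIterator(List<String> store, Policy policy) {
            this.store = store;
            this.policy = policy;
        }

        private static boolean isValid(String record) {
            return record != null && record.matches("\\w+:\\d+");
        }

        /** Moves to the next valid record, applying the policy to invalid ones. */
        private void advance() {
            while (next == null && index < store.size()) {
                String candidate = store.get(index);
                if (isValid(candidate)) {
                    next = candidate;
                    index++;
                } else if (policy == Policy.THROW) {
                    throw new IllegalStateException("Invalid record: " + candidate);
                } else if (policy == Policy.LOG) {
                    logged.add(candidate);
                    index++;
                } else {
                    store.remove(index); // later elements shift down, so index stays put
                }
            }
        }

        @Override
        public boolean hasNext() {
            if (next == null) advance();
            return next != null;
        }

        @Override
        public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            String result = next;
            next = null;
            return result;
        }
    }

    /** Convenience helper: drains the iterator into a list. */
    public static List<String> collect(List<String> store, Policy policy) {
        List<String> out = new ArrayList<>();
        FilteringIterator it = new FilteringIterator(store, policy);
        while (it.hasNext()) out.add(it.next());
        return out;
    }
}
```

With Policy.REMOVE, iterating over a store containing "1:100", "bad", "2:200" yields the two valid records and leaves the store without the malformed entry, which matches the default behaviour described for getCalls().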
-- Created NA12878KnowledgeBaseServer GATK walker that (1) continually finds newly added records to the sites database and rebuilds the consensus as needed and (2) archives the reviewed sites to a VCF file upon server termination
-- More, better unit tests everywhere
-- Adding infrastructure to find only newly added sites to the NA12878KnowledgeBase. Uses MongoDB's ordering of _id to obtain the records (and the sites) of variants newly added to the sites collection. This is essential infrastructure for writing an NA12878KnowledgeBase server that continually keeps the consensus records updated as new sites are added to the database
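The idea of using _id ordering to find new records can be sketched without a database: remember the largest id seen so far and, on each poll, fetch only entries with a larger id. The in-memory TreeMap and the sequential long ids below are stand-ins chosen for illustration. In MongoDB, default ObjectId values begin with a creation timestamp, which is what makes sorting by _id approximate insertion order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory sketch of polling for records added since the last poll. */
public class NewRecordPollerExample {
    // Stands in for the sites collection, ordered by id.
    private final TreeMap<Long, String> collection = new TreeMap<>();
    private long nextId = 1;     // simulated monotonically increasing _id
    private long lastSeenId = 0; // high-water mark from the previous poll

    public void insert(String site) {
        collection.put(nextId++, site);
    }

    /** Returns only the sites inserted since the previous call, in id order. */
    public List<String> pollNewSites() {
        List<String> fresh = new ArrayList<>();
        // tailMap(key, false) is the analogue of a query for ids strictly greater than lastSeenId.
        for (Map.Entry<Long, String> entry : collection.tailMap(lastSeenId, false).entrySet()) {
            fresh.add(entry.getValue());
            lastSeenId = entry.getKey();
        }
        return fresh;
    }
}
```

A server loop would call pollNewSites() periodically and rebuild the consensus only for the sites it returns.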