After extensive detective work, Joel determined that these tests were failing
due to changes in the implementation of Math.pow() in newer versions of
Java 1.7.
All GSA members should ensure that they're using a JDK that is at least
as current as the one in the Java-1.7 dotkit on the Broad servers
(build 1.7.0_51-b13).
This change should allow us to test that the GATK jar has been
correctly packaged at release time, by ensuring that only the
packaged jar + a few test-related dependencies are on the classpath
when tests are run.
Note that we still need to actually test that this works as intended
before we can make this live in the Bamboo release plan.
1. AD values now propogate up (they weren't before).
2. MIN_DP gets transferred over to DP and removed.
3. SB gets removed after FS is calculated.
Also, added a bunch of new integration tests for GenotypeGVCFs.
This tool will take any number of gVCFs and create a merged gVCF (as opposed to
GenotypeGVCFs which produces a standard VCF).
Added unit/integration tests and fixed up GATK docs.
New properties to disable regenerating example resources artifact when each parallel test runs under packagetest.
Moved collection of packagetest parameters from shell scripts into maven profiles.
Fixed necessity of test-utils jar by removing incorrect dependenciesToScan element during packagetests.
When building picard libraries, run clean first.
Fixed tools jar dependency in picard pom.
Integration tests properly use the ant-bridge.sh test.debug.port variable, like unit tests.
Story:
https://www.pivotaltracker.com/story/show/65048706https://www.pivotaltracker.com/story/show/65116908
Changes:
ActiveRegionTrimmer in now an argument collection and it returns not only the trimmed down active region but also the non-variant containing flanking regions
HaplotypeCaller code has been simplified significantly pushing some functionality two other classes like ActiveRegion and AssemblyResultSet.
Fixed a problem with the way the trimming was done causing some gVCF non-variant records no have conservative 0,0,0 PLs
These changes happened in Tribble, but Joel clobbered them with his commit.
We can now change the logging priority on failures to validate the sequence dictionary to WARN.
Thanks to Tim F for indirectly pointing this out.
1. Throw a user error when the input data for a given genotype does not contain PLs.
2. Add VCF header line for --dbsnp input
3. Need to check that the UG result is not null
4. Don't error out at positions with no gVCFs (which is possible when using a dbSNP rod)
Joel is working on these failures in a separate branch. Since
maven (currently! we're working on this..) won't run the whole
test suite to completion if there's a failure early on, we need
to temporarily disable these tests in order to allow group members
to run tests on their branches again.
Added pom.xml workarounds for duplicate classpath error, due to gatk-framework dependency containing required BaseTest, and jarred *UnitTest/*IntegrationTest classes that also exist as files under target/test-classes.
Here are the git moved directories in case other files need to be moved during a merge:
git-mv private/java/src/ private/gatk-private/src/main/java/
git-mv private/R/scripts/ private/gatk-private/src/main/resources/
git-mv private/java/test/ private/gatk-private/src/test/java/
git-mv private/testdata/ private/gatk-private/src/test/resources/
git-mv private/scala/qscript/ private/queue-private/src/main/qscripts/
git-mv private/scala/src/ private/queue-private/src/main/scala/
git-mv protected/java/src/ protected/gatk-protected/src/main/java/
git-mv protected/java/test/ protected/gatk-protected/src/test/java/
git-mv public/java/src/ public/gatk-framework/src/main/java/
git-mv public/java/test/ public/gatk-framework/src/test/java/
git-mv public/testdata/ public/gatk-framework/src/test/resources/
git-mv public/scala/qscript/ public/queue-framework/src/main/qscripts/
git-mv public/scala/src/ public/queue-framework/src/main/scala/
git-mv public/scala/test/ public/queue-framework/src/test/scala/
Changes:
-------
<NON_REF> likelihood in variant sites is calculated as the maximum possible likelihood for an unseen alternative allele: for reach read is calculated as the second best likelihood amongst the reported alleles.
When –ERC gVCF, stand_conf_emit and stand_conf_call are forcefully set to 0. Also dontGenotype is set to false for consistency sake.
Integration test MD5 have been changed accordingly.
Additional fix:
--------------
Specially after adding the <NON_REF> allele, but also happened without that, QUAL values tend to go to 0 (very large integer number in log 10) due to underflow when combining GLs (GenotypingEngine.combineGLs). To fix that combineGLs has been substituted by combineGLsPrecise that uses the log-sum-exp trick.
In just a few cases this change results in genotype changes in integration tests but after double-checking using unit-test and difference between combineGLs and combineGLsPrecise in the affected integration test, the previous GT calls were either border-line cases and or due to the underflow.
Problem:
matchToMatch transition calculation was wrong resulting in transition probabilites coming out of the Match state that added more than 1.
Reports:
https://www.pivotaltracker.com/s/projects/793457/stories/62471780https://www.pivotaltracker.com/s/projects/793457/stories/61082450
Changes:
The transition matrix update code has been moved to a common place in PairHMMModel to dry out its multiple copies.
MatchToMatch transtion calculation has been fixed and implemented in PairHMMModel.
Affected integration test md5 have been updated, there were no differences in GT fields and example differences always implied
small changes in likelihoods that is what is expected.
It didn't completely work before (it was hard-coded for a particular long-lost data set) but it should work now.
Since I thought that it might prove useful to others, I moved it to protected and added integration tests.
GERALDINE: NEW TOOL ALERT!
Problem: the codec was written to take in consensus pileups produced with pileup -c option (which consists of 10 or 13 fields per line depending on the variant type) but errored out on the basic pileup format (which only has 6 fields per line). This was inconsistent and confusing to users.
Solution: I added a switch in the parsing to recognize and handle both cases more appropriately, and updated related docs. While I was at it I also improved error messages in CheckPileup, which now emits User Error: Bad Input exceptions when reporting mismatches. Which may not be the best thing to do (ultimately they're not really errors, they're just reporting unwelcome results) but it beats emitting Runtime Exceptions.
Tested by CheckPileupIntegrationTest which tests both format cases.
-Added docs for ERC mode in HC
-Move RecalibrationPerformance walker since to private since it is experimental and unsupported
-Updated VR docs and restored percentBad/numBad (but @Hidden) to enable deprecation alert if users try to use them
-Improved error msg for conflict between per-interval aggregation and -nt
-Minor clean up in exception docs
-Added Toy Walkers category for devs and dev supercat (to build out docs for developers)
-Added more detailed info to GenotypeConcordance doc based on Chris forum post
-Added system to include min/max argument values in gatkdocs (build gatkdocs with 'ant gatkdocs' to test it, see engine and DoC args for in situ examples)
-Added tentative min/max argument annotations to DepthOfCoverage and CommandLineGATK arguments (and improved docs while at it)
-Added gotoDev annotation to GATKDocumentedFeature to track who is the go-to person in GSA for questions & issues about specific walkers/tools (now discreetly indicated in each gatkdoc)