Added pom.xml workarounds for duplicate classpath error, due to gatk-framework dependency containing required BaseTest, and jarred *UnitTest/*IntegrationTest classes that also exist as files under target/test-classes.
Here are the git moved directories in case other files need to be moved during a merge:
git-mv private/java/src/ private/gatk-private/src/main/java/
git-mv private/R/scripts/ private/gatk-private/src/main/resources/
git-mv private/java/test/ private/gatk-private/src/test/java/
git-mv private/testdata/ private/gatk-private/src/test/resources/
git-mv private/scala/qscript/ private/queue-private/src/main/qscripts/
git-mv private/scala/src/ private/queue-private/src/main/scala/
git-mv protected/java/src/ protected/gatk-protected/src/main/java/
git-mv protected/java/test/ protected/gatk-protected/src/test/java/
git-mv public/java/src/ public/gatk-framework/src/main/java/
git-mv public/java/test/ public/gatk-framework/src/test/java/
git-mv public/testdata/ public/gatk-framework/src/test/resources/
git-mv public/scala/qscript/ public/queue-framework/src/main/qscripts/
git-mv public/scala/src/ public/queue-framework/src/main/scala/
git-mv public/scala/test/ public/queue-framework/src/test/scala/
Changes:
-------
<NON_REF> likelihood in variant sites is calculated as the maximum possible likelihood for an unseen alternative allele: for reach read is calculated as the second best likelihood amongst the reported alleles.
When –ERC gVCF, stand_conf_emit and stand_conf_call are forcefully set to 0. Also dontGenotype is set to false for consistency sake.
Integration test MD5 have been changed accordingly.
Additional fix:
--------------
Specially after adding the <NON_REF> allele, but also happened without that, QUAL values tend to go to 0 (very large integer number in log 10) due to underflow when combining GLs (GenotypingEngine.combineGLs). To fix that combineGLs has been substituted by combineGLsPrecise that uses the log-sum-exp trick.
In just a few cases this change results in genotype changes in integration tests but after double-checking using unit-test and difference between combineGLs and combineGLsPrecise in the affected integration test, the previous GT calls were either border-line cases and or due to the underflow.
Problem:
matchToMatch transition calculation was wrong resulting in transition probabilites coming out of the Match state that added more than 1.
Reports:
https://www.pivotaltracker.com/s/projects/793457/stories/62471780https://www.pivotaltracker.com/s/projects/793457/stories/61082450
Changes:
The transition matrix update code has been moved to a common place in PairHMMModel to dry out its multiple copies.
MatchToMatch transtion calculation has been fixed and implemented in PairHMMModel.
Affected integration test md5 have been updated, there were no differences in GT fields and example differences always implied
small changes in likelihoods that is what is expected.
In unifying the arguments it was clear that the values were inconsistent throughout the code, so now there's a
single value that is intended to be more liberal in what it allows in (in an attempt to increase sensitivity).
Very little code actually changes here, but just about every md5 in the HC integration tests are different (as
expected). Added another integration test for the new argument.
To be used by David R to test his per-branch QC framework: does this commit make the HC look better against the KB?
It didn't completely work before (it was hard-coded for a particular long-lost data set) but it should work now.
Since I thought that it might prove useful to others, I moved it to protected and added integration tests.
GERALDINE: NEW TOOL ALERT!
Problem: the codec was written to take in consensus pileups produced with pileup -c option (which consists of 10 or 13 fields per line depending on the variant type) but errored out on the basic pileup format (which only has 6 fields per line). This was inconsistent and confusing to users.
Solution: I added a switch in the parsing to recognize and handle both cases more appropriately, and updated related docs. While I was at it I also improved error messages in CheckPileup, which now emits User Error: Bad Input exceptions when reporting mismatches. Which may not be the best thing to do (ultimately they're not really errors, they're just reporting unwelcome results) but it beats emitting Runtime Exceptions.
Tested by CheckPileupIntegrationTest which tests both format cases.
The code comments very clearly state that INFO fields shouldn't be propagated into the output,
but someone must have accidentally changed it afterwards. This is just a simple one-line fix
to make sure the code adhered to the comments.
Delivers #63333488.
-Added docs for ERC mode in HC
-Move RecalibrationPerformance walker since to private since it is experimental and unsupported
-Updated VR docs and restored percentBad/numBad (but @Hidden) to enable deprecation alert if users try to use them
-Improved error msg for conflict between per-interval aggregation and -nt
-Minor clean up in exception docs
-Added Toy Walkers category for devs and dev supercat (to build out docs for developers)
-Added more detailed info to GenotypeConcordance doc based on Chris forum post
-Added system to include min/max argument values in gatkdocs (build gatkdocs with 'ant gatkdocs' to test it, see engine and DoC args for in situ examples)
-Added tentative min/max argument annotations to DepthOfCoverage and CommandLineGATK arguments (and improved docs while at it)
-Added gotoDev annotation to GATKDocumentedFeature to track who is the go-to person in GSA for questions & issues about specific walkers/tools (now discreetly indicated in each gatkdoc)
It is true that indels of length > 1 have higher QUALS than those of length = 1. But for the HC those
QUALS are not that much higher, and it doesn't continue scaling up as the indels get larger. So we no
longer normalize by indel length (which massively over-penalizes larger events and effectively drops their
QD to 0).
For the UG the previous normalization also wasn't perfect. Now we divide the indel length by a factor
of 3 to make sure that QD is consistent over the range of indel lengths.
Integration tests change because QD is different for indels.
Also, got permission from Valentin to archive a failing test that no longer applies.
Thanks to Kurt on the GATK forum for pointing this all out.