-- throws UserException; added tests in PosteriorLikelihoodsUtilsUnitTests
Add error handling to CalculateGenotypePosteriors for cases where MLEAC>AN; add tests in PosteriorLikelihoodsUtilsUnitTests
Add unit tests to confirm that CalculateGenotypePosteriors has the ability to switch genotypes for four cases
Changes:
1. Addressed review comments on new K-best haplotype assembly graph finder.
2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices.
3. Updated test to use KBestHaplotypeFinder instead of KBestPaths
4. Retired KBestPaths to the archive.
5. Small improvements to the code and documentation.
Story:
https://www.pivotaltracker.com/story/show/66238286
Changes:
1. Created a new k-best haplotype search implementation in class KBestHaplotypeFinder.
2. Changed HC code to use the new implementation.
This seems to fix the original problem without causing significant changes in outputs using some empirical data test cases
3. Moved haplotype's cigar calculation code from Path to CigarUtils; need that in order to gain independence from Path in some parts of the code.
In any case that seems like a more natural location for that functionality.
The purpose of this is to be able to call SNPs that fall at the beginning of a capture region (or exon).
Before, the read threading code would only start threading from the first kmer that matched the reference. But
that means that, in the case of a SNP at the beginning of an exome, it wouldn't start threading the read until
after the SNP position - so we'd lose the SNP.
For now, this is still very experimental. It works well for RNAseq data, but does introduce FPs in normal exomes.
I know why this is and how to fix it, but it requires a much larger fix to the HC: the HC needs to pass all reads
and bases to the annotation engine (like UG does) instead of just the high quality ones. So for now, the head
merging is disabled by default.
As per reviewer comments, I moved the head and tail merging code out into their own class.
We use a "manager" to keep track of observed splits and previous reads. This can be extended/modified in the
future to try to salvage those overhangs instead of hard-clipping them and/or try other possible strategies.
Added unit tests and more integration tests.
The GATK now fails with a user error if you try to run with a reduced bam.
(I added a unit test for that; everything else here is just the removal of all traces of RR)
Re-added import java.io.File for BamGatherFunction.
Other cleanup to resolve scala syntax warnings from intellij.
Moved Example UG script to from protected to public.
This commit consists of 2 main changes:
1. When the strand table gets too large, we normalize it down to values that are more reasonable.
2. We don't include a particular sample's contribution unless the total ref and alt counts are at least 2 each;
this is a heuristic method for dealing only with hets.
MD5s change as expected.
Hopefully we'll have a more robust implementation for GATK 3.1.
The slicePrefix method functionality was broken.
Story:
https://www.pivotaltracker.com/story/show/64595624
Changes:
1. Fixed the bug.
2. Added unit test to check on the method functionality.
3. Added a integration test to verify the bug has been fixed in a empirical data reprudible case.
Story:
https://www.pivotaltracker.com/s/projects/1007536
Changes:
1. HC's GenotypingEngine now invokes reverseAlleleTrimming on GVCF variant output lines.
2. GenotypeGVCFs also reverse trim after regenotyping as some alt. alleles are dropped (observed in real-data).
The writer was never resetting the pointer to the end of the last non-ref VariantContext that it saw.
This was fine except when it jumped to a new contig - and a lower position on that contig - where it
thought that it was still part of that previous non-ref VariantContext so wouldn't emit a reference
block. Therefore, ref blocks were missing from the beginnings of all chromosomes (except chr1).
Added unit test to cover this case.
Bug uncovered by some untrimmed alleles in the single sample pipeline output.
Notice however does not fix the untrimmed alleles in general.
Story:
https://www.pivotaltracker.com/story/show/65481104
Changes:
1. Fixed the bug itself.
2. Fixed non-working tests (sliently skipped due to exception in dataProvider).
Note that this tool is still a work in progress and very experimental, so isn't 100% stable. Most of
the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented
it right before he left, and we're going to need to add tests at some point soon. I added a first
integration test in this commit, but it's just a start.
The fixes include:
1. Stop having the genotyping code strip out AD values. It doesn't make sense that it should do this so
I don't know why it was doing that at all.
Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore.
This also helps CalculateGenotypePosteriors which was losing the AD values.
Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before.
Updated the integration test for that walker to include such data.
2. Chris was calling Math.pow directly on the normalized posteriors which isn't safe.
Instead, the normalization routine itself can revert back to log scale in a safe manner so let's use it.
Also, renamed the variable to posteriorProbabilities (and not likelihoods).
3. Have CGP update the AC/AF/AN counts after fixing GTs.
After extensive detective work, Joel determined that these tests were failing
due to changes in the implementation of Math.pow() in newer versions of
Java 1.7.
All GSA members should ensure that they're using a JDK that is at least
as current as the one in the Java-1.7 dotkit on the Broad servers
(build 1.7.0_51-b13).
1. updated QualByDepth not to use AD-restricted depth if it is zero.
Added unit test this change.
2. Fixed small bug in CombineGVCFs where spanning deletions were not being treated consistently throughout.
Added test for this situation.
3. Make sure GenotypeGVCFs puts in the required headers.
Updated test files to make sure this is covered.
4. Have GenotypeGVCFs propagate up the MLEAC/AF (which were getting clobbered out).
Tests updated to account for this.
when the AD annotation is present for a given genotype then we only use its depth for QD if the variant depth > 1.
Added new unit tests for QualByDepth.
Creating new VariantContexts each time we broke up a block was very expensive because we break up
blocks so often. Also, calling into GATKVariantContextUtils.simpleMerge was really hurting performance.
MD5 changes because we no longer propogate any INFO fields (except for END) for reference blocks; the tests
have the now unused BLOCK_SIZE field that now get dropped.
Story:
https://www.pivotaltracker.com/story/show/65388246
Additional changes and notes:
1. The fix consist in forcing the output of all PLs by setting the standard flag for that '-allSitePLs'.
2. BP_RESOLUTION was handled differently to GVCF in some aspect that should be common. That has been fixed.
1. AD values now propogate up (they weren't before).
2. MIN_DP gets transferred over to DP and removed.
3. SB gets removed after FS is calculated.
Also, added a bunch of new integration tests for GenotypeGVCFs.
AC,AF,AN,FS,QD - they'll all be recomputed later.
BLOCK_SIZE and MIN_GQ were not necessary.
I also made the StrandBiasBySample annotation forced on when in gVCF mode.
It turns out that its output wasn't compatible with BCF so I patched it (and the variant jar too).
This tool will take any number of gVCFs and create a merged gVCF (as opposed to
GenotypeGVCFs which produces a standard VCF).
Added unit/integration tests and fixed up GATK docs.
New properties to disable regenerating example resources artifact when each parallel test runs under packagetest.
Moved collection of packagetest parameters from shell scripts into maven profiles.
Fixed necessity of test-utils jar by removing incorrect dependenciesToScan element during packagetests.
When building picard libraries, run clean first.
Fixed tools jar dependency in picard pom.
Integration tests properly use the ant-bridge.sh test.debug.port variable, like unit tests.
Story:
https://www.pivotaltracker.com/story/show/65048706https://www.pivotaltracker.com/story/show/65116908
Changes:
ActiveRegionTrimmer in now an argument collection and it returns not only the trimmed down active region but also the non-variant containing flanking regions
HaplotypeCaller code has been simplified significantly pushing some functionality two other classes like ActiveRegion and AssemblyResultSet.
Fixed a problem with the way the trimming was done causing some gVCF non-variant records no have conservative 0,0,0 PLs
1. Throw a user error when the input data for a given genotype does not contain PLs.
2. Add VCF header line for --dbsnp input
3. Need to check that the UG result is not null
4. Don't error out at positions with no gVCFs (which is possible when using a dbSNP rod)
Joel is working on these failures in a separate branch. Since
maven (currently! we're working on this..) won't run the whole
test suite to completion if there's a failure early on, we need
to temporarily disable these tests in order to allow group members
to run tests on their branches again.