We use a "manager" to keep track of observed splits and previous reads. This can be extended/modified in the
future to try to salvage those overhangs instead of hard-clipping them and/or try other possible strategies.
Added unit tests and more integration tests.
The GATK now fails with a user error if you try to run with a reduced bam.
(I added a unit test for that; everything else here is just the removal of all traces of RR)
-- Keep a list of processed files in ArgumentTypeDescriptor.getRodBindingsCollection
-- Throw user exception if a file name duplicates one that was previously parsed
-- Throw user exception if the ROD list is empty
-- Added two unit tests to RodBindingCollectionUnitTest
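A minimal sketch of the two checks described above, assuming the ROD list has already been read into a list of file names. RodListChecker and UserError are hypothetical names used for illustration only; this is not the actual ArgumentTypeDescriptor code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the real GATK implementation.
final class RodListChecker {
    /** Stand-in for a user-facing exception type (assumed name). */
    static final class UserError extends RuntimeException {
        UserError(String message) { super(message); }
    }

    /**
     * Rejects an empty list and any file name that repeats one already seen.
     * Returns the validated names in their original order.
     */
    static List<String> validate(List<String> fileNames) {
        if (fileNames.isEmpty()) {
            throw new UserError("The ROD list is empty");
        }
        // Names processed so far; Set.add returns false for a repeat.
        Set<String> processed = new LinkedHashSet<>();
        for (String name : fileNames) {
            if (!processed.add(name)) {
                throw new UserError("File name appears more than once: " + name);
            }
        }
        return new ArrayList<>(processed);
    }
}
```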
Re-added import java.io.File for BamGatherFunction.
Other cleanup to resolve scala syntax warnings from intellij.
Moved Example UG script from protected to public.
- This change means that BamGatherFunction will now have an @Output field for the BAM index, which will allow the bai to be deleted for intermediate functions
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
This commit consists of 2 main changes:
1. When the strand table gets too large, we normalize it down to values that are more reasonable.
2. We don't include a particular sample's contribution unless the total ref and alt counts are at least 2 each;
this is a heuristic method for dealing only with hets.
MD5s change as expected.
Hopefully we'll have a more robust implementation for GATK 3.1.
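A rough sketch of the two changes, for illustration only. The 2x2 table layout, the MAX_TOTAL cap and the proportional rescaling are assumptions; the commit does not state the real threshold or how the normalization is done.

```java
// Hypothetical sketch, not the actual strand bias code.
final class StrandTableSketch {
    // Assumed cap on the table total; the real value is not given in the commit.
    static final int MAX_TOTAL = 200;

    /** Layout assumed here: table[0] = {refFwd, refRev}, table[1] = {altFwd, altRev}. */
    static boolean passesHetHeuristic(int[][] table) {
        int refTotal = table[0][0] + table[0][1];
        int altTotal = table[1][0] + table[1][1];
        // Only count a sample when both ref and alt are seen at least twice.
        return refTotal >= 2 && altTotal >= 2;
    }

    /** Scales all four cells proportionally when the total exceeds MAX_TOTAL. */
    static int[][] normalize(int[][] table) {
        long total = 0;
        for (int[] row : table) {
            for (int count : row) {
                total += count;
            }
        }
        int[][] out = new int[2][2];
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                out[i][j] = total <= MAX_TOTAL
                        ? table[i][j]
                        : (int) Math.round((double) table[i][j] * MAX_TOTAL / total);
            }
        }
        return out;
    }
}
```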
The slicePrefix method functionality was broken.
Story:
https://www.pivotaltracker.com/story/show/64595624
Changes:
1. Fixed the bug.
2. Added unit test to check on the method functionality.
3. Added an integration test to verify the bug has been fixed in a reproducible case from empirical data.
Story:
https://www.pivotaltracker.com/s/projects/1007536
Changes:
1. HC's GenotypingEngine now invokes reverseAlleleTrimming on GVCF variant output lines.
2. GenotypeGVCFs also reverse trims after regenotyping, as some alt. alleles are dropped (observed in real data).
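To illustrate why trimming is needed once an alt allele is dropped, here is a minimal sketch of reverse (right-side) trimming on plain strings. ReverseTrimSketch is a hypothetical helper, not the reverseAlleleTrimming method itself, and it assumes the reference allele comes first and every allele must keep at least one base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of right-side allele trimming.
final class ReverseTrimSketch {
    /** Removes trailing bases shared by all alleles, keeping at least one base each. */
    static List<String> reverseTrim(List<String> alleles) {
        int trim = 0;
        while (canTrim(alleles, trim)) {
            trim++;
        }
        List<String> out = new ArrayList<>();
        for (String allele : alleles) {
            out.add(allele.substring(0, allele.length() - trim));
        }
        return out;
    }

    // True when one more trailing base can be removed from every allele.
    private static boolean canTrim(List<String> alleles, int trim) {
        if (alleles.isEmpty()) {
            return false;
        }
        String first = alleles.get(0);
        if (first.length() - trim <= 1) {
            return false;
        }
        char last = first.charAt(first.length() - 1 - trim);
        for (String allele : alleles) {
            if (allele.length() - trim <= 1) {
                return false;
            }
            if (allele.charAt(allele.length() - 1 - trim) != last) {
                return false;
            }
        }
        return true;
    }
}
```

For example, if ACGT/AGT is what remains after another alt allele is dropped, the shared trailing GT can be removed, leaving AC/A.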
The writer was never resetting the pointer to the end of the last non-ref VariantContext that it saw.
This was fine except when it jumped to a new contig - and a lower position on that contig - where it
thought that it was still part of that previous non-ref VariantContext so wouldn't emit a reference
block. Therefore, ref blocks were missing from the beginnings of all chromosomes (except chr1).
Added unit test to cover this case.
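The failure mode can be sketched as follows. NonRefSpanTracker is a hypothetical stand-in for the writer's bookkeeping, not the actual class; it assumes records arrive sorted by contig and position.

```java
// Hypothetical sketch of tracking the end of the last non-ref record.
final class NonRefSpanTracker {
    private String lastContig = null;
    private int lastEnd = -1;

    /** Remembers where the most recent non-ref VariantContext ended. */
    void recordNonRef(String contig, int end) {
        lastContig = contig;
        lastEnd = end;
    }

    /**
     * Returns true if the position still falls inside the last non-ref record.
     * The pointer is reset when the contig changes; without this reset a low
     * position on a new contig looks like it is still inside the old record.
     */
    boolean covers(String contig, int start) {
        if (!contig.equals(lastContig)) {
            lastContig = contig;
            lastEnd = -1;
        }
        return start <= lastEnd;
    }
}
```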
Bug uncovered by some untrimmed alleles in the single sample pipeline output.
Note, however, that this does not fix the untrimmed alleles in general.
Story:
https://www.pivotaltracker.com/story/show/65481104
Changes:
1. Fixed the bug itself.
2. Fixed non-working tests (silently skipped due to exception in dataProvider).
This script publishes GATK/Queue jars for internal GSA use to the following locations
whenever tests pass:
/humgen/gsa-hpprojects/GATK/private_unstable_builds/GenomeAnalysisTK_latest_unstable.jar
/humgen/gsa-hpprojects/Queue/private_unstable_builds/Queue_latest_unstable.jar
These jars include private code, and so are for internal use only.
Note that this tool is still a work in progress and very experimental, so isn't 100% stable. Most of
the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented
it right before he left, and we're going to need to add tests at some point soon. I added a first
integration test in this commit, but it's just a start.
The fixes include:
1. Stop having the genotyping code strip out AD values. It doesn't make sense that it should do this so
I don't know why it was doing that at all.
Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore.
This also helps CalculateGenotypePosteriors which was losing the AD values.
Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before.
Updated the integration test for that walker to include such data.
2. Chris was calling Math.pow directly on the normalized posteriors which isn't safe.
Instead, the normalization routine itself can revert back to log scale in a safe manner so let's use it.
Also, renamed the variable to posteriorProbabilities (and not likelihoods).
3. Have CGP update the AC/AF/AN counts after fixing GTs.
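On item 2, one reading of "isn't safe" is numerical underflow: Math.pow(10, x) returns 0.0 for very negative log10 values. The sketch below shows the usual workaround of subtracting the maximum before exponentiating. PosteriorSketch is a hypothetical helper, not the GATK normalization routine, and it assumes at least one finite input.

```java
// Hypothetical sketch of converting log10 values to normalized probabilities.
final class PosteriorSketch {
    /** Returns probabilities summing to 1, computed without underflow to all zeros. */
    static double[] normalizeFromLog10(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }
        double[] out = new double[log10Values.length];
        double sum = 0.0;
        for (int i = 0; i < log10Values.length; i++) {
            // Shifting by the max makes the largest term exactly 10^0 = 1.
            out[i] = Math.pow(10.0, log10Values[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }
}
```

With inputs around -1000, calling Math.pow directly yields 0.0 for every entry, while the shifted version keeps the relative weights.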