* This is a shortcut for people who have multi-sample BAMs but would like to use GVCF mode. Rather than creating single-sample BAMs with PrintReads, one could use the --sample_name argument to HaplotypeCaller to specify the single sample to make calls on
* Completes PT 73075482
Story:
https://www.pivotaltracker.com/story/show/77250524
Changes:
- Remove the annotating code in GeneralPloidyExactAFCalc (GPEAFC) class.
- Added asAlleleList to the GenotypeAlleleCounts class and changed GPEAFC to use it instead of implementing its own (nicer and more reusable code).
- Removed the explicit addition of AlleleCountBySample fields to the VCF header by the walker initialize
- Added utility methods in Utils to efficiently wrap an int[] array into a List<Integer>, and a double[] array into a List<Double>.
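
The wrapping described in the last item can be done without copying by returning a list view over the array. The following is a minimal sketch of that idea, not the actual Utils code: the class name ArrayViews is hypothetical, and only the asList(int[]) and asList(double[]) signatures come from the notes above.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical holder class; the real methods live in Utils.
final class ArrayViews {
    private ArrayViews() {}

    /** Returns a fixed-size, read-only view backed by the array (no copy is made). */
    static List<Integer> asList(final int[] values) {
        if (values == null) throw new IllegalArgumentException("values cannot be null");
        return new AbstractList<Integer>() {
            @Override public Integer get(final int index) { return values[index]; }
            @Override public int size() { return values.length; }
        };
    }

    /** Same idea for double[]: elements are boxed lazily, one per get() call. */
    static List<Double> asList(final double[] values) {
        if (values == null) throw new IllegalArgumentException("values cannot be null");
        return new AbstractList<Double>() {
            @Override public Double get(final int index) { return values[index]; }
            @Override public int size() { return values.length; }
        };
    }
}
```

Because the list is a view, later writes to the array are visible through it, and mutating calls such as add or set throw UnsupportedOperationException (the AbstractList default).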
Test:
- Added unit-testing for asAlleleList in GenotypeAlleleCountsUnitTest (within testFirst and testNext).
- Added unit-testing for new methods in Utils : asList(int[]) and asList(double[])
- Changed the UG General Ploidy test to add those annotations explicitly.
- Non-trivial changes in integration tests involving non-diploid runs (namely haploid and tetraploid), as they no longer show
those annotations, so the MD5s have been changed accordingly.
It turns out that there can be some really complex situations even with a single sample where
there are lots of unphasable hets around a hom. Previously we were trying to phase each of the
hets against the hom, but that wasn't correct. Instead we now detect that situation and don't
attempt to phase anything.
Added a unit test to cover this situation.
New annotation for low- and high-confidence de novos (only annotates biallelics)
FamilyLikelihoodsUtils now adds joint likelihood and joint posterior annotations
Restrict population priors based on discovered allele count to be valid for 10 or more samples.
Fix for the GeneralPloidyExactAFCalc implementation, which was preventing -ploidy != 2 GVCF/BP_RESOLUTION output from working.
Story:
https://www.pivotaltracker.com/story/show/74471252
Tests:
Enabled GVCF tests with ploidy != 2 and others that check for the original ArrayIndexOutOfBounds exception.
VariantAnnotator/FS behavior changes slightly: VA used to output zeros for FS if there was no strand bias info, now skips FS output (but will still show FS in header)
Changes in several walkers to use the new sample and allele closed lists and the new GenotypingEngine constructor signatures
Rebase adoption of new calculation system in walkers
1. It is now turned on by default
2. It now phases homozygous variants
3. Most importantly, it also phases variants that are always on opposite haplotypes
Changed the INFO keys to be PID and PGT, as described in the header.
If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF.
Can be enabled with the --tryPhysicalPhasing argument.
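
The two phasing cases mentioned above (variants that occur together on all used haplotypes, and variants that are always on opposite haplotypes) can be illustrated with a small classifier. This is only a sketch under simplifying assumptions: haplotypes are modeled as sets of variant IDs, "opposite" is taken to mean that every haplotype carries exactly one of the two variants, and the names PhaseSketch, Relation, hap and classify are all hypothetical rather than taken from the HaplotypeCaller code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the HaplotypeCaller implementation.
final class PhaseSketch {
    enum Relation { TOGETHER, OPPOSITE, UNPHASED }

    /** Convenience builder: a haplotype modeled as the set of variant IDs it carries. */
    static Set<String> hap(final String... variantIds) {
        return new HashSet<>(Arrays.asList(variantIds));
    }

    /** Classifies variants a and b by how they are distributed across the haplotypes. */
    static Relation classify(final List<Set<String>> haplotypes, final String a, final String b) {
        boolean alwaysTogether = true;  // every haplotype has both or neither
        boolean alwaysOpposite = true;  // every haplotype has exactly one of the two
        boolean seenA = false;
        boolean seenB = false;
        for (final Set<String> h : haplotypes) {
            final boolean hasA = h.contains(a);
            final boolean hasB = h.contains(b);
            seenA |= hasA;
            seenB |= hasB;
            if (hasA != hasB) alwaysTogether = false;
            if (hasA == hasB) alwaysOpposite = false;
        }
        // Both variants must actually be observed for any phasing statement to make sense.
        if (!seenA || !seenB) return Relation.UNPHASED;
        if (alwaysTogether) return Relation.TOGETHER;
        if (alwaysOpposite) return Relation.OPPOSITE;
        return Relation.UNPHASED;
    }
}
```

Under these assumptions, a pair seen on {v1, v2} and on an empty (reference) haplotype is classified as TOGETHER, a pair seen on {v1} and {v2} as OPPOSITE, and any mixed pattern is left unphased.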
- Read groups that are excluded by sample_name, platform, or read_group arguments no longer appear in the header
- The performance penalty associated with filtering by read group has been essentially eliminated
- Partial fulfillment of PT 73075482
Stories:
https://www.pivotaltracker.com/story/show/70222086
https://www.pivotaltracker.com/story/show/67961652
Changes:
Made some changes that I had missed to ensure that all PairHMM implementations use the same interface; because of that omission, we were always running the standard PairHMM.
Fixed some additional bugs detected when running on full WGS single-sample and exome multi-sample data sets.
Updated some integration test md5s.
Fixing GraphBased bugs with new master code
Fixed a difficult-to-spot bug in ReadLikelihoods.changeReads.
Changed PairHMM interface to fix a bug
Fixed missing changes for various PairHMM implementations to get them to use the new structure.
Fixed various bugs only detectable when running with full sample(s).
Believed to have fixed the lack of annotations in UG runs
Fixed integration test MD5s
Updating some md5s
Fixed yet another md5 probably left out by mistake
The array structure should be faster to populate and query (not properly benchmarked) and should reduce the memory footprint considerably.
Nevertheless, with the PairHMM factored out (using likelihoodEngine Random), it only achieves a speedup of 15% on an example WGS dataset,
i.e. there are other, bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change.
Stories:
https://www.pivotaltracker.com/story/show/70222086
https://www.pivotaltracker.com/story/show/67961652
Changes:
- ReadLikelihoods added to substitute Map<String,PerSampleReadLikelihoods>
- Operations that involve changes to full sets of ReadLikelihoods have been moved into that class.
- Simplified a bit the code that handles the downsampling of reads based on contamination
Caveats:
- We still keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators; I didn't feel like changing the interface of so many public classes in this pull request.
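
To make the array-versus-map trade-off concrete, the following is a rough sketch of an array-backed likelihood store for a single sample. It is not the ReadLikelihoods class: the class name, the use of String identifiers for alleles and reads, and every method shown are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-sample sketch; not the actual ReadLikelihoods API.
final class LikelihoodMatrixSketch {
    private final List<String> alleles;
    private final Map<String, Integer> alleleIndex = new HashMap<>();
    private final Map<String, Integer> readIndex = new HashMap<>();
    private final double[][] values; // log-likelihoods indexed as [allele][read]

    LikelihoodMatrixSketch(final List<String> alleles, final List<String> reads) {
        this.alleles = new ArrayList<>(alleles);
        for (int i = 0; i < alleles.size(); i++) alleleIndex.put(alleles.get(i), i);
        for (int i = 0; i < reads.size(); i++) readIndex.put(reads.get(i), i);
        values = new double[alleles.size()][reads.size()];
        // Unset cells start at log(0) rather than the array default of 0.0.
        for (final double[] row : values) Arrays.fill(row, Double.NEGATIVE_INFINITY);
    }

    private static int indexOf(final Map<String, Integer> index, final String key) {
        final Integer i = index.get(key);
        if (i == null) throw new IllegalArgumentException("unknown key: " + key);
        return i;
    }

    void set(final String allele, final String read, final double logLikelihood) {
        values[indexOf(alleleIndex, allele)][indexOf(readIndex, read)] = logLikelihood;
    }

    double get(final String allele, final String read) {
        return values[indexOf(alleleIndex, allele)][indexOf(readIndex, read)];
    }

    /** Returns the allele with the highest likelihood for the read (first one wins ties), or null if there are no alleles. */
    String bestAllele(final String read) {
        final int r = indexOf(readIndex, read);
        String best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < alleles.size(); a++) {
            if (best == null || values[a][r] > bestValue) {
                best = alleles.get(a);
                bestValue = values[a][r];
            }
        }
        return best;
    }
}
```

The point of the layout is that the two index lookups happen once per call, after which all likelihoods sit in contiguous primitive arrays instead of one boxed Double per map entry.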