The array structure should be faster to populate and query (no properly benchmarked) and reduce memory footprint considerably.
Nevertheless removing PairHMM factor (using likelihoodEngine Random) it only achieves a speed up of 15% in some example WGS dataset
i.e. there are other bigger bottle necks in the system. Bamboo tests also seem to run significantly faster with this change.
Stories:
https://www.pivotaltracker.com/story/show/70222086https://www.pivotaltracker.com/story/show/67961652
Changes:
- ReadLikelihoods added to substitute Map<String,PerSampleReadLikelihoods>
- Operation that involve changes in full sets of ReadLikelihoods have been moved into that class.
- Simplified a bit the code that handles the downsampling of reads based on contamination
Caveats:
- Still we keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators..., didn't feel like change the interface of so many public classes in this pull-request.
In particular, it was possible to specify arguments for Files or Compound types without values
Added a special "none" value for annotations, since a bare "-A" is no longer allowed
Delivers PT 71792842 and 59360374
ValidationStringency was moved from htsjdk.samtools.SAMFileReader to htsjdk.samtools
samtools find BAM index file method was also moved (and made public!)
Story:
https://www.pivotaltracker.com/story/show/73440292
Changes:
- Just add the conditional in HaplotypeCaller#initialize
Testing:
- Nothing added, checked locally, trivial change that would eventually be removed anyway.
Story:
https://www.pivotaltracker.com/story/show/75028590
Changes:
Added the possibility of indicating the genotype type to consider (with argument [-gtType TYPE]*, where TYPE is HET, HOM_REF or HOM_VAR)
Removed conditional Het evaluation based on input ploidy; now you need to use -gtType explictly.
Tests:
Added integration tests to check on new argument (-gtType) behaviour in AssessNA12878KnowledgeBaseTest
Don't expand out source nodes for tail merging, since that's a head merging action only.
This shows up as a bug only because we now allow merging tails against non-reference paths.
- Edited intervals merging docs for correctness & clarity
- Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests)
- Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs
- Moved GenotypeGVCFs to Variant Discovery category and clarified a few points
- Clarified that the -resource argument depends on using the -V:tag format
- Clarified how the pcr indel model works
- Added caveat for -U ALLOW_N_CIGAR_READS
- Added MathJax support for displaying equations in GATKDocs
- Updated HC example commands and caveats
This is useful for e.g. cases where there are SNPs on insertions. Before tails were forced to be merged
(incorrectly) only to a reference node, but now they can be merged to any path in the graph from which they
directly branch.
Also, I've transferred over Ryan's code to refuse to process kmer sizes such that there are non-unique kmers
in the reference sequence with them.