gatk-3.8

History

Valentin Ruano-Rubio 0f99778a59 Adding Graph-based likelihood ratio calculation to HC To active this feature add '--likelihoodCalculationEngine GraphBased' to the HC command line. New HC Options (both Advanced and Hidden): ========================================== --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM) Specifies what engine should be used to generate read vs haplotype likelihoods. PairHMM : standard full-PairHMM approach. GraphBased : using the assembly graph to accelarate the process. Random : generate random likelihoods - used for benchmarking purposes only. --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN) It idicates how to merge haplotypes produced using different kmerSizes. Only has effect when used in combination with (--likelihooCalculationEngine GraphBased) COMBO_MIN : use the smallest kmerSize with all haplotypes. COMBO_MAX : use the larger kmerSize with all haplotypes. MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it. MAX_ONLY : use the larger kmerSize with haplotypes asembled using it. Major code changes: =================== * Introduce multiple likelihood calculation engines (before there was just one). * Assembly results from different kmerSies are now packed together using the AssemblyResultSet class. * Added yet another PairHMM implementation with a different API in order to spport local PairHMM calculations, (e.g. a segment of the read vs a segment of the haplotype). Major components: ================ * FastLoglessPairHMM: New pair-hmm implemtation using some heuristic to speed up partial PairHMM calculations * GraphBasedLikelihoodCalculationEngine: delegates onto GraphBasedLikelihoodCalculationEngineInstance the exectution of the graph-based likelihood approach. * GraphBasedLikelihoodCalculationEngineInstance: one instance per active-region, implements the graph traversals to calcualte the likelihoods using the graph as an scafold. * HaplotypeGraph: haplotype threading graph where build from the assembly haplotypes. This structure is the one used by GraphBasedLikelihoodCalculationEngineInstance to do its work. * ReadAnchoring and KmerSequenceGraphMap: contain information as how a read map on the HaplotypeGraph that is used by GraphBasedLikelihoodCalcuationEngineInstance to do its work. Remove mergeCommonChains from HaplotypeGraph creation Fixed bamboo issues with HaplotypeGraphUnitTest Fixed probrems with HaplotypeCallerIntegrationTest Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest Fixed ReadThreadingLikelihoodCalculationEngine issues Moved event-block iteration outside GraphBasedEngineInstance Removed unecessary parameter from ReadAnchoring constructor. Fixed test problem Added a bit more documentation to EventBlockSearchEngine Fixing some private - protected dependency issues Further refactoring making GraphBasedInstance and HaplotypeGraph slimmer. Addressed last pull request commit comments Fixed FastLoglessPairHMM public -> protected dependency Fixed probrem with HaplotypeGraph unit test Adding Graph-based likelihood ratio calculation to HC To active this feature add '--likelihoodCalculationEngine GraphBased' to the HC command line. New HC Options (both Advanced and Hidden): ========================================== --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM) Specifies what engine should be used to generate read vs haplotype likelihoods. PairHMM : standard full-PairHMM approach. GraphBased : using the assembly graph to accelarate the process. Random : generate random likelihoods - used for benchmarking purposes only. --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN) It idicates how to merge haplotypes produced using different kmerSizes. Only has effect when used in combination with (--likelihooCalculationEngine GraphBased) COMBO_MIN : use the smallest kmerSize with all haplotypes. COMBO_MAX : use the larger kmerSize with all haplotypes. MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it. MAX_ONLY : use the larger kmerSize with haplotypes asembled using it. Major code changes: =================== * Introduce multiple likelihood calculation engines (before there was just one). * Assembly results from different kmerSies are now packed together using the AssemblyResultSet class. * Added yet another PairHMM implementation with a different API in order to spport local PairHMM calculations, (e.g. a segment of the read vs a segment of the haplotype). Major components: ================ * FastLoglessPairHMM: New pair-hmm implemtation using some heuristic to speed up partial PairHMM calculations * GraphBasedLikelihoodCalculationEngine: delegates onto GraphBasedLikelihoodCalculationEngineInstance the exectution of the graph-based likelihood approach. * GraphBasedLikelihoodCalculationEngineInstance: one instance per active-region, implements the graph traversals to calcualte the likelihoods using the graph as an scafold. * HaplotypeGraph: haplotype threading graph where build from the assembly haplotypes. This structure is the one used by GraphBasedLikelihoodCalculationEngineInstance to do its work. * ReadAnchoring and KmerSequenceGraphMap: contain information as how a read map on the HaplotypeGraph that is used by GraphBasedLikelihoodCalcuationEngineInstance to do its work. Remove mergeCommonChains from HaplotypeGraph creation Fixed bamboo issues with HaplotypeGraphUnitTest Fixed probrems with HaplotypeCallerIntegrationTest Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest Fixed ReadThreadingLikelihoodCalculationEngine issues Moved event-block iteration outside GraphBasedEngineInstance Removed unecessary parameter from ReadAnchoring constructor. Fixed test problem Added a bit more documentation to EventBlockSearchEngine Fixing some private - protected dependency issues Further refactoring making GraphBasedInstance and HaplotypeGraph slimmer. Addressed last pull request commit comments Fixed FastLoglessPairHMM public -> protected dependency Fixed probrem with HaplotypeGraph unit test		2013-12-02 19:37:19 -05:00
..
annotator	Improvements to the reference model pipeline.	2013-11-01 17:58:25 -04:00
beagle	Simpler FILTER and info field encoding for BeagleOutputToVCF	2013-06-14 15:56:13 -04:00
bqsr	Removed plots generation from the BaseRecalibration software	2013-06-19 14:47:56 -04:00
compression/reducereads	Bug fix for RR: stop (incorrectly) pulling the MQ out of the SAMRecord as a byte instead of an int.	2013-11-27 18:55:03 -05:00
diagnostics	add unit tests	2013-10-04 11:44:07 -04:00
diffengine	Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs)	2013-03-12 10:57:14 -04:00
fasta	Updated all JAVA file licenses accordingly	2013-01-10 17:06:41 -05:00
filters	Don't allow users to specify keys and IDs that contain angle brackets or equals signs (not allowed in VCF spec).	2013-04-05 00:52:32 -04:00
genotyper	Bug fix for something Guillermo added to UG before he left to support calling indels from reduced reads.	2013-11-27 13:54:39 -05:00
haplotypecaller	Adding Graph-based likelihood ratio calculation to HC	2013-12-02 19:37:19 -05:00
indels	Bug fix for something Guillermo added to UG before he left to support calling indels from reduced reads.	2013-11-27 13:54:39 -05:00
phasing	Fixed bug in PhaseByTransmission where it was completely dropping multi-allelic records.	2013-08-21 15:46:57 -04:00
validation	MathUtils.randomSubset() now uses Collections.shuffle() (indirectly, through the other methods	2013-03-29 14:52:10 -04:00
varianteval	adding a check for the UNAVAILABLE case of GenotypeType in CountVariants	2013-08-29 17:27:00 -04:00
variantrecalibration	Various VQSR optimizations in both runtime and accuracy.	2013-11-29 13:04:46 -05:00
variantutils	Merged bug fix from Stable into Unstable	2013-10-10 14:31:33 -04:00