gatk-3.8/public/java/test/org/broadinstitute/sting/utils
Valentin Ruano-Rubio 0f99778a59 Adding Graph-based likelihood ratio calculation to HC
To activate this feature, add '--likelihoodCalculationEngine GraphBased' to the HC command line.

New HC Options (both Advanced and Hidden):
==========================================

  --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM)

Specifies what engine should be used to generate read vs haplotype likelihoods.

  PairHMM : standard full-PairHMM approach.
  GraphBased : using the assembly graph to accelerate the process.
  Random : generate random likelihoods - used for benchmarking purposes only.
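
As an illustration of how the option values above could be resolved, here is a minimal Java sketch. The class name LikelihoodEngineOptionSketch, the enum Engine, and the methods resolve and isBenchmarkOnly are hypothetical and are not taken from the GATK code base; only the three option values and the PairHMM default come from the description above.

```java
final class LikelihoodEngineOptionSketch {

    /** Hypothetical enum mirroring the three option values listed above. */
    enum Engine {
        PairHMM(false),
        GraphBased(false),
        Random(true); // described above as for benchmarking purposes only

        private final boolean benchmarkOnly;

        Engine(boolean benchmarkOnly) {
            this.benchmarkOnly = benchmarkOnly;
        }

        boolean isBenchmarkOnly() {
            return benchmarkOnly;
        }
    }

    /**
     * Maps an option value to an engine. A missing option (null) falls back
     * to the documented default, PairHMM. Matching is case-sensitive here,
     * which is an assumption of this sketch.
     */
    static Engine resolve(String optionValue) {
        if (optionValue == null) {
            return Engine.PairHMM;
        }
        // Enum.valueOf throws IllegalArgumentException for unknown names.
        return Engine.valueOf(optionValue);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));
        System.out.println(resolve("GraphBased"));
        System.out.println(resolve("Random").isBenchmarkOnly());
    }
}
```
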

  --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN)

It indicates how to merge haplotypes produced using different kmerSizes.
Only has an effect when used in combination with (--likelihoodCalculationEngine GraphBased).

  COMBO_MIN : use the smallest kmerSize with all haplotypes.
  COMBO_MAX : use the largest kmerSize with all haplotypes.
  MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it.
  MAX_ONLY : use the largest kmerSize with haplotypes assembled using it.
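
The four modes above can be read as two independent choices: which kmerSize to keep (smallest or largest) and which haplotypes to keep (all of them, or only those assembled with that kmerSize). The Java sketch below follows that reading. It is not the GATK implementation: the class KmerSizeResolutionSketch, the holder Resolved, representing haplotypes as plain strings, and de-duplicating combined haplotypes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class KmerSizeResolutionSketch {

    /** Option values as listed above. */
    enum Resolution { COMBO_MIN, COMBO_MAX, MIN_ONLY, MAX_ONLY }

    /** Hypothetical result holder: the chosen kmer size and the haplotypes kept. */
    static final class Resolved {
        final int kmerSize;
        final List<String> haplotypes;

        Resolved(int kmerSize, List<String> haplotypes) {
            this.kmerSize = kmerSize;
            this.haplotypes = haplotypes;
        }
    }

    /**
     * Picks a kmer size and a haplotype set from per-kmer-size assembly results.
     * Haplotypes are plain strings here (an assumption of this sketch).
     */
    static Resolved resolve(Resolution mode, Map<Integer, List<String>> haplotypesByKmerSize) {
        if (haplotypesByKmerSize.isEmpty()) {
            throw new IllegalArgumentException("no assembly results to resolve");
        }
        // Sort by kmer size so the smallest and largest are easy to find.
        TreeMap<Integer, List<String>> sorted = new TreeMap<>(haplotypesByKmerSize);

        boolean useMin = mode == Resolution.COMBO_MIN || mode == Resolution.MIN_ONLY;
        int kmerSize = useMin ? sorted.firstKey() : sorted.lastKey();

        boolean combine = mode == Resolution.COMBO_MIN || mode == Resolution.COMBO_MAX;
        // LinkedHashSet drops duplicates across kmer sizes while keeping order.
        Set<String> kept = new LinkedHashSet<>();
        if (combine) {
            for (List<String> haplotypes : sorted.values()) {
                kept.addAll(haplotypes);
            }
        } else {
            kept.addAll(sorted.get(kmerSize));
        }
        return new Resolved(kmerSize, new ArrayList<>(kept));
    }
}
```

For example, with haplotypes assembled at kmer sizes 10 and 25, COMBO_MIN keeps kmer size 10 together with the union of both haplotype lists, while MAX_ONLY keeps kmer size 25 and only the haplotypes assembled at 25.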

Major code changes:
===================

 * Introduce multiple likelihood calculation engines (before there was just one).

 * Assembly results from different kmerSizes are now packed together using the AssemblyResultSet class.

 * Added yet another PairHMM implementation with a different API in order to support
   local PairHMM calculations (e.g. a segment of the read vs a segment of the haplotype).

Major components:
================

 * FastLoglessPairHMM: New PairHMM implementation using some heuristics to speed up partial PairHMM calculations.

 * GraphBasedLikelihoodCalculationEngine: delegates the execution of the graph-based likelihood approach
     to GraphBasedLikelihoodCalculationEngineInstance.

 * GraphBasedLikelihoodCalculationEngineInstance: one instance per active-region, implements the graph traversals
     to calculate the likelihoods using the graph as a scaffold.

 * HaplotypeGraph: haplotype threading graph built from the assembly haplotypes. This structure is the one
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

 * ReadAnchoring and KmerSequenceGraphMap: contain information on how a read maps onto the HaplotypeGraph, which is
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

Remove mergeCommonChains from HaplotypeGraph creation

Fixed bamboo issues with HaplotypeGraphUnitTest

Fixed problems with HaplotypeCallerIntegrationTest

Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest

Fixed ReadThreadingLikelihoodCalculationEngine issues

Moved event-block iteration outside GraphBased*EngineInstance

Removed unnecessary parameter from ReadAnchoring constructor.
Fixed test problem

Added a bit more documentation to EventBlockSearchEngine

Fixing some private - protected dependency issues

Further refactoring making GraphBased*Instance and HaplotypeGraph slimmer. Addressed last pull request commit comments

Fixed FastLoglessPairHMM public -> protected dependency

Fixed problem with HaplotypeGraph unit test

2013-12-02 19:37:19 -05:00
..
R Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
activeregion Adaptations to accommodate Tribble API changes, comprising mostly of the following. 2013-08-19 15:52:47 -04:00
baq Fixing BQSR/BAQ bug: 2013-01-31 11:03:17 -05:00
classloader Enable convenient display of diff engine output in Bamboo, plus misc. minor test-related improvements 2013-05-10 19:00:33 -04:00
clipping Bugfix for HaplotypeCaller error: Only one of refStart or refStop must be < 0, not both 2013-06-04 10:33:46 -04:00
codecs/hapmap Adaptations to accommodate Tribble API changes, comprising mostly of the following. 2013-08-19 15:52:47 -04:00
collections Fixing license on Yossi's file 2013-02-05 11:14:43 -05:00
crypt Update expected test output for Java 7 2013-05-01 16:18:01 -04:00
fasta Move BaseUtils back to the GATK by request, along with associated utility methods 2013-01-30 13:09:44 -05:00
file Detect stuck lock-acquisition calls, and disable file locking for tests 2013-04-24 22:49:02 -04:00
fragments A whole slew of improvements to the Haplotype Caller and related code. 2013-07-12 10:09:10 -04:00
haplotype Major improvements to HC that trims down active regions before genotyping 2013-04-08 12:47:49 -04:00
interval Intervals: fix bug where we could fail to find the intersection of unsorted/missorted interval lists 2013-04-02 14:01:52 -04:00
io Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
locusiterator Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences 2013-07-02 15:46:38 -04:00
nanoScheduler Further tweaking of test timeouts 2013-03-15 14:49:21 -04:00
pileup Fixing ReadBackedPileup to represent mapping qualities as ints, not (signed) bytes. 2013-07-23 23:47:15 -04:00
progressmeter Subshard timeouts in the GATK 2013-05-15 07:00:39 -04:00
recalibration Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
report Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
runtime Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
sam Remove org.apache.commons.collections.IteratorUtils dependency from the test suite 2013-08-21 19:44:02 -04:00
smithwaterman New faster Smith-Waterman implementation that is edge greedy and assumes that ref and haplotype have same global start/end points. 2013-05-13 09:36:39 -04:00
text Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
threading Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
variant Introducing the latest-and-greatest in genotyping: CalculatePosteriors. 2013-11-27 13:00:45 -05:00
AutoFormattingTimeUnitTest.java AutoFormattingTimeUnitTest should be in utils 2013-01-30 09:47:47 -05:00
BaseUtilsUnitTest.java More aggressive checking of AWS key quality upon startup in the GATK 2013-01-31 09:08:38 -05:00
BitSetUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
GenomeLocParserBenchmark.java Optimize GenomeLocParser.createGenomeLoc 2013-01-30 09:47:47 -05:00
GenomeLocParserUnitTest.java Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
GenomeLocSortedSetUnitTest.java Fixed the add functionality of GenomeLocSortedSet. 2013-02-28 23:31:00 -05:00
GenomeLocUnitTest.java Added distance across contigs calculation to GenomeLocs 2013-02-07 16:31:41 -05:00
MRUCachingSAMSequencingDictionaryUnitTest.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
MWUnitTest.java Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
MathUtilsUnitTest.java Introducing the latest-and-greatest in genotyping: CalculatePosteriors. 2013-11-27 13:00:45 -05:00
MedianUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
NGSPlatformUnitTest.java Expand NGSPlatform to meet SAM 1.4 spec, with full unit tests 2013-02-09 11:16:21 -05:00
PathUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
QualityUtilsUnitTest.java Final edge case bug fixes to QualityUtil routines 2013-02-16 07:31:38 -08:00
SequenceDictionaryUtilsUnitTest.java Sequence dictionary validation: detect problematic contig indexing differences 2013-02-25 11:14:22 -05:00
SimpleTimerUnitTest.java Fix tests that were consistently or intermittently failing when run in parallel on the farm 2013-03-06 13:56:54 -05:00
UtilsUnitTest.java New faster Smith-Waterman implementation that is edge greedy and assumes that ref and haplotype have same global start/end points. 2013-05-13 09:36:39 -04:00