gatk-3.8/public/java/src/org/broadinstitute/sting/utils
Valentin Ruano-Rubio 0f99778a59 Adding Graph-based likelihood ratio calculation to HC
To activate this feature, add '--likelihoodCalculationEngine GraphBased' to the HC command line.

New HC Options (both Advanced and Hidden):
==========================================

  --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM)

Specifies what engine should be used to generate read vs haplotype likelihoods.

  PairHMM : standard full-PairHMM approach.
  GraphBased : using the assembly graph to accelerate the process.
  Random : generate random likelihoods - used for benchmarking purposes only.

  --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN)

Indicates how to merge haplotypes produced using different kmerSizes.
Only has an effect when used in combination with --likelihoodCalculationEngine GraphBased.

  COMBO_MIN : use the smallest kmerSize with all haplotypes.
  COMBO_MAX : use the largest kmerSize with all haplotypes.
  MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it.
  MAX_ONLY : use the largest kmerSize with haplotypes assembled using it.
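
The four modes above vary along two axes: which kmerSize is used (smallest or largest) and which haplotypes go with it (all of them, or only those assembled with that kmerSize). The following is a minimal Java sketch of that reading, not the actual GATK implementation. The enum name, its fields, the method names and the map-of-strings representation of haplotypes are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the four resolution modes described above.
 * Haplotypes are plain strings keyed by the kmerSize used to assemble them;
 * the real HaplotypeCaller code uses its own classes.
 */
enum KmerSizeResolution {
    COMBO_MIN(false, true),
    COMBO_MAX(true, true),
    MIN_ONLY(false, false),
    MAX_ONLY(true, false);

    private final boolean useLargest;   // pick the largest kmerSize instead of the smallest
    private final boolean combineAll;   // keep haplotypes from every kmerSize

    KmerSizeResolution(boolean useLargest, boolean combineAll) {
        this.useLargest = useLargest;
        this.combineAll = combineAll;
    }

    /** Returns the kmerSize this mode selects from the available ones. */
    int chooseKmerSize(TreeMap<Integer, List<String>> haplotypesByKmerSize) {
        return useLargest ? haplotypesByKmerSize.lastKey()
                          : haplotypesByKmerSize.firstKey();
    }

    /** Returns the haplotypes this mode keeps, without duplicates. */
    List<String> chooseHaplotypes(TreeMap<Integer, List<String>> haplotypesByKmerSize) {
        Set<String> result = new LinkedHashSet<>();
        if (combineAll) {
            // COMBO_*: merge haplotypes assembled with every kmerSize.
            for (List<String> haplotypes : haplotypesByKmerSize.values()) {
                result.addAll(haplotypes);
            }
        } else {
            // *_ONLY: keep only haplotypes assembled with the chosen kmerSize.
            result.addAll(haplotypesByKmerSize.get(chooseKmerSize(haplotypesByKmerSize)));
        }
        return new ArrayList<>(result);
    }
}
```

Under these assumptions, COMBO_MIN on haplotypes assembled with kmerSizes 10 and 25 selects kmerSize 10 and keeps the union of both haplotype sets, whereas MIN_ONLY keeps only those assembled with kmerSize 10.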

Major code changes:
===================

 * Introduce multiple likelihood calculation engines (before there was just one).

 * Assembly results from different kmerSizes are now packed together using the AssemblyResultSet class.

 * Added yet another PairHMM implementation with a different API in order to support
   local PairHMM calculations (e.g. a segment of the read vs a segment of the haplotype).

Major components:
=================

 * FastLoglessPairHMM: new PairHMM implementation using some heuristics to speed up partial PairHMM calculations.

 * GraphBasedLikelihoodCalculationEngine: delegates the execution of the graph-based likelihood approach
     to GraphBasedLikelihoodCalculationEngineInstance.

 * GraphBasedLikelihoodCalculationEngineInstance: one instance per active region; implements the graph traversals
     to calculate the likelihoods using the graph as a scaffold.

 * HaplotypeGraph: haplotype threading graph built from the assembly haplotypes. This structure is the one
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

 * ReadAnchoring and KmerSequenceGraphMap: contain information on how a read maps onto the HaplotypeGraph, which is
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

Remove mergeCommonChains from HaplotypeGraph creation

Fixed bamboo issues with HaplotypeGraphUnitTest

Fixed problems with HaplotypeCallerIntegrationTest

Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest

Fixed ReadThreadingLikelihoodCalculationEngine issues

Moved event-block iteration outside GraphBased*EngineInstance

Removed unnecessary parameter from ReadAnchoring constructor.
Fixed test problem

Added a bit more documentation to EventBlockSearchEngine

Fixing some private - protected dependency issues

Further refactoring making GraphBased*Instance and HaplotypeGraph slimmer. Addressed last pull request commit comments

Fixed FastLoglessPairHMM public -> protected dependency

Fixed problem with HaplotypeGraph unit test

2013-12-02 19:37:19 -05:00
R R issue in Queue fixed. 2013-01-28 14:42:20 -05:00
activeregion Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
analysis Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
baq Fixing BQSR/BAQ bug: 2013-01-31 11:03:17 -05:00
classloader Enable convenient display of diff engine output in Bamboo, plus misc. minor test-related improvements 2013-05-10 19:00:33 -04:00
clipping Bugfix for HaplotypeCaller error: Only one of refStart or refStop must be < 0, not both 2013-06-04 10:33:46 -04:00
codecs Adaptations to accommodate Tribble API changes, comprising mostly of the following. 2013-08-19 15:52:47 -04:00
collections Replace uses of NestedHashMap with NestedIntegerArray. 2013-02-27 14:03:39 -05:00
crypt Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
duplicates Cleanup and unit tests for QualityUtils 2013-02-16 07:31:37 -08:00
exceptions Removed plots generation from the BaseRecalibration software 2013-06-19 14:47:56 -04:00
fasta Move BaseUtils back to the GATK by request, along with associated utility methods 2013-01-30 13:09:44 -05:00
file Detect stuck lock-acquisition calls, and disable file locking for tests 2013-04-24 22:49:02 -04:00
fragments A whole slew of improvements to the Haplotype Caller and related code. 2013-07-12 10:09:10 -04:00
genotyper Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
haplotype Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
help More detailed labels for arguments in the gatkdocs (requested by David) 2013-08-16 14:25:53 -04:00
instrumentation Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
interval Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
io Encrypt GATK AWS keys using the GATK private key, and decrypt as needed as a resource when uploading to AWS logs 2013-01-30 16:42:23 -05:00
locusiterator Several improvements to AssessNA12878 and KB 2013-08-07 08:08:37 -04:00
nanoScheduler Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
pairhmm Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
pileup Created a single sample calling pipeline which leverages the reference model calculation mode of the HaplotypeCaller 2013-09-06 16:56:34 -04:00
pileup2 Reorganized the codebase beneath top-level public and private directories, 2011-06-28 06:55:19 -04:00
progressmeter Subshard timeouts in the GATK 2013-05-15 07:00:39 -04:00
recalibration The Bayesian calculation of Qemp in the BQSR is now hierarchical. This fixes issues in which the covariate bins were very sparse and the prior estimate being used was the original quality score. This resulted in large correction factors for each covariate, which breaks the equation. There is also now a new option, globalQScorePrior, which can be used to ignore the given (very high) quality scores and instead use this value as the prior. 2013-01-28 15:56:33 -05:00
runtime Remove com.sun.javadoc.* dependencies from the GATK proper, and isolate them for doclet use only 2013-06-13 15:52:41 -04:00
sam Set SAMFileWriter to create index in ReadUtils to fix SplitSamFile issue 2013-11-26 15:54:47 -05:00
smithwaterman Created a single sample calling pipeline which leverages the reference model calculation mode of the HaplotypeCaller 2013-09-06 16:56:34 -04:00
text Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
threading Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
variant Introducing the latest-and-greatest in genotyping: CalculatePosteriors. 2013-11-27 13:00:45 -05:00
wiggle Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
AutoFormattingTime.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
BaseUtils.java Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
BitSetUtils.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
ContigComparator.java Generalize and fixup ContigComparator 2013-02-09 09:52:13 -05:00
DeprecatedToolChecks.java Added deprecation notice for SomaticIndelDetector 2013-07-26 15:51:30 -04:00
GenomeLoc.java Bugfix for incPos in GenomeLoc 2013-07-02 15:46:49 -04:00
GenomeLocParser.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
GenomeLocSortedSet.java Fixed the add functionality of GenomeLocSortedSet. 2013-02-28 23:31:00 -05:00
HasGenomeLocation.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
HeapSizeMonitor.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
IndelUtils.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
LRUCache.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
MRUCachingSAMSequenceDictionary.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
MannWhitneyU.java Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
MathUtils.java Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
Median.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
MendelianViolation.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
MultiThreadedErrorTracker.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
NGSPlatform.java Trivial BQSR bug fixes and improvement 2013-04-11 17:08:35 -04:00
PathUtils.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
QualityUtils.java Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
SampleUtils.java Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
SequenceDictionaryUtils.java Sequence dictionary validation: detect problematic contig indexing differences 2013-02-25 11:14:22 -05:00
SimpleTimer.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
UnvalidatingGenomeLoc.java Refactoring the SimpleGenomeLoc into the now public utility UnvalidatingGenomeLoc and the RR-specific FinishedGenomeLoc. 2013-01-30 10:45:29 -05:00
Utils.java Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
package-info.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00