gatk-3.8/protected/java/test/org/broadinstitute/sting/gatk/walkers
Chris Hartl 1f777c4898 Introducing the latest-and-greatest in genotyping: CalculatePosteriors.
CalculatePosteriors lets the user calculate genotype posterior probabilities (and set genotypes accordingly) given one or more panels containing allele counts (for instance, calculating NA12878 genotypes based on 1000G EUR frequencies). The uncertainty in allele frequency is modeled by a Dirichlet distribution whose parameters are the observed counts of each allele, and the genotype state is modeled by assuming independent draws of alleles (Hardy-Weinberg Equilibrium). Together these lead to the Dirichlet-Multinomial distribution.
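
To make the model concrete, here is a minimal Python sketch of the diploid case. It is not the GATK implementation. It assumes the panel allele counts are used directly as the Dirichlet parameters (no pseudocount is added), that genotype likelihoods are given in log10 space, and that genotypes are keyed by allele-index pairs (i, j) with i <= j. The function names are hypothetical. With Dirichlet parameters a_i summing to A, two draws give P(i,i) = a_i(a_i+1) / (A(A+1)) and P(i,j) = 2 a_i a_j / (A(A+1)) for i != j.

```python
from itertools import combinations_with_replacement


def diploid_prior(allele_counts):
    """Dirichlet-Multinomial prior over diploid genotypes.

    allele_counts: per-allele panel counts, used as Dirichlet parameters
    (an assumption of this sketch). Returns {(i, j): probability}, i <= j.
    """
    total = float(sum(allele_counts))
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    denom = total * (total + 1.0)
    prior = {}
    for i, j in combinations_with_replacement(range(len(allele_counts)), 2):
        if i == j:
            # Two draws of the same allele.
            prior[(i, j)] = allele_counts[i] * (allele_counts[i] + 1.0) / denom
        else:
            # Two different alleles, in either order.
            prior[(i, j)] = 2.0 * allele_counts[i] * allele_counts[j] / denom
    return prior


def genotype_posteriors(log10_gls, prior):
    """Combine log10 genotype likelihoods with a prior and normalize."""
    best = max(log10_gls.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {g: 10.0 ** (gl - best) * prior[g] for g, gl in log10_gls.items()}
    norm = sum(weights.values())
    if norm == 0:
        raise ValueError("every genotype has zero posterior weight")
    return {g: w / norm for g, w in weights.items()}
```

For example, `diploid_prior([3, 1])` assigns 0.6 to (0, 0), 0.3 to (0, 1), and 0.1 to (1, 1). An allele with a zero count gets a zero prior for every genotype that carries it, which is one reason a real implementation might add a pseudocount.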

Currently this is implemented only for ploidy=2, though it should be straightforward to generalize. In addition, there is an "EM" parameter that currently does nothing but throw an exception. Another extension of this method is to run an EM over the Maximum A-Posteriori (MAP) allele count in the input sample, as follows:
 while not converged:
  * AC = [external AC] + [sample AC]
  * Prior = DirichletMultinomial[AC]
  * Posteriors = [sample GL + Prior]
  * sample AC = MLEAC(Posteriors)
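
The loop above could be sketched in Python roughly as follows. This is only an illustration of the proposed extension, which the commit does not implement. The sketch assumes ploidy 2, log10 genotype likelihoods keyed by allele-index pairs (i, j) with i <= j, a panel with at least one observed allele, and it stands in for MLEAC by counting the alleles of each sample's most probable genotype. All function names are hypothetical, and `max_iter` is an added safeguard because integer counts are not guaranteed to settle.

```python
from itertools import combinations_with_replacement


def diploid_prior(allele_counts):
    """Dirichlet-Multinomial prior over diploid genotypes (i, j), i <= j."""
    total = float(sum(allele_counts))
    denom = total * (total + 1.0)
    prior = {}
    for i, j in combinations_with_replacement(range(len(allele_counts)), 2):
        if i == j:
            prior[(i, j)] = allele_counts[i] * (allele_counts[i] + 1.0) / denom
        else:
            prior[(i, j)] = 2.0 * allele_counts[i] * allele_counts[j] / denom
    return prior


def genotype_posteriors(log10_gls, prior):
    """Posteriors = sample GL + prior (in log space), then normalized."""
    best = max(log10_gls.values())
    weights = {g: 10.0 ** (gl - best) * prior[g] for g, gl in log10_gls.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


def em_allele_counts(external_ac, sample_gls, max_iter=50):
    """Iterate prior -> posteriors -> sample AC until the sample AC stops changing.

    external_ac: per-allele counts from the external panel(s).
    sample_gls: one {(i, j): log10 likelihood} dict per sample.
    Returns (sample_ac, posteriors_per_sample).
    """
    sample_ac = [0] * len(external_ac)
    posts = []
    for _ in range(max_iter):
        # AC = [external AC] + [sample AC]
        ac = [e + s for e, s in zip(external_ac, sample_ac)]
        # Prior = DirichletMultinomial[AC]
        prior = diploid_prior(ac)
        # Posteriors = [sample GL + Prior]
        posts = [genotype_posteriors(gls, prior) for gls in sample_gls]
        # sample AC = MLEAC(Posteriors); approximated here by the alleles
        # of each sample's most probable genotype (an assumption).
        new_ac = [0] * len(external_ac)
        for post in posts:
            i, j = max(post, key=post.get)
            new_ac[i] += 1
            new_ac[j] += 1
        if new_ac == sample_ac:
            break  # converged: the allele counts did not change
        sample_ac = new_ac
    return sample_ac, posts
```

Because the sample counts are added to the panel counts on every pass, a large callset can shift a small panel's prior noticeably, while a single sample barely moves a large panel.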

This is more useful for large callsets with small panels than for small callsets with large panels, and the latter is the more common use case.

Fully unit tested.

Reviewer (Eric) jumped in to address many of his own comments and removed public->protected dependencies.
2013-11-27 13:00:45 -05:00
annotator Improvements to the reference model pipeline. 2013-11-01 17:58:25 -04:00
beagle Simpler FILTER and info field encoding for BeagleOutputToVCF 2013-06-14 15:56:13 -04:00
bqsr Removed plots generation from the BaseRecalibration software 2013-06-19 14:47:56 -04:00
compression/reducereads Two reduce reads updates/fixes: 2013-08-01 14:34:59 -04:00
diagnostics add unit tests 2013-10-04 11:44:07 -04:00
diffengine Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) 2013-03-12 10:57:14 -04:00
fasta Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
filters Don't allow users to specify keys and IDs that contain angle brackets or equals signs (not allowed in VCF spec). 2013-04-05 00:52:32 -04:00
genotyper Created a single sample calling pipeline which leverages the reference model calculation mode of the HaplotypeCaller 2013-09-06 16:56:34 -04:00
haplotypecaller Introducing the latest-and-greatest in genotyping: CalculatePosteriors. 2013-11-27 13:00:45 -05:00
indels Another fix for the Indel Realigner that arises because of secondary alignments. 2013-06-21 16:59:22 -04:00
phasing Fixed bug in PhaseByTransmission where it was completely dropping multi-allelic records. 2013-08-21 15:46:57 -04:00
validation MathUtils.randomSubset() now uses Collections.shuffle() (indirectly, through the other methods 2013-03-29 14:52:10 -04:00
varianteval adding a check for the UNAVAILABLE case of GenotypeType in CountVariants 2013-08-29 17:27:00 -04:00
variantrecalibration Remove org.apache.commons.collections.IteratorUtils dependency from the test suite 2013-08-21 19:44:02 -04:00
variantutils Merged bug fix from Stable into Unstable 2013-10-10 14:31:33 -04:00