@MathUtils - added a new method: cumBinomialProbLog which calculates a cumulant from any start point to any end point using the BinomProbabilityLog calculation.
@PoolUtils - added a new utility class specifically for items related to pooled sequencing. A major part of the power calculation is now to calculate powers
independently by read direction. The only method in this class (currently) takes your reads and offsets, and splits them into two groups
by read direction.
@CoverageAndPowerWalker - completely rewritten to split coverage, median qualities, and power by read direction. Makes use of cumBinomialProbLog rather than
doing that calculation within the object itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1462 348d0f76-0448-11de-a6fe-93d51630548a
binomialProbabilityLog uses a log-space calculation of the
binomial pmf to avoid the coefficient blowing up and thus
returning Infinity or NaN (or in some very strange cases
-Infinity). The log calculation compares very well, it seems
with our current method. It's in MathUtils but could stand
testing against rigorous truth data before becoming standard.
Added median calculator functions to ListUtils
getQScoreMedian is a new utility I wrote that given reads and
offsets will find the median Q score. While I was at it, I wrote
a similar method, getMedian, which will return the median of any
list of Comparables, independent of initial order. These are in
ListUtils.
Added a new poolseq directory and three walkers
CoverageAndPowerWalker is built on top of the PrintCoverage walker
and prints out the power to detect a mutant allele in a pool of
2*(number of individuals in the pool) alleles. It can be flagged
either to do this by boostrapping, or by pure math with a
probability of error based on the median Q-score. This walker
compiles, runs, and gives quite reasonable outputs that compare
visually well to the power calculation computed by Syzygy.
ArtificialPoolWalker is designed to take multiple single-sample
.bam files and create a (random) artificial pool. The coverage of
that pool is a user-defined proportion of the total coverage over
all of the input files. The output is not only a new .bam file,
but also an auxiliary file that has for each locus, the genotype
of the individuals, the confidence of that call, and that person's
representation in the artificial pool .bam at that locus. This
walker compiles and, uhh, looks pretty. Needs some testing.
AnalyzePowerWalker extends CoverageAndPowerWalker so that it can read previous power
calcuations (e.g. from Syzygy) and print them to the output file as well for direct
downstream comparisons.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1460 348d0f76-0448-11de-a6fe-93d51630548a
Also changed the default logger level from error to warn. Does anyone object? It makes sense for users to always get their warn user statements in the default logging level.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1451 348d0f76-0448-11de-a6fe-93d51630548a
also fixed some verbage in the GAEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1449 348d0f76-0448-11de-a6fe-93d51630548a
Also the VCF version tag seems to be standardized as VCR. Updated the VCF code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
-move out isHet test to GenotypeUtils so all can use it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1369 348d0f76-0448-11de-a6fe-93d51630548a
- SSG is much simpler now
- GeliText has been added as a GenotypeWriter
- AlleleFrequencyWalker will be deleted when I untangle the AlleleMetric's dependance on it
- GenotypeLikelihoods now implements GenotypeGenerator, but could still use cleanup
There is still a lot more work to do, but this is a good initial check-in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1335 348d0f76-0448-11de-a6fe-93d51630548a
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
-variants need a length method (can't assume it's a SNP)!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a