gatk3的最后一个经典版本3.8
 
 
 
 
Go to file
delangel 30fae5cf18 Major redo of exact AF computation for UnifiedGenotyperV2. Fact of life is, there's no way we can compute an exact QUAL field and keep performing the AF computation in linear probability space. In good sites with lots of samples, the ratio of Pr(AC=K*|D) to Pr(AC=0|D) can be 10^1500 or some ridiculous large number like that, which no double can represent. So, we abandon probablity space and work now in log likelihood space, which has several major repercussions:
a) Sites were numerically well behaved now, but another hard fact of life is that the AF iteration is defined in linear Pr space, not in log likelihood space, and the math doesn't work out in log space. So, we need to convert back and forth from lin to log space.
b) As a consequence of a), the code got a major slowdown, and calling the 629 samples was about 15 times slower than before (sic).
c) To solve b), log10 of integers are now cached at init, and numerical approximations are now made. Most importantly, I'm using the approximation that log(exp(a) + exp(b)) ~= max(a,b) which seems almost inconsequential in practical performance but reduces computation time to what it was before. More detailes analyses are forthcoming. This approximation can be refined further on to avoid expensive log-exp conversions if further profiling and analysis deems it necessary.

Also, two other issues were solved:
a) Strand bias computation was actually wrong in the case where the optimal AC was bigger than max(forward reads,reverse reads). Now the code is exactly as buggy as the grid search model (all bugs are equal, but some are more equal than others)
b) Genotype likelihoods are now computed in a better way and if a likelihood < 0 we don't just cap to 0 but do something a bit smarter.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4600 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-31 01:26:04 +00:00
R Takes a list of BAMs, looks up the read group information in the sequencing platform's SQUID database, and computes the tearsheet stats. Also takes the VariantEval output (R format) and outputs the variant stats and some plots for the tearsheet. This script requires the gsalib library to be in the R library path (add the line '.libPaths('/path/to/Sting/R/')' to your ~/.Rprofile). 2010-10-27 19:06:22 +00:00
archive Fisher exact makes a return. Seems to be working properly. Current tagged as a work in progress. Needs to take the filtered context to be truly correct. 2010-10-22 20:35:53 +00:00
c Reduce file handle usage. 2010-01-05 18:03:01 +00:00
doc removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also 2010-08-19 00:42:37 +00:00
java Major redo of exact AF computation for UnifiedGenotyperV2. Fact of life is, there's no way we can compute an exact QUAL field and keep performing the AF computation in linear probability space. In good sites with lots of samples, the ratio of Pr(AC=K*|D) to Pr(AC=0|D) can be 10^1500 or some ridiculous large number like that, which no double can represent. So, we abandon probablity space and work now in log likelihood space, which has several major repercussions: 2010-10-31 01:26:04 +00:00
matlab Another matlab script -- this time for making power and coverage plots over a specific gene region. Lots of fun file reading, string manipulation, and exploration of the set() function 2009-11-30 20:02:25 +00:00
packages Include all standard annotations 2010-10-11 20:52:08 +00:00
perl Also removed contig intervals from the pipeline sanity check perl scsript. 2010-10-21 11:23:34 +00:00
python Commenting out annoying print statement 2010-10-24 11:55:26 +00:00
ruby accidentally commited an old tool 2010-08-25 15:42:02 +00:00
scala Now that the user is required to set the java temp directory, it is safer for the LsfJobRunner to write to the java temp directory instead of the command directory. 2010-10-28 15:00:21 +00:00
settings Added status email support with -statusTo. Will send emails on failure of an individual function or success/failure of the whole pipeline. 2010-10-14 15:58:52 +00:00
shell Inprocess functions by default now log what output files they are running for. 2010-10-07 19:08:02 +00:00
testdata Fixes for example bam files 2010-10-05 18:43:02 +00:00
LICENSE Adding a license to the root directory in case BOSC checks for one. Has the 2010-04-20 16:04:29 +00:00
build.xml Cleanup for multithreading memory leak during integration tests...unregister MXBean at end 2010-10-28 18:37:42 +00:00
ivy.xml - After removing special code for intervals, instead of being of type File they are generated as List[File]. Changed previous checkin that was appending to this list and instead assigning a singleton list. 2010-10-21 06:37:28 +00:00