gatk-3.8

The last classic release of GATK 3: version 3.8

delangel 30fae5cf18 Major redo of exact AF computation for UnifiedGenotyperV2. Fact of life is, there's no way we can compute an exact QUAL field and keep performing the AF computation in linear probability space. In good sites with lots of samples, the ratio of Pr(AC=K*|D) to Pr(AC=0|D) can be 10^1500 or some ridiculously large number like that, which no double can represent. So, we abandon probability space and now work in log likelihood space, which has several major repercussions: a) Sites are numerically well behaved now, but another hard fact of life is that the AF iteration is defined in linear Pr space, not in log likelihood space, and the math doesn't work out in log space. So, we need to convert back and forth between lin and log space. b) As a consequence of a), the code got a major slowdown, and calling the 629 samples was about 15 times slower than before (sic). c) To solve b), log10 values of integers are now cached at init, and numerical approximations are now made. Most importantly, I'm using the approximation that log(exp(a) + exp(b)) ~= max(a,b), which seems almost inconsequential in practical performance but reduces computation time to what it was before. More detailed analyses are forthcoming. This approximation can be refined further to avoid expensive log-exp conversions if further profiling and analysis deem it necessary. Also, two other issues were solved: a) Strand bias computation was actually wrong in the case where the optimal AC was bigger than max(forward reads,reverse reads). Now the code is exactly as buggy as the grid search model (all bugs are equal, but some are more equal than others). b) Genotype likelihoods are now computed in a better way, and if a likelihood is < 0 we don't just cap it to 0 but do something a bit smarter. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4600 348d0f76-0448-11de-a6fe-93d51630548a		2010-10-31 01:26:04 +00:00
R	Takes a list of BAMs, looks up the read group information in the sequencing platform's SQUID database, and computes the tearsheet stats. Also takes the VariantEval output (R format) and outputs the variant stats and some plots for the tearsheet. This script requires the gsalib library to be in the R library path (add the line '.libPaths('/path/to/Sting/R/')' to your ~/.Rprofile).	2010-10-27 19:06:22 +00:00
archive	Fisher exact makes a return. Seems to be working properly. Current tagged as a work in progress. Needs to take the filtered context to be truly correct.	2010-10-22 20:35:53 +00:00
c	Reduce file handle usage.	2010-01-05 18:03:01 +00:00
doc	removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also	2010-08-19 00:42:37 +00:00
java	Major redo of exact AF computation for UnifiedGenotyperV2. Fact of life is, there's no way we can compute an exact QUAL field and keep performing the AF computation in linear probability space. In good sites with lots of samples, the ratio of Pr(AC=K*|D) to Pr(AC=0|D) can be 10^1500 or some ridiculously large number like that, which no double can represent. So, we abandon probability space and now work in log likelihood space, which has several major repercussions:	2010-10-31 01:26:04 +00:00
matlab	Another matlab script -- this time for making power and coverage plots over a specific gene region. Lots of fun file reading, string manipulation, and exploration of the set() function	2009-11-30 20:02:25 +00:00
packages	Include all standard annotations	2010-10-11 20:52:08 +00:00
perl	Also removed contig intervals from the pipeline sanity check perl script.	2010-10-21 11:23:34 +00:00
python	Commenting out annoying print statement	2010-10-24 11:55:26 +00:00
ruby	accidentally committed an old tool
scala	Now that the user is required to set the java temp directory, it is safer for the LsfJobRunner to write to the java temp directory instead of the command directory.	2010-10-28 15:00:21 +00:00
settings	Added status email support with -statusTo. Will send emails on failure of an individual function or success/failure of the whole pipeline.	2010-10-14 15:58:52 +00:00
shell	Inprocess functions by default now log what output files they are running for.	2010-10-07 19:08:02 +00:00
testdata	Fixes for example bam files	2010-10-05 18:43:02 +00:00
LICENSE	Adding a license to the root directory in case BOSC checks for one. Has the	2010-04-20 16:04:29 +00:00
build.xml	Cleanup for multithreading memory leak during integration tests...unregister MXBean at end	2010-10-28 18:37:42 +00:00
ivy.xml	- After removing special code for intervals, instead of being of type File they are generated as List[File]. Changed previous checkin that was appending to this list and instead assigning a singleton list.	2010-10-21 06:37:28 +00:00
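
The latest commit message above describes moving the exact AF computation into log space, caching log10 of integers, and approximating log(exp(a) + exp(b)) by max(a,b). The following is a minimal sketch of that idea, not the code from the commit: the class name `Log10Sum` and its methods are hypothetical, and base 10 is assumed only because the message mentions a log10 cache.

```java
// Hypothetical illustration; none of these names come from the GATK source.
final class Log10Sum {
    private Log10Sum() {}

    // Exact log10(10^a + 10^b), computed without leaving log space.
    // Factoring out the larger term keeps the exponent <= 0, so it cannot overflow.
    static double exact(double a, double b) {
        double hi = Math.max(a, b);
        double lo = Math.min(a, b);
        if (hi == Double.NEGATIVE_INFINITY) {
            return hi; // both terms are zero in linear space
        }
        return hi + Math.log10(1.0 + Math.pow(10.0, lo - hi));
    }

    // The approximation named in the commit message: keep only the larger term.
    // The result is never too large, and is too small by at most log10(2).
    static double approx(double a, double b) {
        return Math.max(a, b);
    }

    // Precompute log10(i) for i in [0, n) so that hot loops can use a table lookup.
    // Entry 0 is negative infinity, as returned by Math.log10(0).
    static double[] cacheLog10(int n) {
        double[] cache = new double[n];
        for (int i = 0; i < n; i++) {
            cache[i] = Math.log10(i);
        }
        return cache;
    }
}
```

When the two terms differ by many orders of magnitude, as with the 10^1500 ratio mentioned in the message, `exact` and `approx` agree to within double precision; the error of `approx` is largest, log10(2), when the two terms are equal.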