The last classic release of GATK3, version 3.8
 
 
 
 
kiran fdc514ded3 Intermediate commit for VariantEval 3.0. Among the changes:
* Stratifications (by comp rod, by eval rod, novelty, filter status, etc.) have been generalized.  They are very symmetric with evaluators now.  Each stratification can have multiple states (e.g. known, novel, all).  New stratifications can be added and optionally applied.  Some new stratifications include:
  - by sample
  - by functional class
  - by CpG status

* Output is to a single file in GATKReport format, rather than having the options of CSV, R, table, etc.

* Rather than needing to state up front that the allowable variant type is a SNP or an indel, each eval record is inspected and the appropriate record type is fetched from the comp track.  (This will require a bit more testing...)

* Evaluation context (basically a single row in a VariantEval report) generation and retrieval has been overhauled.  Now, every possible configuration of stratification state is generated recursively and stored in a HashMap.  The key of the HashMap is a key that represents that exact state configuration.  When examining a comp track and eval track, this key is computed based on the data, providing easy lookup for the appropriate evaluation context.  When there are only a handful of stratification configurations, this isn't a big deal.  But when operating on a file with hundreds of samples, multiplied by 3 states for novelty, 3 states for filtration, 3 states for CpG status, etc., it becomes a very big deal.
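
The recursive generation and keyed lookup described above can be sketched roughly as follows. This is only an illustration, not the VariantEval 3.0 code: the class name StratificationSketch, the EvaluationContext placeholder, the "|"-joined key format, and the filtration and CpG state names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StratificationSketch {
    /** Hypothetical stand-in for one report row; a real context would hold evaluator state. */
    static final class EvaluationContext {
        int variantCount = 0;
    }

    /** Joins one state per stratification into a lookup key, e.g. "novel|called|CpG" (assumed format). */
    static String makeKey(List<String> states) {
        return String.join("|", states);
    }

    /** Recursively picks one state per stratification and registers a context for each full combination. */
    static void generate(List<List<String>> stratifications, int depth,
                         List<String> chosen, Map<String, EvaluationContext> contexts) {
        if (depth == stratifications.size()) {
            // Every stratification has a state: this is one complete configuration.
            contexts.put(makeKey(chosen), new EvaluationContext());
            return;
        }
        for (String state : stratifications.get(depth)) {
            chosen.add(state);
            generate(stratifications, depth + 1, chosen, contexts);
            chosen.remove(chosen.size() - 1); // backtrack before trying the next state
        }
    }

    /** Builds the full map of configuration key to evaluation context. */
    static Map<String, EvaluationContext> buildContexts(List<List<String>> stratifications) {
        Map<String, EvaluationContext> contexts = new HashMap<>();
        generate(stratifications, 0, new ArrayList<>(), contexts);
        return contexts;
    }

    public static void main(String[] args) {
        // State names below are illustrative assumptions, three per stratification.
        List<List<String>> stratifications = List.of(
                List.of("all", "known", "novel"),        // novelty
                List.of("called", "filtered", "raw"),    // filtration
                List.of("all", "CpG", "non_CpG"));       // CpG status
        Map<String, EvaluationContext> contexts = buildContexts(stratifications);

        // While walking the tracks, the key is computed from the data and used for direct lookup.
        EvaluationContext context = contexts.get(makeKey(List.of("novel", "called", "CpG")));
        context.variantCount++;
        System.out.println(contexts.size() + " contexts generated");
    }
}
```

With three stratifications of three states each, the map holds 3 x 3 x 3 = 27 contexts, and every further stratification (such as one state per sample) multiplies that count, which is why constant-time lookup by key matters.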

There are still some known issues:
* When the per-sample stratification is turned off, things are getting overcounted (too many variants are showing up when compared to the VariantEval 2.0 code).  It's probably because I break out the VariantContext by sample even when not necessary, and those irrelevant contexts are still being counted.  Or my recursion is overaggressively creating evaluation contexts, and they all get added up in a weird way.  But that's why I'm committing now - so I can track down this issue without losing my work so far.

* The Jexl expressions are sometimes throwing an exception that I don't yet understand (they complain of an incorrect specification on the command line... *after* the program has made it through a few thousand records).

* The request to have evaluations be smart enough to reject certain stratification states is not implemented yet.

There's still some work to do before I can replace VariantEval 2.0 with VariantEval 3.0, but feel free to take a look.  I'd love comments on the new code.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4946 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-06 15:20:24 +00:00
R Now uses PNGs and a very high downsampling value to more clearly display the information 2011-01-03 13:57:51 +00:00
archive Eric broke the build. Eric broke the build. 2010-12-15 17:01:38 +00:00
c Bug fixes for the bwa aligner and changes to support compiling against newer releases of the bwa code base. 2010-12-17 14:49:15 +00:00
doc removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also 2010-08-19 00:42:37 +00:00
java Intermediate commit for VariantEval 3.0. Among the changes: 2011-01-06 15:20:24 +00:00
matlab Another matlab script -- this time for making power and coverage plots over a specific gene region. Lots of fun file reading, string manipulation, and exploration of the set() function 2009-11-30 20:02:25 +00:00
packages Be certain to include core GATK reference metadata features and codecs in the 2010-12-21 21:30:07 +00:00
perl Fixed bug that required users to use the recordOriginalLocation option 2011-01-05 23:12:14 +00:00
python no longer prints unnecessary table conversion failures that muck up emails. Run script now uses du not ls to display archive size 2011-01-02 13:27:37 +00:00
ruby accidentally committed an old tool 2010-08-25 15:42:02 +00:00
scala Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base. 2011-01-05 22:25:08 +00:00
settings Update Picard / sam-jdk at Tim's request. 2011-01-03 02:17:25 +00:00
shell no longer prints unnecessary table conversion failures that muck up emails. Run script now uses du not ls to display archive size 2011-01-02 13:27:37 +00:00
testdata VQSR now operates on LOD scores in the INFO field directly, and doesn't adjust the QUAL field. New format for tranches file uses LOD score. Old file format no longer supported. log10sumlog10() function, a very useful utility in MathUtils. No more ExtendedPileupElement! Robust math calculations in GMM so that no infinities are generated! HaplotypeScore refactored to enable use of filtered context. Not yet enabled... InferredContext getDouble and getInteger arguments now parse values from Strings if necessary 2010-11-15 22:19:22 +00:00
LICENSE Adding a license to the root directory in case BOSC checks for one. Has the 2010-04-20 16:04:29 +00:00
build.xml Switched from LSF command line wrappers to JNA wrappers around the C API. Side effects: 2010-12-10 04:36:06 +00:00
ivy.xml Switched from LSF command line wrappers to JNA wrappers around the C API. Side effects: 2010-12-10 04:36:06 +00:00