Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set.
Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found.
Now building all scala code at the same time, just like all java code is compiled at the same time.
Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile.
Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles.
Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified.
Fixed a couple errors when the <javadoc> task is run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a
* Stratifications (by comp rod, by eval rod, novelty, filter status, etc.) have been generalized. They are very symmetric with evaluators now. Each stratification can have multiple states (e.g. known, novel, all). New stratifications can be added and optionally applied. Some new stratifications include:
- by sample
- by functional class
- by CpG status
* Output is to a single file in GATKReport format, rather than having the options of CSV, R, table, etc.
* Rather than needing to state up front that the allowable variant type is a SNP or an indel, each eval record is inspected and the appropriate record type is fetched from the comp track. (This will require a bit more testing...)
* Evaluation context (basically a single row in a VariantEval report) generation and retrieval has been overhauled. Now, every possible configuration of stratification state is generated recursively and stored in a HashMap. The key of the HashMap is a key that represents that exact state configuration. When examining a comp track and eval track, this key is computed based on the data, providing easy lookup for the appropriate evaluation context. When there are only a handful of stratification configurations, this isn't a big deal. But when operating on a file with hundreds of samples, multipled by 3 states for novelty, 3 states for filtration, 3 states for CpG status, etc., it becomes a very big deal.
There are still some known issues:
* When the per-sample stratification is turned off, things are getting overcounted (too many variants are showing up when compared to the VariantEval 2.0 code). It's probably because I break out the VariantContext by sample even when not necessary, and those irrelevant contexts are still being counted. Or my recursion is overaggressively creating evaluation contexts, and they all get added up in a weird way. But that's why I'm committing now - so I can track down this issue without losing my work so far.
* The Jexl expressions are sometimes throwing an exception that I don't yet understand (they complain of an incorrect specification on the command-line... *after* the program has made it through a few thousand records.
* The request to have evaluations be smart enough to reject certain stratification states is not implemented yet.
There's still some work to do before I can replace VariantEval 2.0 with VariantEval 3.0, but feel free to take a look. I'd love comments on the new code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4946 348d0f76-0448-11de-a6fe-93d51630548a
Adding the first version of the techdev pipeline (tdPipeline)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a
using one as though it was. Fixed, and debug code reverted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4917 348d0f76-0448-11de-a6fe-93d51630548a
streaming/piping VCFs into the GATK. Notable changes:
- Public interface to RMDTrackBuilder is greatly simplified; users can use it only to build
RMDTracks and lookup codecs.
- RODDataSource and RMDTrack are no longer functionally at the same level; RODDataSources now
manage RMDTracks on behalf of the GATK, and the only direct consumers of the RMDTrack class
are the walkers that feel the need to access the ROD system directly. (We need to stamp out
this access pattern.
A few minor warts were introduced as part of this process, labeled with TODOs. These'll be
fixed as part of the VCF streaming project.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4915 348d0f76-0448-11de-a6fe-93d51630548a
into the CommandLine* classes. This makes it easier for external functionality
(such as the VCF streamer) to use GenomeAnalysisEngine directly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4897 348d0f76-0448-11de-a6fe-93d51630548a