and enable/disable performance logs from outside the JVM process. Making this
available for the moment; we'll see whether it ends up being useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4983 348d0f76-0448-11de-a6fe-93d51630548a
after the line continuation backslash and enhance the error message if it
appears.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4981 348d0f76-0448-11de-a6fe-93d51630548a
primitive types were properly validated and throw the proper
MissingArgumentValue UserException. Before this fix, the error reported
was the infamous DePristo BSOD (Could not create module String because
an exception of type NullPointerException occurred caused by exception null).
Thanks Mauricio!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4980 348d0f76-0448-11de-a6fe-93d51630548a
- No longer a standard eval module to keep integration tests happy
- Remove class name overlaps with SimpleMetricsByAC so that modules don't overwrite each other's files, and to make it easier to grep results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4978 348d0f76-0448-11de-a6fe-93d51630548a
GSA-410 Local job runs now can run command lines longer than than 4096 on our linux machines.
When determining if the help text and Queue extensions need to be rebuilt, use the .class files not the .java so that GATK oneoffs are picked up correctly.
Added the most basic of all example QScripts for debugging, Hello World.
Minor updates to copy/pasted LSF code to reduce ant javadoc warnings by a third.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4970 348d0f76-0448-11de-a6fe-93d51630548a
Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set.
Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found.
Now building all scala code at the same time, just like all java code is compiled at the same time.
Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile.
Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles.
Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified.
Fixed a couple errors when the <javadoc> task is run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a
* Stratifications (by comp rod, by eval rod, novelty, filter status, etc.) have been generalized. They are very symmetric with evaluators now. Each stratification can have multiple states (e.g. known, novel, all). New stratifications can be added and optionally applied. Some new stratifications include:
- by sample
- by functional class
- by CpG status
* Output is to a single file in GATKReport format, rather than having the options of CSV, R, table, etc.
* Rather than needing to state up front that the allowable variant type is a SNP or an indel, each eval record is inspected and the appropriate record type is fetched from the comp track. (This will require a bit more testing...)
* Evaluation context (basically a single row in a VariantEval report) generation and retrieval has been overhauled. Now, every possible configuration of stratification state is generated recursively and stored in a HashMap. The key of the HashMap is a key that represents that exact state configuration. When examining a comp track and eval track, this key is computed based on the data, providing easy lookup for the appropriate evaluation context. When there are only a handful of stratification configurations, this isn't a big deal. But when operating on a file with hundreds of samples, multipled by 3 states for novelty, 3 states for filtration, 3 states for CpG status, etc., it becomes a very big deal.
There are still some known issues:
* When the per-sample stratification is turned off, things are getting overcounted (too many variants are showing up when compared to the VariantEval 2.0 code). It's probably because I break out the VariantContext by sample even when not necessary, and those irrelevant contexts are still being counted. Or my recursion is overaggressively creating evaluation contexts, and they all get added up in a weird way. But that's why I'm committing now - so I can track down this issue without losing my work so far.
* The Jexl expressions are sometimes throwing an exception that I don't yet understand (they complain of an incorrect specification on the command-line... *after* the program has made it through a few thousand records.
* The request to have evaluations be smart enough to reject certain stratification states is not implemented yet.
There's still some work to do before I can replace VariantEval 2.0 with VariantEval 3.0, but feel free to take a look. I'd love comments on the new code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4946 348d0f76-0448-11de-a6fe-93d51630548a
Adding the first version of the techdev pipeline (tdPipeline)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a
using one as though it was. Fixed, and debug code reverted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4917 348d0f76-0448-11de-a6fe-93d51630548a
streaming/piping VCFs into the GATK. Notable changes:
- Public interface to RMDTrackBuilder is greatly simplified; users can use it only to build
RMDTracks and lookup codecs.
- RODDataSource and RMDTrack are no longer functionally at the same level; RODDataSources now
manage RMDTracks on behalf of the GATK, and the only direct consumers of the RMDTrack class
are the walkers that feel the need to access the ROD system directly. (We need to stamp out
this access pattern.
A few minor warts were introduced as part of this process, labeled with TODOs. These'll be
fixed as part of the VCF streaming project.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4915 348d0f76-0448-11de-a6fe-93d51630548a
into the CommandLine* classes. This makes it easier for external functionality
(such as the VCF streamer) to use GenomeAnalysisEngine directly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4897 348d0f76-0448-11de-a6fe-93d51630548a
Ability to genotype candidate variants from input vcf is retained and can be turned on by command line argument but is disabled by default.
Code, by default, will build a consensus of the most common indel event at a pileup. If that consensus allele has a count bigger than N (=5 by default), we proceed to genotype by computing probabilistic realigmment, AF distribution etc. and possibly emmiting a call.
Needed for this, also added ability to build haplotypes from list of alleles instead of from a variant context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4893 348d0f76-0448-11de-a6fe-93d51630548a
unmapped reads and explicitly excluding (-XL unmapped) unmapped reads, augmenting
the suite of unit tests already put in place.
'-L unmapped' seems safe to use; go for it, but please validate results against
samtools flagstat when the process finishes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4849 348d0f76-0448-11de-a6fe-93d51630548a
Fixed integration tests to wait on their own for the job to run instead of using SUB2_BSUB_BLOCK.
Updated VariantRecalibrationIntegrationTests MD5s which were knocked out of sync whele SUB2_BSUB_BLOCK was exiting in the middle of integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4840 348d0f76-0448-11de-a6fe-93d51630548a
- basic unit tests for interval sorting and merging with mix of mapped/unmapped.
- validation to ensure that locus walkers (really all non-read walkers) blow up with a user error when -L unmapped is specified.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4837 348d0f76-0448-11de-a6fe-93d51630548a
- Repeat Expansions
- Novel sequence
And for indels of size <=2 we get a per-mononuc. or dinuc. breakdown of novels and expansions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4828 348d0f76-0448-11de-a6fe-93d51630548a
VariantEvalWalker's logger is made public, so that variant eval modules can access it through the parent object.
DesignFileGenerator comment lists how best to bind things to it, and the feature accessor is better refined to grab the genome loc. (old change)
scala changes:
convenience addAll( List[CommandLineFunction] ) added to QScript class (and thus removed from the fCPV2)
useful command line functions added to a new library package for command line functions (these are fast simple VCF command lines)
bug fixed in ProjectManagement for the class where there's only one batch to be batch-merged (not really part of the use-case, but an edge-condition that came up during pipeline testing)
first draft of a private mutations pipeline which will be elaborated in future
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4823 348d0f76-0448-11de-a6fe-93d51630548a