Using the VCFWriterATD isCompressed to check if the VCF index will be auto generated.
Tracking BAM and Tribble indexes as @Inputs and @Outputs in generated QFunctions.
Updates to the BamGatherFunction to disable the index during merge when disable_bam_indexing = true.
Made a shortcut for live-running pipelinetest, pipelinetestrun.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5606 348d0f76-0448-11de-a6fe-93d51630548a
Disabled the MFCP while the FCP gets an update.
Minor updates to email messages for upcoming scala 2.9.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5588 348d0f76-0448-11de-a6fe-93d51630548a
QuickCCTest is my test script for the gatherer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5547 348d0f76-0448-11de-a6fe-93d51630548a
Custom gatherer prints out the class name in the logs.
Try to retrieve mail domain from /etc/mailname before falling back to the hostname.
Building oneoff jars during ant oneoffs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5540 348d0f76-0448-11de-a6fe-93d51630548a
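The mail-domain lookup mentioned in the r5540 entry above could look roughly like the following. This is only a sketch of the fallback order described there (try /etc/mailname first, then the hostname); the class and method names are hypothetical and are not taken from the Queue source.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper illustrating the fallback order; not the actual Queue code. */
final class MailDomainSketch {
    private MailDomainSketch() {}

    /** Returns the first non-blank line of mailnameFile, or fallbackHost if none can be read. */
    static String resolve(Path mailnameFile, String fallbackHost) {
        try {
            for (String line : Files.readAllLines(mailnameFile, StandardCharsets.UTF_8)) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    return trimmed;
                }
            }
        } catch (IOException e) {
            // File is missing or unreadable: fall through to the hostname.
        }
        return fallbackHost;
    }

    /** Best-effort local hostname; "localhost" is an assumed last resort. */
    static String localHostName() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "localhost";
        }
    }

    /** Convenience overload using the conventional /etc/mailname location. */
    static String resolve() {
        return resolve(Paths.get("/etc/mailname"), localHostName());
    }
}
```

Passing the file path and the fallback as parameters keeps the lookup easy to exercise on machines that have no /etc/mailname.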
MDP: Removed ApplyVariantCut as it's no longer necessary with VQSR2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5534 348d0f76-0448-11de-a6fe-93d51630548a
Fixed escaping expressions that have more than one space between arguments.
Updated example to match the wiki.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5516 348d0f76-0448-11de-a6fe-93d51630548a
VCF gathering passes on the no_header and sites_only flags to CombineVariants.
Fixed deletion of gathered log files. Although they are intermediate and do not need to be re-run if not present, they should not be deleted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5508 348d0f76-0448-11de-a6fe-93d51630548a
JavaCommandLineFunctions can now specify the classpath+mainclass as an alternative to specifying a path to an executable jar.
By default, JCLFs pass on the current classpath and only require that the developer extending the JCLF specify the mainclass, relieving the QScript author from having to explicitly specify the jar.
Like the Picard MergeSamFiles, the GATK engine is now run from the current classpath by default. The GATK can still be overridden via .jarFile or .javaClasspath.
Walkers from the GATK package are now also embedded into the Queue package.
Updated AnalyzeCovariates to make it easier to guess the main class, AnalyzeCovariates instead of AnalyzeCovariatesCLP.
Removed the GATK jar argument from the example QScripts.
Removed the cause of one of the most frequently asked questions when getting started with Scala/Queue, the use of Option[_] in QScripts:
1) Fixed a mistaken assumption about Java enums. In Java, enums can be null, so they don't need nullable wrappers.
2) Added syntactic sugar for Nullable primitives to the QScript trait. Any variable defined as Option[Int] can just be assigned an Int value or None, e.g.: myFunc.memoryLimit = 3
Removed other unused code.
Re-fixed dry run function ordering.
Re-ordered the QCommandline companion object so that IntelliJ doesn't complain about missing main methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5504 348d0f76-0448-11de-a6fe-93d51630548a
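The two ways of launching a Java tool described in the r5504 entry above (an executable jar, or a classpath plus main class) can be sketched as follows. The class and method names are hypothetical, and the precedence shown (an explicit jar wins, otherwise an explicit classpath, otherwise the current JVM classpath) is an assumption made for illustration, not a statement of how JavaCommandLineFunction resolves them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical command builder; not the actual JavaCommandLineFunction. */
final class JavaCommandSketch {
    private JavaCommandSketch() {}

    /**
     * Builds "java -jar <jar> args..." when jarFile is set, otherwise
     * "java -cp <classpath> <mainClass> args...". A null classpath means
     * "reuse the classpath of the running JVM".
     */
    static List<String> build(String jarFile, String classpath,
                              String mainClass, List<String> args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (jarFile != null) {
            cmd.add("-jar");
            cmd.add(jarFile);
        } else {
            if (mainClass == null) {
                throw new IllegalArgumentException("mainClass is required when no jar is given");
            }
            String cp = (classpath != null) ? classpath : System.getProperty("java.class.path");
            cmd.add("-cp");
            cmd.add(cp);
            cmd.add(mainClass);
        }
        cmd.addAll(args);
        return cmd;
    }
}
```

With this shape, a caller that sets neither a jar nor a classpath only has to name the main class, which matches the convenience the entry describes for QScript authors.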
- now it differentiates between confident REF calls and not confident calls.
- you can now use a BAM file as the truth set.
- output is much clearer now
dataProcessingPipeline version 2, ready to be used.
- All the processing is now done at the sample level
- Reads the input bam file headers to combine all lanes of the same sample.
- Cleaning is now scattered/gathered. Intelligently breaks the data down into as many intervals as possible, given the dataset.
- Outputs one processed bam file per sample (and a .list file with all processed files listed)
- Much faster, low pass (read Papuans) can run in the hour queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5493 348d0f76-0448-11de-a6fe-93d51630548a
Fixed initialization of pending counts when using -startFromScratch so the count doesn't start at zero and end at -<#njobs>.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5483 348d0f76-0448-11de-a6fe-93d51630548a
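The counting problem named in the r5483 entry above can be illustrated with a small sketch. It assumes (this is a guess, not taken from the Queue source) that the pending counter is seeded from each job's previously recorded status and then decremented as jobs finish. Under that assumption, a run started from scratch must count every job as pending, otherwise the counter starts at zero and goes negative.

```java
import java.util.List;

/** Hypothetical illustration of seeding a pending-job counter; not the Queue code. */
final class PendingCountSketch {
    private PendingCountSketch() {}

    /** Assumed job states recorded by a previous run. */
    enum JobStatus { PENDING, DONE }

    /**
     * Number of jobs that will run. When starting from scratch, earlier
     * DONE markers are ignored, so every job counts as pending.
     */
    static int initialPending(List<JobStatus> previous, boolean startFromScratch) {
        if (startFromScratch) {
            return previous.size();
        }
        int pending = 0;
        for (JobStatus status : previous) {
            if (status == JobStatus.PENDING) {
                pending++;
            }
        }
        return pending;
    }
}
```

If three jobs were all marked done by an earlier run, seeding the counter without regard to the start-from-scratch flag would give 0 and then -3 after rerunning them; seeding it as above gives 3 and ends at 0.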
ProduceBeagleInputWalker can optionally emit a beagle markers file, necessary to use the beagled reference panel for imputation. Also supports the VQSR calibration curve idea that a site can be flagged as a certain FP, based on the VQSLOD field. This allows us to have both continuous quality in the refinement of sites as well as hard filtering at some threshold so we don't end up with lots of sites with all 1/3 1/3 1/3 likelihoods for all samples (i.e., a definite FP site where we don't know anything about the samples).
Added a new VariantsToBeagleUnphased walker that writes out a marker-driven, hard-call, unphased genotypes file suitable for imputing missing genotypes against a reference panel with beagle. Can optionally hold back a fraction of sites, marked as missing in the genotypes file, for assessment of imputation accuracy and power. The bootstrap sites can be written to a separate VCF for assessment as well.
Finally, my general Queue script for creating and evaluating reference panels from VCF files. Supports explicitly genotyping a BAM file at each panel SNP site, for assessment of the imputation accuracy of a reference panel. Lots of options for exploring the impact of the VQS likelihoods, multiple VCFs for constructing the reference panel, as well as the fraction of sites left out in assessing the panel's power.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5467 348d0f76-0448-11de-a6fe-93d51630548a