// gatk-3.8/scala/qscript/oneoffs/hanna/DoC.scala


import org.broadinstitute.sting.gatk.DownsampleType
import org.broadinstitute.sting.queue.QScript
import org.broadinstitute.sting.queue.extensions.gatk._
/**
* A pipeline for Queue that runs a custom walker outside of the GATK jar.
* NOTE: This code is an unsupported example for soliciting feedback on how to improve Queue.
* Future syntax will simplify running the GATK so please expect the syntax below to change significantly.
*/
class DoC extends QScript {
// The full packaged jar should be used.
// You can build this jar via 'ant package' and then find it under
// 'Sting/dist/packages/GenomeAnalysisTK-*/GenomeAnalysisTK.jar'
@Input(doc="The path to the packaged GenomeAnalysisTK.jar file.", shortName="gatk")
var gatkJar: File = null
@Input(doc="The reference file for the bam files.", shortName="R")
var referenceFile: File = null
// NOTE: Do not initialize List, Set, or Option to null
// as you won't be able to update the collection.
// By default set:
// List[T] = Nil
// Set[T] = Set.empty[T]
// Option[T] = None
@Input(doc="One or more bam files.", shortName="I")
var bamFiles: List[File] = Nil
@Input(doc="An optional file with a list of intervals to process.", shortName="L", required=false)
var intervalsString: List[String] = List("2:87000001-90000000")
// This trait allows us to set the variables below in one place,
// and then reuse this trait on each CommandLineGATK function below.
trait DepthOfCoverageArguments extends CommandLineGATK {
this.jarFile = DoC.this.gatkJar
this.reference_sequence = DoC.this.referenceFile
this.intervalsString = DoC.this.intervalsString
this.memoryLimit = 8
}
def script = {
// Create the function that we can run.
val doc = new DepthOfCoverage with DepthOfCoverageArguments
doc.downsampling_type = DownsampleType.NONE
doc.omitLocusTable = true
doc.omitIntervals = true
doc.omitSampleSummary = true
// If you are running this on a compute farm, make sure that the Sting/shell
// folder is in your path to use mergeText.sh and splitIntervals.sh.
//doc.scatterCount = 3
doc.input_file = DoC.this.bamFiles
doc.out = new File("doc-all.out")
add(doc)
}
}