// gatk-3.8/scala/qscript/playground/MultiFullCallingPipeline.scala

import collection.JavaConversions
/*
 * Walkers can now specify a class extending from Gatherer to merge custom output
 * formats. Add @Gather(MyGatherer.class) to the walker @Output.
 * JavaCommandLineFunctions can now specify the classpath+mainclass as an alternative
 * to specifying a path to an executable jar. JCLF by default pass on the current
 * classpath and only require the mainclass be specified by the developer extending
 * the JCLF, relieving the QScript author from having to explicitly specify the jar.
 * Like the Picard MergeSamFiles, GATK engine by default is now run from the current
 * classpath. The GATK can still be overridden via .jarFile or .javaClasspath.
 * Walkers from the GATK package are now also embedded into the Queue package.
 * Updated AnalyzeCovariates to make it easier to guess the main class,
 * AnalyzeCovariates instead of AnalyzeCovariatesCLP.
 * Removed the GATK jar argument from the example QScripts.
 * Removed one of the most FAQ when getting started with Scala/Queue, the use of
 * Option[_] in QScripts:
 *   1) Fixed mistaken assumption with java enums. In java enums can be null so they
 *      don't need nullable wrappers.
 *   2) Added syntactic sugar for Nullable primitives to the QScript trait. Any
 *      variable defined as Option[Int] can just be assigned an Int value or None,
 *      ex: myFunc.memoryLimit = 3
 * Removed other unused code.
 * Re-fixed dry run function ordering.
 * Re-ordered the QCommandline companion object so that IntelliJ doesn't complain
 * about missing main methods.
 *
 * git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5504 348d0f76-0448-11de-a6fe-93d51630548a
 * 2011-03-24 22:03:51 +08:00
 */
import org.broadinstitute.sting.queue.function.JavaCommandLineFunction
import org.broadinstitute.sting.queue.QScript
import org.broadinstitute.sting.queue.util.IOUtils
import org.broadinstitute.sting.utils.text.XReadLines
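
// Sketch (not part of the pipeline): the commit note for this file says that any
// Option[Int] field can now be assigned a bare Int, e.g. myFunc.memoryLimit = 3,
// which is the form RunPipeline below uses for memoryLimit = 1. One way to get that
// behaviour is an implicit Int => Option[Int] conversion, shown here. This is an
// assumption about the mechanism, not the QScript trait's actual code;
// PrimitiveOptionSketch, intToOption and FakeFunction are hypothetical names.
object PrimitiveOptionSketch {
  import scala.language.implicitConversions

  // Wraps a bare Int in Some, so an Int can be assigned to an Option[Int] var.
  implicit def intToOption(i: Int): Option[Int] = Some(i)

  // Hypothetical stand-in for a Queue function with an optional memory limit.
  class FakeFunction {
    var memoryLimit: Option[Int] = None
  }

  // Returns the field value before and after assigning a plain Int.
  def demo(): (Option[Int], Option[Int]) = {
    val f = new FakeFunction
    val before = f.memoryLimit
    f.memoryLimit = 3 // compiles via intToOption, storing Some(3)
    (before, f.memoryLimit)
  }
}
// Usage: PrimitiveOptionSketch.demo() returns (None, Some(3)). Assigning None
// still works directly, because None is already an Option[Int].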
class MultiFullCallingPipeline extends QScript {
  qscript =>

  @Input(doc="Sting home directory containing dist/Queue.jar and dist/GenomeAnalysisTK.jar", shortName="stingHome")
  var stingHome: File = _

  @Input(doc="file listing the yaml files to run, one path per line", shortName="YL")
  var yamlList: File = _

  @Argument(doc="number of yamls per batch; runs within a batch do not wait on each other", shortName="BS")
  var batchSize: Int = _

  @Argument(doc="status recipient passed to each pipeline run as -statusTo", shortName="PS", required = false)
  var pipelineStatusTo: String = _

  @Argument(doc="job queue passed to each pipeline run as -jobQueue", shortName="PJQ", required = false)
  var pipelineJobQueue: String = _

  @Argument(doc="short job queue passed to each pipeline run as -shortJobQueue", shortName="PSQ", required = false)
  var pipelineShortQueue: String = _

  @Argument(doc="job priority passed to each pipeline run as -jobPriority", shortName="PP", required = false)
  var pipelinePriority: Option[Int] = None

  @Argument(doc="retry count passed to each pipeline run as -retry", shortName="PR", required = false)
  var pipelineRetry: Option[Int] = None

  @Argument(doc="pass R/DataProcessingReport/GetTearsheetStats.R to each pipeline run as -tearScript", shortName="TS")
  var runWithTearScript = false

  def script {
    // Global arguments for all pipeline runs
    stingHome = IOUtils.absolute(stingHome)
    val queueJar = new File(stingHome, "dist/Queue.jar")
    val pipelineScript = new File(stingHome, "scala/qscript/playground/FullCallingPipeline.q")
    val gatkJar = new File(stingHome, "dist/GenomeAnalysisTK.jar")
    val tearScript = if (runWithTearScript) new File(stingHome, "R/DataProcessingReport/GetTearsheetStats.R") else null

    // Read the yaml paths, one per line, from yamlList
    var yamls = List.empty[File]
    for (yaml <- JavaConversions.asScalaIterator(new XReadLines(yamlList)))
      yamls :+= new File(yaml)

    // For each position in a batch, the job output file of the most recent run
    // at that position (null until a run has been added there).
    val lastOuts = new Array[File](batchSize)
    for (yamlGroup <- yamls.grouped(batchSize)) {
      for ((yaml, i) <- yamlGroup.zipWithIndex) {
        // Get the last output for index(i), which is null for the first job.
        val lastOut = lastOuts(i)
        // Run the pipeline on the yaml, waiting for the last output.
        val runPipeline = new RunPipeline(yaml, lastOut)
        // Add this run to the graph.
        add(runPipeline)
        // Have the next job at index(i) wait for this output file.
        lastOuts(i) = runPipeline.jobOutputFile
      }
    }
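
    /**
     * Runs FullCallingPipeline.q on a single yaml, but only after the previous
     * run at the same batch position has produced lastOutput. For the first
     * batch lastOutput is null, so the run does not wait on anything.
     */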
    class RunPipeline(yamlFile: File, lastOutput: File) extends JavaCommandLineFunction {
      private val yamlName = yamlFile.getName.stripSuffix(".yaml")

      @Input(doc="output file to wait for", required=false)
      var waitJobOutputFile = lastOutput

      @Output(doc="virtual output file tagging this pipeline as complete")
      var pipelineComplete = new File(yamlFile.getParentFile, yamlName + ".mfcp")

      commandDirectory = yamlFile.getParentFile
      jobOutputFile = IOUtils.absolute(commandDirectory, yamlName + ".queue.txt")
      jarFile = queueJar
      memoryLimit = 1

      override def commandLine = super.commandLine +
        optional(" -statusTo ", qscript.pipelineStatusTo) +
        optional(" -jobQueue ", qscript.pipelineJobQueue) +
        optional(" -shortJobQueue ", qscript.pipelineShortQueue) +
        optional(" -jobPriority ", qscript.pipelinePriority) +
        optional(" -retry ", qscript.pipelineRetry) +
        optional(" -tearScript ", tearScript) +
        " -S %s --gatkjar %s -jobProject %s -jobPrefix %s -Y %s -bsub -run"
          .format(pipelineScript, gatkJar, yamlName, yamlName, yamlFile)

      override def dotString = "Queue: " + yamlName
    }
  }
}
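
// Sketch (not part of the pipeline): a standalone model of the scheduling in
// MultiFullCallingPipeline.script. The yaml list is cut into groups of batchSize,
// and the run at position i of each group waits on the run at position i of the
// previous group. The result is batchSize independent chains that can run side by
// side, each working through its yamls one after another. BatchChainSketch and
// dependencies are hypothetical names; plain strings stand in for yaml files and
// job output files, and None plays the role of the null lastOut.
object BatchChainSketch {
  /** Returns (job, jobItWaitsFor) pairs in submission order. */
  def dependencies(jobs: List[String], batchSize: Int): List[(String, Option[String])] = {
    // Most recent job at each batch position; None until one has been scheduled.
    val lastOuts = Array.fill[Option[String]](batchSize)(None)
    val deps = List.newBuilder[(String, Option[String])]
    for (group <- jobs.grouped(batchSize); (job, i) <- group.zipWithIndex) {
      deps += ((job, lastOuts(i)))
      lastOuts(i) = Some(job)
    }
    deps.result()
  }
}
// Usage: dependencies(List("a", "b", "c", "d", "e"), 2) gives
// (a, None), (b, None), (c, Some(a)), (d, Some(b)), (e, Some(c)):
// two chains, a -> c -> e and b -> d. With batchSize = 1 every run waits on the
// one before it, so the yamls are processed strictly in sequence.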