Commit Graph

71 Commits (9ff8a01da2bcd705b0c2fae7fe2d796beac8ba7f)

Author SHA1 Message Date
Mark DePristo 0cc5c3d799 General improvements to Queue
-- Support for collecting resources info from DRMAA runners
-- Disabled the non-standard mem_free argument so that we can actually use our own SGE cluster gsa4
-- NCoresRequest is a testing queue script for this.
-- Added two command line arguments:
  -- multiCoreJerk: don't request multiple cores for jobs with nt > 1.  This was the old behavior but it's really not the best way to run parallel jobs.  Now with queue if you run nt = 4 the system requests 4 cores on your host.  If this flag is thrown, though, it will only request 1 and you'll just use 4, like a jerk
  -- job_parallel_env: parallel environment named used with SGE to request multicore jobs.  Equivalent to -pe job_parallel_env NT for NT > 1 jobs
2011-12-20 14:05:09 -05:00
Khalid Shakir 7486696c07 When using bam list mode in HSP deriving VCF name from bam list instead of requiring an additional parameter.
Creating a single temporary directory per ant test run instead of a putting temp files across all runs in the same directory.
Updated various tests for above items and other small fixes.
2011-12-16 18:09:25 -05:00
Mark DePristo 550fb498be Support for NT testing (default up to 4) for CC and UG
-- Added convenience function addJobReportBinding to just new binding to the map (x -> y) as well
2011-12-14 18:45:00 -05:00
David Roazen 1ba03a5e72 Use optional() instead of required() to construct javaMemoryLimit argument in JavaCommandLineFunction 2011-12-05 14:06:00 -05:00
David Roazen d014c7faf9 Queue now properly escapes all shell arguments in generated shell scripts
This has implications for both Qscript authors and CommandLineFunction authors.

Qscript authors:
You no longer need to (and in fact must not) manually escape String values to
avoid interpretation by the shell when setting up Walker parameters. Queue will
safely escape all of your Strings for you so that they'll be interpreted literally. Eg.,

Old way:
filterSNPs.filterExpression = List("\"QD<2.0\"", "\"MQ<40.0\"", "\"HaplotypeScore>13.0\"")

New way:
filterSNPs.filterExpression = List("QD<2.0", "MQ<40.0", "HaplotypeScore>13.0")

CommandLineFunction authors:
If you're writing a one-off CommandLineFunction in a Qscript and don't really
care about quoting issues, just keep doing things the direct, simple way:

def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out)

If you're writing a CommandLineFunction that will become part of Queue and
will be used by other QScripts, however, it's advisable to do things the
newer, safer way, ie.:

When you construct your commandLine, you should do so ONLY using the API methods
required(), optional(), conditional(), and repeat(). These will manage quoting
and whitespace separation for you, so you shouldn't insert quotes/extraneous
whitespace in your Strings. By default you get both (quoting and whitespace
separation), but you can disable either of these via parameters. Eg.,

override def commandLine = super.commandLine +
                           required("eff") +
                           conditional(verbose, "-v") +
                           optional("-c", config) +
                           required("-i", "vcf") +
                           required("-o", "vcf") +
                           required(genomeVersion) +
                           required(inVcf) +
                           required(">", escape=false) +  // This will be shell-interpreted
                           required(outVcf)

I've ported the Picard/Samtools/SnpEff CommandLineFunction classes to the new
system, so you'll get free shell escaping when you use those in Qscripts just
like with walkers.
2011-12-01 18:13:44 -05:00
David Roazen fdd90825a1 Queue now outputs a GATK-like header with version number, build timestamp, etc. 2011-11-23 14:28:35 -05:00
Khalid Shakir c50274e02e During flanking interval creation merging overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
Moved flanking interval utilies to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
Mark DePristo 0111e58d4e Don't generate PDF unless you have -run specified 2011-11-09 14:45:40 -05:00
Mark DePristo 849c0757f2 Bug fix for LocusScatterFunction when no intervals are provided
-- Now correctly grabs reference contigs and cuts them all up, rather than NPE as intervalString == null.
2011-11-04 10:55:09 -04:00
Mark DePristo bd977c2d92 Bug fix to avoid infinite loop in GATKScatterFunction 2011-11-02 16:20:42 -04:00
Mark DePristo c1da8cd5e7 Final version of bp-resolved locus scatter/gather
-- Minor refactoring to allow LocusScatterFunction to have maxIntervals be the original scatter count, rather than capping this by the interval count as Contig and Interval do
2011-11-02 11:26:34 -04:00
Mark DePristo c2b97030a4 IntervalUtils for completely balanced locus-based scatter/gather
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
Mark DePristo 5fc613f972 Better default partition types for walkers
-- Added PartitionType.READ, and associated ReadScatterFunction.  ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Khalid Shakir e25d40882a Swapping Thread.sleep(0) with Object.wait(0) caused Queue to lock up. Thanks to rpoplin for pointing it out. 2011-10-28 15:51:03 -04:00
Khalid Shakir b80d407dc7 No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Eric Banks b39fcb1bea Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-26 15:44:25 -04:00
Eric Banks 3273c20c98 Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes. 2011-10-26 15:29:18 -04:00
Khalid Shakir fac9932938 Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compiles the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
Mauricio Carneiro 9f867d77ca no sort order
subtle bug fixed.
2011-10-20 18:44:09 -04:00
Mauricio Carneiro c9d8b22092 Added BWASW support to the pipeline
Data Processing Pipeline can now use BWASW for realigning the reads. Useful for Ion Torrent data.
2011-10-20 18:36:28 -04:00
Menachem Fromer e5fc828546 With Khalid's implicit approval, I have removed this line that overrides the memory limit of the VCF-gathering function, so that the inherited limit remains 2011-10-18 14:47:39 -04:00
Khalid Shakir 84bd355690 Merged bug fix from Stable into Unstable 2011-09-27 14:34:39 -04:00
Khalid Shakir b090751f62 Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Khalid Shakir 648b959361 Minor change to log an info message when a signal such as Ctrl-C is caught. 2011-09-27 00:50:19 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Khalid Shakir 33967a4e0c Fixed issue reported by chartl where cloned functions lost tags on @Inputs.
Updated ExampleUnifiedGenotyper.scala with new syntax.
2011-09-16 12:46:07 -04:00
Mark DePristo 06cb20f2a5 Intermediate commit cleaning up scatter intervals
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Khalid Shakir 510d5e7730 Merged bug fix from Stable into Unstable 2011-09-09 01:34:55 -04:00
Khalid Shakir 367bbee25a Fixed typo when printing the contents or last N lines of a file. Thanks to larryns. 2011-09-09 01:33:25 -04:00
Mark DePristo 61633c95a8 Default jobreport is now jobPrefix, so you see logs like Q-2508.jobreport.txt 2011-08-28 19:19:45 -04:00
Mark DePristo b38de1fa35 Now captures the exechost in the job report
-- Works for in process, shell, and LSF runners
-- Cleanup of debugging output
2011-08-28 12:05:56 -04:00
Mark DePristo e37a638e09 Fix for disallowed characters in GATKReportTable
-- Illegal characters are automatically replaced with _
2011-08-26 13:24:06 -04:00
Mark DePristo 0cb1605df0 Clean documentation for JobRunInfo 2011-08-26 09:22:58 -04:00
Mark DePristo 415d5d5301 LSF long times are in seconds, convert to milliseconds to meet standard 2011-08-26 09:18:28 -04:00
Mark DePristo eef1ac415a Merge branch 'master' into rodTesting
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java
2011-08-26 00:35:41 -04:00
Mark DePristo e03dfdb0ab Automatic iteration field addition works properly. 2011-08-25 16:59:02 -04:00
Mark DePristo e01273ca7c Queue now writes out queueJobReport.pdf
-- General purpose RScript executor in java (please use when invoking RScripts)
-- Removed groupName.  This is now analysisName
-- Explicitly added capability to enable/disable individual QFunction
2011-08-25 16:57:11 -04:00
Mark DePristo 0f4be2c4a4 Argument to disable queueJobReport entirely
-- Minor improvements to RodPerformanceGoals
2011-08-25 13:32:03 -04:00
Mark DePristo d65faf509c Default output name for Queue JobReport is queue_jobreport.gatkreport.txt 2011-08-25 13:15:20 -04:00
Mark DePristo a7d6946b22 Refactored QJobReport and QFunction, which is now automatically tracked
-- All QFunctions, including sg ones, are tracked
-- Removed memory information
2011-08-25 13:13:55 -04:00
Mauricio Carneiro 16caca0822 BLASR BAMs and new BWA parameters
*Added the functions to turn a BLASR generated BAM file into a usable BAM file.
*Modified the bwa parameters according to test results from NA12878 pb2k dataset.
2011-08-24 17:04:07 -04:00
Mauricio Carneiro e3f5d7067a Added ReorderSam queue binding 2011-08-24 17:03:11 -04:00
Mark DePristo 08fb21f127 Removing hostname 2011-08-24 16:45:50 -04:00
Mark DePristo 06e30a81d1 Fixes throughout for getting job information
-- no more hostname -- it's just not going to be important
2011-08-24 15:30:09 -04:00
Mark DePristo 4918519a58 No more NPE in getRuntime() when you cntr-c out of Queue 2011-08-24 14:14:01 -04:00
Mark DePristo 16d8360592 QJobReport is now the official capability name 2011-08-24 13:59:14 -04:00
Mark DePristo d047c19ad1 Writes output to file 2011-08-24 13:52:05 -04:00
Mark DePristo 3ae68e2397 JobLogging trait now writes out GATKReport log of jobs 2011-08-24 13:36:39 -04:00
Mark DePristo b8bc03bb42 JobRunInfo improvements
-- dry-run now adds some info, for testing
-- InProcessRunner adds some, but not all, of the information we want
2011-08-23 17:11:22 -04:00
Mark DePristo 31ec6e316c First implementation of JobRunInfo
-- onExecutionDone(Map(QFunction, JobRunInfo)) is the new signature, so that you can walk over your jobs and inspect their success/failure and runtime characteristics
2011-08-23 16:51:54 -04:00