Commit Graph

166 Commits (b7334dcc1e28c5b93dc8d2d3f4ee559a65e13305)

Author SHA1 Message Date
kshakir 57353294cc Copying jobLimitSeconds to clones.
Some cleanup and refactoring around copying values to clones.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5128 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-30 06:35:53 +00:00
kshakir 2ef66af903 Moved the maximum number of intervals check from FCP to the Queue core so that scatter gather will no longer blow up if you specify a scatter count that is too high.
Moved the BamListWriter from FCP to ListWriterFunction in the Queue core.
Added an ExampleCountLoci QScript along with an example pipeline integration test which checks MD5s.
Added a few more utility methods to PipelineTest including a currentGATK variable that points to the GATK jar.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5121 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-28 23:33:58 +00:00
kshakir ce5b11317b Moved some shutdown logic from the LSF job runner into the QGraph.
Because of Java's type erasure JobManagers must provide runtime access to the runner class to shutdown.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5076 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 20:28:54 +00:00
kshakir b3c9b9bfbe +1 file that should have been with the last checkin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5069 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 05:31:17 +00:00
kshakir 9923e05e0a Moved MD5 utils from WalkerTest to BaseTest for use by PipelineTests.
Moved VariantEval validation from FCPTest to PipelineTest.
Cleaned up some duplicate code for writing temp files during tests.
Moved FCPTest to playground namespace to match move for FCP.q.
Added a basic HelloWorldPipelineTest for the HelloWorld QScript. 
Moved duplicated error handling from JobRunners into the FunctionEdge.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5068 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-25 04:11:49 +00:00
kshakir 6fbd18c759 Cleaning up obsolete code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5044 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-21 16:27:35 +00:00
kshakir 8855f080c2 For the fullCallingPipeline.q:
- Reading the refseq table from the YAML if not specified on the command line.
 - Removed obsolete -bigMemQueue now that CombineVariants runs in 4g.
 - Added a -mountDir /broad/software option to work around adpr automount issues.
 - Merged the LSF preexec used for automount into the shell script used to execute tasks.
 - Using the LSF C Library to determine when jobs are complete instead of postexec.
 - Updated queue.sh to match the changes above.
 - Updated the FCPTest to match the changes above.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-20 22:34:43 +00:00
rpoplin 00453919d2 VQSR now only uses the valid polymorphic sites for training and truth sensitivity calculations. Any number of tracks whose ROD binding begins with the name truth can be used as truth sensitivity tracks.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5012 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-18 20:48:19 +00:00
kshakir 2355a55067 Added a QFunction.jobLimitSeconds for experimentation, currently only used as the equivalent of bsub -W.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5002 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-14 19:45:03 +00:00
kshakir 8ba3a5a43f Command lines for locally run Queue jobs no longer have to be escaped differently than bsub'ed jobs.
GSA-410 Local job runs now can run command lines longer than than 4096 on our linux machines.
When determining if the help text and Queue extensions need to be rebuilt, use the .class files not the .java so that GATK oneoffs are picked up correctly.
Added the most basic of all example QScripts for debugging, Hello World.
Minor updates to copy/pasted LSF code to reduce ant javadoc warnings by a third.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4970 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-10 21:07:29 +00:00
kshakir b34e2f733f Removed stochasticity from IndelRealigner by random sampling using and seed based on the read list.
Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set.
Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found.
Now building all scala code at the same time, just like all java code is compiled at the same time.
Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile.
Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles.
Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified.
Fixed a couple errors when the <javadoc> task is run.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 22:03:36 +00:00
chartl 2235245af0 PrivatePermutations generalized to compute transition counts and average probabilities (and thus was renamed). Changes in some pipelines to reflect the change. Bugfix in the batch merging pipeline (it would halt because the allele VCF for genotyping batches could become off-spec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4894 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 15:16:15 +00:00
chartl fc33901810 Graph structure must be known at compile time. Removing GroupIntervals until a future point where in-process-functions can predict their output based on inputs [though this is probably forever: the inputs may not exist at compile time!]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4886 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-20 21:22:58 +00:00
chartl 61d5daa65c EXTREME interval processing. Still undergoing testing.
+ GroupIntervals allows user-defined scattering (e.g. take an interval list file, split it into k smaller interval list files by number of lines)
 + ExpandIntervals expands the intervals, either by widening them, or allowing the definition for nearby intervals (e.g. flanks starting 1bp before and after, ending 10bp after that)
 + IntersectIntervals takes n interval lists, writes 1 interval list that is the n-way intersection of all of them



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4885 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-20 19:42:50 +00:00
chartl 8118a439c0 Commit for Khalid
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4876 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-18 22:24:18 +00:00
chartl 6db86ae0c6 PAC sets the refseq file to optional.
Additional ipf for expanding interval lists. Either there's heavy latency on the toro -- or it's not working properly yet (e.g. the system print in the same scope as the file print outputs a line, but no file shows up on the system)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4874 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-18 19:04:52 +00:00
chartl fd1d817d45 Cryptic implementation of base-string entropy. I suspect this scales ~linearly with length, so I may choose to normalize in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4861 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 22:25:05 +00:00
kshakir 3a6d1dbcef Fixed a class initializer crash on shutdown when the graph has nothing to run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4860 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:56:55 +00:00
chartl 08710fc71e A successor to the Design File Generator and GCContent walkers; allows for refseq/other metadata annotation of intervals, and calculates reference GC content and entropy of the interval. Compiles, but as yet untested and incomplete (but my repositories are kinda messed up so i'm committing this to blow 'em away and re checkout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4857 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:14:03 +00:00
chartl 3e75431bc8 Thanks to mark: VCFInfoToTable removed in favor of a more flexible walker. Slight change to the argument structure of the walker to make it play more nicely with Queue: the field list parsing is pushed into the command line system (e.g. the variable is exposed as a List<String> and not a String, so Queue doesn't have to join a list into a string only to have it broken out again. This also allows the user to specify -F field1 -F field2 -F field3 if he/she so desires.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4842 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 03:33:36 +00:00
chartl 2217837845 Commit for Khalid -- should be a scala version of vcf2table but for some reason the run method isn't getting called.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4841 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 00:44:15 +00:00
chartl f36861eeee One more little bfix -- the issue was not the grep command, but instead the NFS in the awk; i changed it to ++count in the last commit which was really responsible for the fix. Then this ultra-escaping semi-broke teh grep again.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4831 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 20:36:14 +00:00
chartl d34c5640d2 Bugfix for clf version of extract samples. Due to dynamic shell creation and bsubs and whatnot, the OR pipe for grep ("a|b") needs to be super-escaped ("a\\\\\\\\|b").
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4829 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 19:06:30 +00:00
chartl f795b25c47 In-process versions of sample extraction and interval-list conversion for VCF files. Required an in-process-function branch of the queue library.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4827 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 17:36:53 +00:00
chartl 7bc2049031 Updates and bug fixes to private mutations qscript and pipeline libraries. Hand filter strings are now not busted (boo to having to escape quotes); convenience method added to VariantCalling to propagate standard trait data to a given GATK command line -- should be made more scala-esque in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4824 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 04:55:13 +00:00
chartl cf75caf653 java changes:
VariantEvalWalker's logger is made public, so that variant eval modules can access it through the parent object.
 DesignFileGenerator comment lists how best to bind things to it, and the feature accessor is better refined to grab the genome loc. (old change)

scala changes:

convenience addAll( List[CommandLineFunction] ) added to QScript class (and thus removed from the fCPV2)
useful command line functions added to a new library package for command line functions (these are fast simple VCF command lines)
bug fixed in ProjectManagement for the class where there's only one batch to be batch-merged (not really part of the use-case, but an edge-condition that came up during pipeline testing)
first draft of a private mutations pipeline which will be elaborated in future



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4823 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-12 05:10:45 +00:00
kshakir 56433ebf6b Switched from LSF command line wrappers to JNA wrappers around the C API. Side effects:
- bsub command line is no longer fully printed out.
- extraBsubArgs hack is now a callback function updateJobRun.
Updated FullCallingPipelineTest to reflect latest changes to fullCallingPipeline.q.
Added a pipeline that tests the UGv2 runtimes at different bam counts and memory limits.
Updated VE packages that live in oneoffs to compile to oneoffs.
Added a hack to replace the deprecated symbol environ in Mac OS X 10.5+ which is needed by LSF7 on Mac.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4816 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 04:36:06 +00:00
chartl f8dd59c1d1 Tightening of the batch merging pipeline. Optimized to run on hour queue, so please: if you run this, crush 'hour' with it. Testing is forthcoming, but it merged 700 samples overnight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4805 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-08 14:36:23 +00:00
chartl 02de9a9764 With multi-sample genotyping must come scatter+gather. Also Khalid informed me of the .group(size) method, so removing my useless (but pretty) code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4797 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-06 20:12:23 +00:00
chartl f4c43f013f Due to the overhead for reading VCF files (>32g for 700 5MB VCF files), batched merging has to generate likelihoods in batches.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4796 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-06 18:23:54 +00:00
chartl 0944184832 Major refactoring of library and full calling pipeline (v2) structure.
Arguments to the full calling qscript (and indeed, any qscript that wants them) are now specified via the PipelineArgumentCollection

Libraries require a Pipeline object for instantiation -- eliminating their previous dependence on yaml files

Functions added to PipelineUtils to build out the proper Pipeline object from the PipelineArgumentCollection, which now contains 
additional arguments to specify pipeline properties (name, ref, bams, dbsnp, interval list); which are mutually exclusive with
the yaml file.

Pipeline length reduced to a mere 62 lines.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4790 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-05 02:33:54 +00:00
kshakir c7dbf66d41 Added a javaMemoryLimit option for cases where the java -Xmx memory should be lower than the bsub memory limit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4778 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 22:38:06 +00:00
chartl 670ae814b3 Get rid of files from the grep string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4773 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 18:39:59 +00:00
chartl 220fb0c44a Added a pipeline for merging batches. For now takes a file containing a list of VCFs, and a file containing a list of bams. Does not do anything smart (e.g. if you leave out some .bams or add some extra ones, you will not be warned). Heavy lifting done in (the beginnings of) a library for managing multi-batch or multi-project tasks.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4771 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 07:31:59 +00:00
chartl 9f03f09cc9 Changes to V2 pipeline and libraries. AB dropped. Cleaning enabled. Project name now properly propagated to intermediate files (instead of the string repr of the object). Indel mask is now expanded prior to filtering at indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4769 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 18:55:48 +00:00
chartl 06a0fb4489 Library-ized pipeline now functions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4759 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 21:34:59 +00:00
ebanks d89e17ec8c Fare thee well, UGv1. Here come the days UGv2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4747 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-29 21:51:19 +00:00
kshakir 787e5d85e9 Added the ability to test pipelines in dry or live mode via 'ant pipelinetest' and 'ant pipelinetest -Dpipeline.run=run'.
Added an initial test for genotyping chr20 on ten 1000G bams.
Since tribble needs logging support too, for now setting the logging level and appending the console logger to the root logger, not just to "org.broadinstitute.sting".
Updated IntervalUtilsUnitTest to output to a temp directory and not the SVN controlled testdata directory.
Added refseq tables and dbsnps to validation data in BaseTest.
Now waiting up to two minutes for gather parts to propagate over NFS before attempting to merge the files.
Setting scatter/gather directories relative to the -run directory instead of the current directory that queue is running.
Fixed a bug where escaping test expressions didn't handle delimiters at the beginning or end of the String.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4717 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 22:59:42 +00:00
kshakir 79725f2d9c Excluding the QFunction log files from the set of files to delete on completion.
When a QGraph is empty displaying a warning instead of crashing with an JGraph internal assertion error.
Cleaned up code using the Log4J root logger and explicitly talking to a logger for Sting.
When integration tests are run detecting that the logger has already been setup so that messages aren't logged twice.
Updated from Ivy 2.2.0-rc1 to 2.2.0.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4707 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 20:22:01 +00:00
kshakir 302e8f0239 Fixed bug where the command directory was not being set to an absolute path, leading LSF to write some .done files to /tmp.
No longer using the command directory for temporary .done files, and instead using the user specified temporary directory.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4678 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 17:59:39 +00:00
kshakir 801c562909 Now actually checking in the integration test mentioned in the prior commit: compiles the full calling pipeline.
Removed QScript usages of VariantRecalibrator's -reportDatFile, --report_dat_file


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4668 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-14 04:27:10 +00:00
kshakir 673fa841a4 Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader.
Removed obsolete usages of PackageUtils with updated PluginManager.
Ported Queue interval utilities written in scala over to Sting's java IntervalUtils.
Added a very basic intergration test to ensure that the fullCallingPipeline.q compiles.
Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test).
While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1".
Upgraded to scala 2.8.1 and updated calls to deprecated functions.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:14:28 +00:00
kshakir f35d1aa43f Moving all file cleanup to IOUtils for easier debugging.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4646 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 21:00:58 +00:00
hanna 8e36a07bea Convert GenomeLocParser into an instance variable. This change is required
for anything that needs to be simultaneously aware of multiple references, eg
Queue's interval sharding code, liftover support, distributed GATK etc.  

GenomeLocParser instances must now be used to create/parse GenomeLocs.
GenomeLocParser instances are available in walkers by calling either

-getToolkit().getGenomeLocParser()
or
-refContext.getGenomeLocParser()

This is an intermediate change; GenomeLocParser will eventually be merged
with the reference, but we're not clear exactly how to do that yet.  This
will become clearer when contig aliasing is implemented.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 17:59:50 +00:00
kshakir d768c6558d Now that the user is required to set the java temp directory, it is safer for the LsfJobRunner to write to the java temp directory instead of the command directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4593 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 15:00:21 +00:00
hanna 4c23b1fe9c Get rid of the static cache of ArgumentTypeDescriptors by making them an integral part of the
parsing engine.  Hugely lowers our memory footprint in integrationtests, but not yet enough to 
run Mark's new parallelized VariantEvalIntegrationTests.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4585 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-27 19:44:55 +00:00
kshakir 8211cee0b2 Queue UI Improvements:
- Forcing user to set the temp directory via -Djava.io.tmpdir to avoid filling up /tmp.
- By default deleting job outputs tagged as intermediate.
- Defaulting pipeline to scatter count 1 (no reads deleted).
- Cleaning up temp classes even when scripting fails.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4573 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-26 19:49:08 +00:00
kshakir 80259b9e20 Changed fullCallingPipeline to output all contigs in the refence if scattering.
When the cleaner interval scatter count is set to one explicitly setting the intrevals to Nil.
TODO: Need to add an option that lets the user choose from the command line to scatter all contigs or just those in the intervals list.  For now can get relatively the same behavior by setting the interval scatter count equal to the number of contigs+1, assuming the random contigs come at the end of the sequence dictionary.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4565 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-24 03:01:06 +00:00
kshakir e9c6f681a4 Instead of the pipeline's cleaner only writing BAMs with the target intervals, now pulling the list of contigs from the target intervals and outputing reads in those contigs.
Added a brute force -retry <count> option to Queue for transient errors.
Waiting up to 2 minutes for the LSF logs to appear before trying to display the errors from the logs.
Updates to the local job runner error logging when a job fails.
Refactored QGraph's settings as duplicate code was getting out of control.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4563 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 22:22:30 +00:00
kshakir b954a5a4d5 - After removing special code for intervals, instead of being of type File they are generated as List[File]. Changed previous checkin that was appending to this list and instead assigning a singleton list.
- More cleanup including removing the temporary classes and intermediate error files.  Quieting any errors using Apache Commons IO 2.0.
- Counting the contigs during the QScript generation instead of the end user having to pass a separate contig interval list.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4539 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 06:37:28 +00:00
kshakir 88a0d77433 Changed parsing engine to store the order the argument bindings based on their definition in the class, moving "-T" to the front of Queue command lines.
Queue GATK generated .intervals is now a List(File) again removing special case handling in the generator.
Instead of using @Scatter annotation, using ScatterFunction instance to determine if a job can be scattered.
Implemented special VcfGatherFunction which only uses the header from the first file, even if the other files differ in their headers.
Added a -deleteIntermediates to Queue to delete the outputs from intermediate commands after a successful run.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4536 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 21:43:52 +00:00
kshakir 81479229e1 QScript authors can now tag functions as intermediate. Functions tagged as intermediate will be skipped unless another function in the graph needs their output.
Re-logging the failed jobs and the path to their log files at the end of a run.
Added a parameter -bigMemQueue for the fullCallingPipeline.q instead of hardcoding gsa (gsa was backed up and it was actually faster to run on week).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4520 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 22:11:14 +00:00
kshakir 9dc2e931b6 Saving the order functions are added to in the QScript. Using the order during submission of ready jobs (but not currently dryrun) and during -status.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4508 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 20:00:35 +00:00
kshakir 7157cb9090 While bkill'ing on the shutdown thread Queue will no longer try to submit more jobs on the original thread.
Updated pipeline output structure to current recommendations by Corin.
Directories are now automatically before the function runs.
Fixed several bugs with scatter gather binding when the script author needs to change the directories.
Fixed bug with tracking of log files for CloneFunctions.
More error handling and logging of exceptions (good test environment while LSF was down this early AM!)
Removed cleanup utility for scatter gather.  SG Output structure has changed significantly.  Will need to discuss and find a better approach for Queue programatically deleting files.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4504 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 17:01:36 +00:00
kshakir 63e3848187 Added status email support with -statusTo. Will send emails on failure of an individual function or success/failure of the whole pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4496 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 15:58:52 +00:00
kshakir 5034ca18dc ...and forgot to sync up the changes to CommandLineFunction with CloneFunction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4492 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 22:40:02 +00:00
kshakir 5ee12875fb Emergency fix for Ryan:
- Catching errors when LSF fails and retrying.
- When LSF retries fail, catching the error, marking the job as failed, and no longer bkilling everything by exiting Queue.
- Caching function fields by class instead of each instance of a function saving a list of its fields.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4490 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-13 22:22:01 +00:00
chartl 6368a46bab Scala protected is more akin to Java private than Java protected. Not typing these defs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4470 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 19:36:23 +00:00
chartl 21ec44339d Somewhat major update. Changes:
- ProduceBeagleInputWalker
 + Now takes a validation ROD and a prior to give it, will use those genotypes in place of the variant genotypes if both are present
 + Takes a bootstrap argument -- can use some given %age of the validation sites
 + Optionally takes a bootstrap output argument -- re-prints the validation VCF, filtering those sites used as part of the bootstrap
-BeagleOutputToVCFWalker
 + Now filters sites where the genotypes have been reverted to hom ref
 + Now calls in to the new VCUtils to calculate AC/AN

-Queue
 + New pipeline libraries for easy qscript creation, still a work in progress, but this is a considerable prototype
 + full calling pipeline v2 uses the above libraries
 + minor changes to some of my own scripts
 + no more need for contig interval lists, these will be parsed out of your normal interval list when it is provided



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4459 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 13:30:28 +00:00
kshakir e02f837659 Added the ability for Queue functions like mkdirs to override if they are done or not.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4458 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-08 06:39:55 +00:00
kshakir 7f25019f37 Inprocess functions by default now log what output files they are running for.
On -run cleaning up .done and .fail files for jobs that will be run.
Added detection to Firehose YAML generator shell script for (g)awk versions that ignore "\n" in patterns.
Removed obsolete mergeText and splitIntervals shell scripts.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4452 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-07 19:08:02 +00:00
kshakir db47230dd9 Wrapping ScatterGatherableFunctions with a facade instead of using slower clone library. Will require keeping Clone's facade code in sync with CommandLineFunction but runs *much* faster.
Shell invoking scripts so that even really long shell scripts make it through LSF.
Using the truncated (up to 1000 characters) of the command line for the job name for use with bjobs.
Switched the default from re-running everything to re-running only files that need to be regenerated.  --skip_up_to_date replaced with --start_clean for those who want to regenerate everything.
Updated logging to let users know when the scatter gather generator is running, which still takes a while but is orders of magnatudes faster for large lists of functions.  (40s for a 100 function graph exploding to a 2500 function graph)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4448 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-07 01:19:18 +00:00
kshakir ca5db821ce Added the ability to Queue to run scala functions inside the JVM. NOTE: Extend from InProcessFunction instead of CommandLineFunction to use this functionality.
Queue now submits new LSF jobs only after previous functions have completed successfully.
When the Queue process is shutdown (ex: via Control-C) sends a bkill command for any running jobs.
Ported commands like creating directories and scatter/gather interval list to scala functions.
Updates to LSF status tracking by porting the python to internally generated bash scripts.
Temporarily disabled job name submission to LSF.  Plus side is that the full command is now available in "bjobs -w".  TODO: Put back jobName passing to LSF based on an option?
Changed BaseTest to allow scala to access paths to references.
Changed the extension generator to default the analysis name to the walker "name".

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4442 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 18:29:56 +00:00
kshakir bb44044ce0 Fixed re-builds of queue so that previously compiled classes are included. Fixes redundant case of "ant queue test" vs. "ant test".
Refactored temp directory utils.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4426 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 21:12:07 +00:00
kiran 145fb0df8b Changed the wait job's dispatch queue from short (which doesn't exist anymore) to hour
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4354 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 23:36:49 +00:00
kshakir 0cc48d46ec Escaping quotes in dot files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4344 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 17:13:12 +00:00
chartl c355afc320 Queue now does job tracking (replace -run with -status in the command line). Produces output that looks like:
INFO  20:58:17,827 QCommandLine - Checking pipeline status 
INFO  20:58:23,234 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_MergeIndels [DONE] 
INFO  20:58:23,236 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_158.bam [DONE] 5t/5d/0r/0p/0f 
INFO  20:58:23,237 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_929.bam [DONE] 5t/5d/0r/0p/0f 
INFO  20:58:23,238 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_SNP_calls [NOT DONE] 5t/0d/0r/5p/0f 
INFO  20:58:23,239 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_HandFilter [NOT DONE] 
INFO  20:58:23,240 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1122.bam [DONE] 5t/5d/0r/0p/0f 
INFO  20:58:23,240 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantRecalibrator [NOT DONE] 
INFO  20:58:23,241 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_913.bam [DONE] 5t/5d/0r/0p/0f 
INFO  20:58:23,242 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_2037.bam [DONE] 5t/5d/0r/0p/0f 
INFO  20:58:23,243 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantEval [NOT DONE] 
INFO  20:58:23,244 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster [NOT DONE] 
INFO  20:58:23,245 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_106.bam [DONE] 5t/5d/0r/0p/0f 
INFO  20:58:23,246 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster_and_Indel_filter [NOT DONE] 
INFO  20:58:23,247 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_ApplyVariantCuts [NOT DONE] 
INFO  20:58:23,248 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_GenomicAnnotator [NOT DONE] 
INFO  20:58:23,248 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1713.bam [DONE] 5t/5d/0r/0p/0f 




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4340 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 00:59:09 +00:00
kshakir f9707bb7bf Fix for Matt: For Mac OS 10.6 temporary directories replace paths like '/var/folders/Ax/AxRUoz51Fh05fVe-j6C1Wk+++TI/-Tmp-/' with '/tmp/' so that google reflections 0.95RC2 still works on classes in the directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4316 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 22:02:07 +00:00
chartl b24172c80f Queue now utilizes .[file].done to allow skipping of previous jobs, if they have been completed. This is, unfortunately, reliant on a python script to do the post-execution touching of .done files.
That is to say, proper resumability is live (but not extensively tested)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4312 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 00:16:53 +00:00
kshakir bf69b5fa21 "!=" != "=="
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4297 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 19:15:37 +00:00
depristo 81c82ce134 Fix for Queue
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4268 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:18:08 +00:00
kshakir fd5970fdd4 At chartl's superb suggestion, command line files are now all Files instead of old method of sometimes "has a File". Should be easier when reassigning them.
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings to Queue compile to help debugging issues.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:30:48 +00:00
kshakir 78946c4ffd Allowing the Queue to run the GATK via -cp instead of only from -jar.
Added an example of using a walker with Queue and a custom -classpath.
Removed an unused import statement in NamedFileWrapper.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4143 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:25:59 +00:00
kshakir 0105e8d063 Updated Queue GATK generation to reflect -B and -I changes.
To add support for "-I:tumor tumor.bam", the GATK argument
import_file (-I) is now generated as a List of NamedFile objects.
Could not get sugar working 100%.  To activate sugar import the
gatk package.  This effectively adds a new method to java.io.File
called toNamedFile.  When adding a file to the list call
  countReads.import_file :+= myJavaFile.toNamedFile
See scala/qscript/examples for actual examples.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4122 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:17:36 +00:00
kshakir 4eff69d95e Back to using the LSF job name during dry runs since when the real job ids weren't available '-w(null)' wasn't too informative.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4107 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 15:18:02 +00:00
hanna 3dc78855fd Command-line argument tagging is in, and the ROD system is hacked slightly to support the new syntax
(-B:name,type file) as well as the old syntax.  Also, a bonus feature: BAMs can now be tagged at the
command-line, which should allow us to get rid of some of the hackier calls in GenomeAnalysisEngine.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4105 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 03:47:57 +00:00
kshakir 3aedd0055e Updated firehose clean bam pipeline to pull firehose info and push back firehose clean bam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4088 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 20:38:42 +00:00
kshakir 51678d48e4 Using job ids instead of job names for LSF dependency tracking.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4071 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 23:42:06 +00:00
kshakir 88ca1fb22c Lazy loading reflections so Queue can hack the classpath before the PluginManager looks for classes.
Removed extra quotes from 'cd' pre-exec command.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4067 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:29:52 +00:00
aaron 3dc4d3c3a9 removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also
removed the auto-deletion of the reflections jar, and removed the very old OmniPlan document we had checked-in.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4056 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 00:42:37 +00:00
kshakir 618c69f8dc More updates to the CleanBamFile pipeline.
Added the a CommandLineFunction.jobDependencies that will explicitly force a function to wait for a file, even if the value isn't otherwise listed on an @Input.
More bug fixes and refactoring of functions.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4048 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 14:59:42 +00:00
kshakir 307c8ca027 Created a new playground script for cleaning bams in Firehose.
Some refactoring of Queue extensions for reusability in scripts.
Putting the extensions into the Queue.jar after building them.
More updates to GATK walker arguments specifying @Input and @Output for Queue.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 23:52:24 +00:00
kshakir 8e46d5de04 Printing to INFO where to find the job output files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4029 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 19:26:53 +00:00
kshakir 542d394e09 Cleaning up Queue debugging output.
-l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run.
More documentation in the examples with a new even simpler CountReads example.
Took out unused option to build Queue GATK extensions separately.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:54:08 +00:00
kshakir f39dce1082 Exposed CommandLineFunction defaults to the Queue.jar command line (see -help).
Added ability to skip up-to-date jobs where the outputs are older than the inputs.
Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names.
Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile
Moved Hidden from the GATK to StingUtils.
Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7
Added Queue to javadoc and testing build targets.
Added first Queue unit test.
Another pass at avoiding cycles in the DAG thanks to all function I/O being files.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 21:58:26 +00:00
kshakir 4f51a02dea Changed logging level to default at INFO instead of WARN.
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
chartl 54d93f63d2 Hacky fix for LSF confusion -- submitted jobs check to see if their directory exists, despite depending on the job which creates said directory. Filter strings now have escaped quotes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3903 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 21:35:50 +00:00
kshakir 735ef19dc8 Added option to sleep after creating temporary directories.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3900 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 19:53:17 +00:00
kshakir 82c37fceb5 Create intermediate directories and don't error if the directory already exists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3899 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 19:45:12 +00:00
chartl 4d4cf6e1dc Updates to calling pipeline
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3896 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 18:37:20 +00:00
chartl f35e6d73b4 Actually name the class the name of the file. (Clearly created by cp)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3892 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-28 15:18:04 +00:00
chartl cd9395fa14 Since Picard's FixMateInformation merges, fixes mates, and sorts, allow it to be used as a gather function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3891 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-28 15:09:19 +00:00
depristo b0fc42906e Better DOT support and updated recalibration pipeline
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3811 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 20:54:51 +00:00
depristo 81eef0d993 DOT visualization with Queue. More sophisticated recalibation queue script with scatter/gather
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3799 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-15 22:32:48 +00:00
depristo 6bf5df4eb5 Better merge command
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3797 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-15 17:02:47 +00:00
kshakir 1d399aa2f3 Added a temporary gatkLoggingLevel field to the soon to be obsolete GatkFunction while finishing up the delayed generic gatk walker utility.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3757 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 03:27:32 +00:00
kshakir 7be8c35eb2 Workaround for scala trait erasing parameterized types:
- Requiring explicit @ClassType on parameterized fields in traits.
- Scatter / Gather functions are now abstract classes since @ClassType can't be used on parameterized fields with type parameters.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3726 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:15:10 +00:00
kshakir a46e22ed13 Refactored ArgumentDefinition to absorb functionality from ArgumentDefinition and ArgumentTypeDescriptor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3689 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 18:55:57 +00:00
kshakir 178cf64a0c Refactored ArgumentDefinition to absorb functionality from ArgumentDefinition and ArgumentTypeDescriptor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3688 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 18:37:58 +00:00
kshakir dce2c17404 Added "-bsubWait" where Queue waits for all the jobs to exit before exiting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3661 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 19:52:17 +00:00