Updated a call to swapExt to specify the directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4586 348d0f76-0448-11de-a6fe-93d51630548a
- Forcing user to set the temp directory via -Djava.io.tmpdir to avoid filling up /tmp.
- By default deleting job outputs tagged as intermediate.
- Defaulting pipeline to scatter count 1 (no reads deleted).
- Cleaning up temp classes even when scripting fails.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4573 348d0f76-0448-11de-a6fe-93d51630548a
When the cleaner interval scatter count is set to one, explicitly setting the intervals to Nil.
TODO: Need to add an option that lets the user choose from the command line to scatter all contigs or just those in the intervals list. For now can get roughly the same behavior by setting the interval scatter count equal to the number of contigs+1, assuming the random contigs come at the end of the sequence dictionary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4565 348d0f76-0448-11de-a6fe-93d51630548a
Added a brute force -retry <count> option to Queue for transient errors.
Waiting up to 2 minutes for the LSF logs to appear before trying to display the errors from the logs.
Updates to the local job runner error logging when a job fails.
Refactored QGraph's settings as duplicate code was getting out of control.
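For reference, the brute-force retry idea can be sketched as follows. This is a minimal illustration in Python, not Queue's actual code: the name run_with_retries is hypothetical, and it assumes the count means "additional attempts after the first failure", which the message above does not spell out.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(job: Callable[[], T], retries: int) -> T:
    """Run job once, then re-run it up to `retries` more times if it raises.

    Assumption: `retries` counts extra attempts, so retries=2 means at most
    three runs in total. The last failure is re-raised unchanged.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception as error:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            print(f"attempt {attempt} failed ({error}); retrying")
    raise AssertionError("unreachable")
```

A blind retry like this only helps with transient errors; a job that fails deterministically just fails retries + 1 times.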
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4563 348d0f76-0448-11de-a6fe-93d51630548a
- More cleanup including removing the temporary classes and intermediate error files. Quieting any errors using Apache Commons IO 2.0.
- Counting the contigs during the QScript generation instead of the end user having to pass a separate contig interval list.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4539 348d0f76-0448-11de-a6fe-93d51630548a
Queue GATK generated .intervals is now a List(File) again removing special case handling in the generator.
Instead of using @Scatter annotation, using ScatterFunction instance to determine if a job can be scattered.
Implemented special VcfGatherFunction which only uses the header from the first file, even if the other files differ in their headers.
Added a -deleteIntermediates to Queue to delete the outputs from intermediate commands after a successful run.
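The header rule described for VcfGatherFunction can be illustrated with a small sketch. This is Python for illustration only, not the actual implementation: gather_vcf is a hypothetical name, and it assumes each scattered piece is a sequence of text lines whose header lines start with "#".

```python
from typing import Iterable, List


def gather_vcf(parts: Iterable[Iterable[str]]) -> List[str]:
    """Concatenate scattered VCF pieces, keeping header lines only from the first piece."""
    gathered: List[str] = []
    for index, lines in enumerate(parts):
        for line in lines:
            is_header = line.startswith("#")
            if is_header and index > 0:
                continue  # headers from later pieces are dropped, even if they differ
            gathered.append(line)
    return gathered
```

Usage: gather_vcf([first_lines, second_lines]) returns the first piece unchanged followed by only the records of the second.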
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4536 348d0f76-0448-11de-a6fe-93d51630548a
Re-logging the failed jobs and the path to their log files at the end of a run.
Added a parameter -bigMemQueue for the fullCallingPipeline.q instead of hardcoding gsa (gsa was backed up and it was actually faster to run on week).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4520 348d0f76-0448-11de-a6fe-93d51630548a
Modified - SelectVariants: Hook up to VariantContextUtils to recalculate AC/AF/AN, which uses the accessor in VariantContext to do this. Somehow sites that were selected down to hom-ref genotypes only wound up getting positive AC.
**IMPORTANT** I kind of need input here. The header of a file used for an integration test specifies AC as being an integer. Recalculating it casts it into an integer list (which it should be, as it allows for alternate alleles). However this appears to clash with what the jexl expression is looking for? For now, the integration test itself needed to be changed; it's unclear what to do when the header specifies AC as one class but recalculating it casts it to another.
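To make the AC question concrete, here is a rough sketch of the recalculation, in Python for illustration only (it is not the VariantContextUtils code). The function name and the genotype encoding (allele indices, 0 = ref, None = no-call) are assumptions. Note that AC comes out as a list with one entry per alternate allele, which is where the clash with an integer-typed header arises, and that a site with only hom-ref genotypes gets AC of 0.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed encoding: allele indices, 0 = ref, 1.. = alt alleles, None = no-call.
Genotype = Sequence[Optional[int]]


def recalc_chromosome_counts(
    genotypes: Sequence[Genotype], n_alt: int
) -> Tuple[List[int], int, List[float]]:
    """Return (AC, AN, AF) recomputed from the genotypes that remain at a site."""
    ac = [0] * n_alt  # one count per alternate allele
    an = 0            # number of called alleles
    for genotype in genotypes:
        for allele in genotype:
            if allele is None:
                continue  # no-calls contribute to neither AC nor AN
            an += 1
            if allele > 0:
                ac[allele - 1] += 1
    af = [count / an if an else 0.0 for count in ac]
    return ac, an, af
```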
I'm committing my omni_qc pipeline because I'm almost certain 2 months down the road I'm going to wonder what the heck I did to generate my results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4511 348d0f76-0448-11de-a6fe-93d51630548a
Updated pipeline output structure to current recommendations by Corin.
Directories are now automatically created before the function runs.
Fixed several bugs with scatter gather binding when the script author needs to change the directories.
Fixed bug with tracking of log files for CloneFunctions.
More error handling and logging of exceptions (good test environment while LSF was down this early AM!)
Removed cleanup utility for scatter gather. SG Output structure has changed significantly. Will need to discuss and find a better approach for Queue programmatically deleting files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4504 348d0f76-0448-11de-a6fe-93d51630548a
- ProduceBeagleInputWalker
+ Now takes a validation ROD and a prior to give it, will use those genotypes in place of the variant genotypes if both are present
+ Takes a bootstrap argument -- can use some given %age of the validation sites
+ Optionally takes a bootstrap output argument -- re-prints the validation VCF, filtering those sites used as part of the bootstrap
- BeagleOutputToVCFWalker
+ Now filters sites where the genotypes have been reverted to hom ref
+ Now calls in to the new VCUtils to calculate AC/AN
- Queue
+ New pipeline libraries for easy qscript creation, still a work in progress, but this is a considerable prototype
+ full calling pipeline v2 uses the above libraries
+ minor changes to some of my own scripts
+ no more need for contig interval lists, these will be parsed out of your normal interval list when it is provided
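As a rough illustration of the last Queue item, pulling the contigs out of a normal interval list might look like the sketch below. It is Python for illustration, not the pipeline library's code: contigs_in_intervals is a hypothetical name, and it assumes GATK-style lines of the form "contig:start-stop" or a bare contig name.

```python
from typing import Iterable, List


def contigs_in_intervals(lines: Iterable[str]) -> List[str]:
    """Return the distinct contigs named in an interval list, in first-seen order.

    Assumption: each non-blank line is "contig:start-stop" or just "contig".
    """
    contigs: List[str] = []
    seen = set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split on the last ':' so only the coordinate part is removed.
        contig = line.rsplit(":", 1)[0]
        if contig not in seen:
            seen.add(contig)
            contigs.append(contig)
    return contigs
```

The length of the returned list is then the contig count that previously had to come from a separate contig interval list.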
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4459 348d0f76-0448-11de-a6fe-93d51630548a
Queue now submits new LSF jobs only after previous functions have completed successfully.
When the Queue process is shut down (e.g. via Control-C), it sends a bkill command for any running jobs.
Ported commands like creating directories and scatter/gather interval list to scala functions.
Updates to LSF status tracking by porting the python to internally generated bash scripts.
Temporarily disabled job name submission to LSF. Plus side is that the full command is now available in "bjobs -w". TODO: Put back jobName passing to LSF based on an option?
Changed BaseTest to allow scala to access paths to references.
Changed the extension generator to default the analysis name to the walker "name".
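The submission rule in the first line of this message (new jobs go out only after the functions they depend on have completed successfully) can be sketched as below. This is an illustrative Python sketch using the standard library's graphlib (Python 3.9+), not Queue's QGraph code. The function name, the graph layout (name mapped to the set of names it depends on), and the run callback are all assumptions.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_in_dependency_order(
    graph: Dict[str, Set[str]], run: Callable[[str], bool]
) -> List[str]:
    """Run each function only after everything it depends on has succeeded.

    `graph` maps a function name to the names it depends on; `run` returns
    True on success. On the first failure the sketch stops, so dependents of
    the failed function are never submitted.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    completed: List[str] = []
    while sorter.is_active():
        for name in sorter.get_ready():
            if not run(name):
                return completed
            sorter.done(name)  # unlocks functions that were waiting on this one
            completed.append(name)
    return completed
```

Stopping at the first failure is a simplification; a real runner could keep going with functions that do not depend on the failed one.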
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4442 348d0f76-0448-11de-a6fe-93d51630548a
This will be librarized soon; but if you need to do something like this, feel free to cannibalize.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4387 348d0f76-0448-11de-a6fe-93d51630548a
Added a pipeline java bean and YAML utility to serialize java beans.
Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format.
Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference.
More changes to come as this code gets tested out in the fullCallingPipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a
That is to say, proper resumability is live (but not extensively tested)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4312 348d0f76-0448-11de-a6fe-93d51630548a
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings to Queue compile to help debugging issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
Added an example of using a walker with Queue and a custom -classpath.
Removed an unused import statement in NamedFileWrapper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4143 348d0f76-0448-11de-a6fe-93d51630548a
To add support for "-I:tumor tumor.bam", the GATK argument
import_file (-I) is now generated as a List of NamedFile objects.
Could not get the sugar working 100%. To activate it, import the
gatk package. This effectively adds a new method to java.io.File
called toNamedFile. When adding a file to the list, call:
countReads.import_file :+= myJavaFile.toNamedFile
See scala/qscript/examples for actual examples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4122 348d0f76-0448-11de-a6fe-93d51630548a
Added a CommandLineFunction.jobDependencies that will explicitly force a function to wait for a file, even if the value isn't otherwise listed on an @Input.
More bug fixes and refactoring of functions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4048 348d0f76-0448-11de-a6fe-93d51630548a