kshakir
b954a5a4d5
- After removing special code for intervals, they are generated as List[File] instead of File. Changed the previous check-in, which appended to this list, to assign a singleton list instead.
...
- More cleanup including removing the temporary classes and intermediate error files. Quieting any errors using Apache Commons IO 2.0.
- Counting the contigs during the QScript generation instead of the end user having to pass a separate contig interval list.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4539 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 06:37:28 +00:00
kshakir
88a0d77433
Changed parsing engine to store the order of the argument bindings based on their definition in the class, moving "-T" to the front of Queue command lines.
...
Queue GATK generated .intervals is now a List(File) again, removing special case handling in the generator.
Instead of using @Scatter annotation, using ScatterFunction instance to determine if a job can be scattered.
Implemented special VcfGatherFunction which only uses the header from the first file, even if the other files differ in their headers.
Added a -deleteIntermediates to Queue to delete the outputs from intermediate commands after a successful run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4536 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 21:43:52 +00:00
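The VcfGatherFunction behavior described in 88a0d77433 (use only the header from the first file, even if later headers differ) can be sketched as follows. This is a minimal illustration in Python, not the Queue implementation: `gather_vcf` is a hypothetical name, and it assumes only that VCF header lines start with `#`.

```python
from typing import Iterable, List


def gather_vcf(parts: Iterable[List[str]]) -> List[str]:
    """Concatenate VCF line lists, keeping header lines from the first part only.

    Hypothetical helper for illustration; each part is the list of lines
    of one scattered VCF output, in gather order.
    """
    merged: List[str] = []
    for index, lines in enumerate(parts):
        for line in lines:
            is_header = line.startswith("#")
            if is_header and index > 0:
                continue  # drop headers of later parts, even if they differ
            merged.append(line)
    return merged


if __name__ == "__main__":
    first = ["##fileformat=VCFv4.0", "#CHROM\tPOS", "1\t100"]
    second = ["##fileformat=VCFv4.0", "##extra=1", "#CHROM\tPOS", "1\t200"]
    for line in gather_vcf([first, second]):
        print(line)
```

Note the trade-off the commit message implies: any header information that appears only in later files is silently discarded.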
kshakir
81479229e1
QScript authors can now tag functions as intermediate. Functions tagged as intermediate will be skipped unless another function in the graph needs their output.
...
Re-logging the failed jobs and the path to their log files at the end of a run.
Added a parameter -bigMemQueue for the fullCallingPipeline.q instead of hardcoding gsa (gsa was backed up and it was actually faster to run on week).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4520 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 22:11:14 +00:00
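One way to read the "intermediate" tagging in 81479229e1 is as a backward walk over the job graph: every untagged function runs, and a tagged function runs only if something that will run consumes one of its outputs. The sketch below is written under that assumption. `Func` and `functions_to_run` are hypothetical names, and Queue's actual graph logic may differ.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Func:
    """Hypothetical stand-in for a pipeline function and its files."""
    name: str
    inputs: Set[str] = field(default_factory=set)
    outputs: Set[str] = field(default_factory=set)
    intermediate: bool = False


def functions_to_run(funcs: List[Func]) -> List[str]:
    """Return names of functions to run, in their original order."""
    # Every function not tagged as intermediate is always needed.
    needed = {f.name for f in funcs if not f.intermediate}
    changed = True
    while changed:
        changed = False
        # Collect every file that some needed function reads.
        wanted: Set[str] = set()
        for f in funcs:
            if f.name in needed:
                wanted |= f.inputs
        # Pull in intermediate functions that produce one of those files.
        for f in funcs:
            if f.name not in needed and f.outputs & wanted:
                needed.add(f.name)
                changed = True
    return [f.name for f in funcs if f.name in needed]


if __name__ == "__main__":
    graph = [
        Func("clean", outputs={"clean.bam"}, intermediate=True),
        Func("scratch", outputs={"tmp.txt"}, intermediate=True),
        Func("call", inputs={"clean.bam"}, outputs={"calls.vcf"}),
    ]
    print(functions_to_run(graph))
```

Here `scratch` is skipped because nothing reads `tmp.txt`, while `clean` runs because `call` needs `clean.bam`.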
kshakir
7157cb9090
While bkill'ing on the shutdown thread Queue will no longer try to submit more jobs on the original thread.
...
Updated pipeline output structure to current recommendations by Corin.
Directories are now automatically created before the function runs.
Fixed several bugs with scatter gather binding when the script author needs to change the directories.
Fixed bug with tracking of log files for CloneFunctions.
More error handling and logging of exceptions (good test environment while LSF was down this early AM!)
Removed cleanup utility for scatter gather. SG output structure has changed significantly. Will need to discuss and find a better approach for Queue programmatically deleting files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4504 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 17:01:36 +00:00
corin
5e0c4ecc21
Added DbSnp to VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4497 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 17:02:17 +00:00
corin
e340be34d8
upping mem limit since something was unhappy with the lower limit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4427 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 02:38:17 +00:00
kiran
51fdf9d701
Default memory limit is now 4g (apparently necessary when testing on full 100-sample Autism_Daly dataset)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4359 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-27 05:43:08 +00:00
kiran
bcc09f5d8c
Simplifications: removed command-line arguments to control SNP cluster filter parameters. Infer the number of contigs to scatter indel cleaning from the contig list (which we should get rid of too). Changed the PY argument to just Y for specifying the path to the YAML file. Cleaned up command-line argument documentation. See http://iwww.broadinstitute.org/gsa/wiki/index.php/Queue-based_pipeline for a list of remaining issues.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4356 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-26 22:50:30 +00:00
kiran
9820a12fa5
Removed unnecessary dbSNP big-table dependency. Ti/Tv is now required. Consistent downsampling level for all programs. Spelling corrections. VariantEval now generates R-style output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4355 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-26 16:55:58 +00:00
kiran
9bfbc3b784
Commented out changes to ADPR and VariantEval modules that are causing this script to not compile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4353 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 15:12:10 +00:00
corin
3ec0e09edd
ADPR is now included in the full calling pipeline. The most up to date version of the ADPR is about to be committed and should be used with the script for now. The qscript now calls for two additional strings as inputs: the sequencing machines used and the sequencing protocol. In order for ADPR to finish successfully, a squid file for both the lane and sample level data needs to be produced, reformatted and named <projectBase>_lanes.txt or <projectBase>_samps.txt, respectively. These files need to be in the working directory. When database access is ready, this and the protocol and sequencer parameters of the r script will go away.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4345 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 19:28:43 +00:00
chartl
c355afc320
Queue now does job tracking (replace -run with -status in the command line). Produces output that looks like:
...
INFO 20:58:17,827 QCommandLine - Checking pipeline status
INFO 20:58:23,234 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_MergeIndels [DONE]
INFO 20:58:23,236 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_158.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,237 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_929.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,238 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_SNP_calls [NOT DONE] 5t/0d/0r/5p/0f
INFO 20:58:23,239 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_HandFilter [NOT DONE]
INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1122.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantRecalibrator [NOT DONE]
INFO 20:58:23,241 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_913.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,242 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_2037.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,243 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantEval [NOT DONE]
INFO 20:58:23,244 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster [NOT DONE]
INFO 20:58:23,245 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_106.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,246 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster_and_Indel_filter [NOT DONE]
INFO 20:58:23,247 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_ApplyVariantCuts [NOT DONE]
INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_GenomicAnnotator [NOT DONE]
INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1713.bam [DONE] 5t/5d/0r/0p/0f
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4340 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 00:59:09 +00:00
kshakir
20b38b38f3
Updated from SnakeYAML 1.6 to 1.7.
...
Added a pipeline java bean and YAML utility to serialize java beans.
Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format.
Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference.
More changes to come as this code gets tested out in the fullCallingPipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 19:47:49 +00:00
chartl
6dec042288
Re-enabling indel cleaning, explicitly calling fix mates in the case where indel cleaning is not scatter/gathered
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4324 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 20:37:49 +00:00
chartl
b24172c80f
Queue now utilizes .[file].done to allow skipping of previous jobs, if they have been completed. This is, unfortunately, reliant on a python script to do the post-execution touching of .done files.
...
That is to say, proper resumability is live (but not extensively tested)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4312 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 00:16:53 +00:00
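The resumability scheme in b24172c80f (a `.[file].done` marker touched after execution, checked before a job is rerun) can be sketched roughly as below. The exact marker naming is an assumption: here `.[file].done` is read as a hidden sibling of the output, so `calls.vcf` gets `.calls.vcf.done`. `done_marker` and `run_if_needed` are hypothetical names, and this is not the Python script the commit refers to.

```python
import tempfile
from pathlib import Path
from typing import Callable


def done_marker(output: Path) -> Path:
    """Return the assumed marker path, e.g. calls.vcf -> .calls.vcf.done."""
    return output.with_name(f".{output.name}.done")


def run_if_needed(output: Path, job: Callable[[Path], None]) -> bool:
    """Run job(output) unless its marker exists; return True if the job ran."""
    marker = done_marker(output)
    if marker.exists():
        return False  # completed in an earlier run, so skip it
    job(output)
    marker.touch()  # only reached when job() did not raise
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "calls.vcf"

        def write_header(path: Path) -> None:
            path.write_text("##fileformat=VCFv4.0\n")

        print(run_if_needed(out, write_header))  # first call runs the job
        print(run_if_needed(out, write_header))  # second call is skipped
```

Touching the marker only after the job returns means a crashed or killed job leaves no marker and is retried on the next run.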
chartl
6f6d2eb31f
Told people this worked...forgot to commit!
...
-c
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4306 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-18 03:46:00 +00:00
chartl
c1720cc8f5
Now compiles.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4295 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 18:49:53 +00:00
chartl
c581bd2d84
Minor modifications to fCP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4294 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 18:29:24 +00:00
kshakir
fd5970fdd4
At chartl's superb suggestion, command line files are now all Files instead of the old method of sometimes "has a File". This should make reassigning them easier.
...
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings to Queue compile to help debugging issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:30:48 +00:00
chartl
5e710050d6
minor change, bamFiles comes from the input list, not the script
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4170 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:03:35 +00:00
chartl
1a14dbee1e
Adding in .bam indexing; commit for Khalid
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4169 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 15:21:41 +00:00
chartl
2ffa98aea5
Ugh! varout --> out
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4157 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 02:34:41 +00:00
chartl
d7edce31a2
Commit of fCP for Khalid
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4156 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 02:24:25 +00:00
chartl
576ae30df1
A version of the full calling pipeline queue script that fully compiles without String/File/NamedFile type exceptions (e.g. expected String but got NamedFile/Expected NamedFile but got File). Pipeline itself is under testing with 5 bam files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4154 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:51:11 +00:00
chartl
c6441b585a
Actually hook up the new indel genotyper and merge analyses into DAG (aka "i forgot to add()")
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4149 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 18:00:50 +00:00
chartl
7908237b90
Full calling pipeline now calls indels through the indel genotyper, merges with combine variants, and filters on them. Since new genomic annotator is fast, it is no longer scatter-gathered.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4144 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:28:24 +00:00
chartl
6eb1559c1d
End-to-end calling works again (changes to walker arguments, and changes to queue, affect its validity, so it often goes out-of-date before I try to use it again)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4116 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:52:44 +00:00
chartl
0028b884d8
Reformatting and tweaks to the end-to-end pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4066 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:29:48 +00:00
chartl
3a4977c75e
Re-add the 1KG trigger as a comp as well
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4045 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 18:19:47 +00:00
kshakir
4f51a02dea
Changed logging level to default at INFO instead of WARN.
...
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
chartl
5815348ebc
Switch to newer version of comp tracks (and make the trigger track a comp as well). Indel cleaning should override the interval list and only use the contig interval list; and also force jobs to go to long.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3941 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 20:05:27 +00:00
chartl
9132c98eec
Slightly smarter interval list handling (whole exome intervals are .interval_list, whole genome are .interval.list). Also use BTI with the Genomic Annotator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3904 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 22:04:02 +00:00
chartl
54d93f63d2
Hacky fix for LSF confusion -- submitted jobs check to see if their directory exists, despite depending on the job which creates said directory. Filter strings now have escaped quotes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3903 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 21:35:50 +00:00
chartl
0f9baa2e94
Ha ha ha ha ha
...
:(
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3902 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 20:48:35 +00:00
chartl
7a5ee485d2
Full pipeline now works through DAG creation. First draft; more work is needed to make it cleaner and to improve command-line input handling (and properties handling), but the DAG is rendered and looks good.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3898 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 19:36:17 +00:00
chartl
4d4cf6e1dc
Updates to calling pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3896 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 18:37:20 +00:00
chartl
62a9217a61
A brute-force exome/genome independent end-to-end cleaning/calling pipeline using Queue
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3894 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 13:17:14 +00:00