chartl
0142047da9
And a bugfix 3 seconds later. Don't tell java to use up to 20g while telling the farm to kill the job if it tries to exceed 4g.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4388 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 02:08:47 +00:00
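The mismatch this fix describes (a 20g Java heap under a 4g farm limit) can be caught before submission. A minimal sketch, assuming both limits are written as plain size strings such as "4g"; the helper names `parse_memory` and `heap_fits_limit` are hypothetical and not part of Queue:

```python
import re

# Binary multipliers for the size suffixes handled by this sketch.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_memory(text):
    """Convert a size string such as '4g' or '512m' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([kmg])", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized memory size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def heap_fits_limit(java_heap, farm_limit):
    """Return True when the requested Java heap does not exceed the farm limit."""
    return parse_memory(java_heap) <= parse_memory(farm_limit)
```

With the values from the commit message, `heap_fits_limit("20g", "4g")` is false, which is the condition the fix removes.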
chartl
06970ae039
A qscript that refines genotypes with beagle and merges them into one vcf (running currently on the recent chr20 production calls).
...
This will be librarized soon; but if you need to do something like this, feel free to cannibalize.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4387 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-30 02:05:30 +00:00
chartl
2708e83198
For show (Queue works nicely): An analysis script that runs QC for the omni chip
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4380 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-29 15:04:17 +00:00
kiran
51fdf9d701
Default memory limit is now 4g (apparently necessary when testing on full 100-sample Autism_Daly dataset)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4359 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-27 05:43:08 +00:00
kiran
bcc09f5d8c
Simplifications: removed command-line arguments to control SNP cluster filter parameters. Infer the number of contigs to scatter indel cleaning from the contig list (which we should get rid of too). Changed the PY argument to just Y for specifying the path to the YAML file. Cleaned up command-line argument documentation. See http://iwww.broadinstitute.org/gsa/wiki/index.php/Queue-based_pipeline for a list of remaining issues.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4356 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-26 22:50:30 +00:00
kiran
9820a12fa5
Removed unnecessary dbSNP big-table dependency. Ti/Tv is now required. Consistent downsampling level for all programs. Spelling corrections. VariantEval now generates R-style output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4355 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-26 16:55:58 +00:00
kiran
9bfbc3b784
Commented out changes to ADPR and VariantEval modules that are causing this script to not compile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4353 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-25 15:12:10 +00:00
corin
3ec0e09edd
ADPR is now included in the full calling pipeline. The most up-to-date version of ADPR is about to be committed and should be used with the script for now. The qscript now takes two additional strings as inputs: the sequencing machines used and the sequencing protocol. For ADPR to finish successfully, squid files for both the lane-level and sample-level data need to be produced, reformatted, and named <projectBase>_lanes.txt and <projectBase>_samps.txt, respectively. These files need to be in the working directory. When database access is ready, this requirement and the protocol and sequencer parameters of the R script will go away.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4345 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 19:28:43 +00:00
kshakir
67bcf3a7e4
Fixed VariantEval rod binding names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4342 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 14:52:51 +00:00
chartl
c355afc320
Queue now does job tracking (replace -run with -status in the command line). Produces output that looks like:
...
INFO 20:58:17,827 QCommandLine - Checking pipeline status
INFO 20:58:23,234 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_MergeIndels [DONE]
INFO 20:58:23,236 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_158.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,237 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_929.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,238 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_SNP_calls [NOT DONE] 5t/0d/0r/5p/0f
INFO 20:58:23,239 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_HandFilter [NOT DONE]
INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1122.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantRecalibrator [NOT DONE]
INFO 20:58:23,241 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_913.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,242 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_2037.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,243 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantEval [NOT DONE]
INFO 20:58:23,244 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster [NOT DONE]
INFO 20:58:23,245 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_106.bam [DONE] 5t/5d/0r/0p/0f
INFO 20:58:23,246 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster_and_Indel_filter [NOT DONE]
INFO 20:58:23,247 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_ApplyVariantCuts [NOT DONE]
INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_GenomicAnnotator [NOT DONE]
INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1713.bam [DONE] 5t/5d/0r/0p/0f
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4340 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-24 00:59:09 +00:00
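The status lines above are regular enough to summarize with a script. A minimal sketch, assuming the layout shown in this commit message; the commit does not define the t/d/r/p/f letters, so the sketch keeps them as raw keys, and `parse_status_line` is a hypothetical helper, not part of Queue:

```python
import re

# Matches "- <job name> [DONE]" or "- <job name> [NOT DONE]", optionally
# followed by per-job counts such as "5t/0d/0r/5p/0f".
STATUS_RE = re.compile(
    r"-\s+(?P<name>\S+)\s+\[(?P<state>DONE|NOT DONE)\]"
    r"(?:\s+(?P<t>\d+)t/(?P<d>\d+)d/(?P<r>\d+)r/(?P<p>\d+)p/(?P<f>\d+)f)?"
)


def parse_status_line(line):
    """Return a dict for a job status line, or None for any other log line."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    counts = None
    if match.group("t") is not None:
        # Keep the raw single-letter keys; their meaning is not documented here.
        counts = {key: int(match.group(key)) for key in "tdrpf"}
    return {
        "name": match.group("name"),
        "done": match.group("state") == "DONE",
        "counts": counts,
    }
```

Lines without a job entry, such as the "Checking pipeline status" line, return `None` and can be skipped.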
kshakir
20b38b38f3
Updated from SnakeYAML 1.6 to 1.7.
...
Added a pipeline java bean and YAML utility to serialize java beans.
Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format.
Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference.
More changes to come as this code gets tested out in the fullCallingPipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 19:47:49 +00:00
chartl
6dec042288
Re-enabling indel cleaning, explicitly calling fix mates in the case where indel cleaning is not scatter/gathered
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4324 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 20:37:49 +00:00
chartl
b24172c80f
Queue now utilizes .[file].done to allow skipping of previous jobs, if they have been completed. This is, unfortunately, reliant on a python script to do the post-execution touching of .done files.
...
That is to say, proper resumability is live (but not extensively tested)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4312 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-20 00:16:53 +00:00
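The .done marker idea described in this commit can be sketched as follows. This is an illustration only, not the Python script the commit refers to; it assumes the marker for an output named `out.vcf` is a hidden sibling file named `.out.vcf.done`, and `done_marker` and `run_once` are hypothetical names:

```python
import os
import subprocess
import sys


def done_marker(output_path):
    """Return the assumed marker path: '.<file name>.done' beside the output."""
    directory, name = os.path.split(output_path)
    return os.path.join(directory, "." + name + ".done")


def run_once(command, output_path):
    """Run command unless its marker exists; return True if the command ran."""
    marker = done_marker(output_path)
    if os.path.exists(marker):
        return False  # a previous run completed, so skip the job
    # check=True raises on a non-zero exit, so a failed job leaves no marker.
    subprocess.run(command, check=True)
    # Touch the marker only after the command has succeeded.
    with open(marker, "a"):
        os.utime(marker, None)
    return True


if __name__ == "__main__":
    # Hypothetical usage: a no-op command standing in for a real pipeline job.
    print(run_once([sys.executable, "-c", "pass"], "example_output.txt"))
```

Writing the marker after the command returns is what makes a rerun safe: an interrupted job has no marker and is run again.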
chartl
6f6d2eb31f
Told people this worked...forgot to commit!
...
-c
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4306 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-18 03:46:00 +00:00
chartl
c1720cc8f5
Now compiles.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4295 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 18:49:53 +00:00
chartl
c581bd2d84
Minor modifications to fCP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4294 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-16 18:29:24 +00:00
depristo
3c5b8730d5
More Queue scripts for analysis
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4260 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:04:10 +00:00
kshakir
fd5970fdd4
At chartl's superb suggestion, command line files are now all Files instead of the old method of sometimes "has a File". Should be easier when reassigning them.
...
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings to Queue compile to help debugging issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:30:48 +00:00
depristo
ca503e5801
Queue scripts for recalibration and running nSample UG jobs pre and dynamic merging
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4186 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 20:23:37 +00:00
chartl
5e710050d6
minor change, bamFiles comes from the input list, not the script
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4170 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:03:35 +00:00
chartl
1a14dbee1e
Adding in .bam indexing; commit for Khalid
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4169 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 15:21:41 +00:00
chartl
2ffa98aea5
Ugh! varout --> out
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4157 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 02:34:41 +00:00
chartl
d7edce31a2
Commit of fCP for Khalid
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4156 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 02:24:25 +00:00
chartl
576ae30df1
A version of the full calling pipeline queue script that fully compiles without String/File/NamedFile type exceptions (e.g. expected String but got NamedFile/Expected NamedFile but got File). Pipeline itself is under testing with 5 bam files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4154 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:51:11 +00:00
chartl
c6441b585a
Actually hook up the new indel genotyper and merge analyses into DAG (aka "i forgot to add()")
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4149 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 18:00:50 +00:00
chartl
7908237b90
Full calling pipeline now calls indels through the indel genotyper, merges with combine variants, and filters on them. Since new genomic annotator is fast, it is no longer scatter-gathered.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4144 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:28:24 +00:00
kshakir
78946c4ffd
Allowing the Queue to run the GATK via -cp instead of only from -jar.
...
Added an example of using a walker with Queue and a custom -classpath.
Removed an unused import statement in NamedFileWrapper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4143 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:25:59 +00:00
kshakir
0105e8d063
Updated Queue GATK generation to reflect -B and -I changes.
...
To add support for "-I:tumor tumor.bam", the GATK argument
import_file (-I) is now generated as a List of NamedFile objects.
Could not get sugar working 100%. To activate sugar, import the
gatk package. This effectively adds a new method to java.io.File
called toNamedFile. When adding a file to the list, call:
countReads.import_file :+= myJavaFile.toNamedFile
See scala/qscript/examples for actual examples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4122 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:17:36 +00:00
chartl
6eb1559c1d
End-to-end calling works again (changes to walker arguments, and changes to queue, affect its validity, so it often goes out-of-date before I try to use it again)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4116 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:52:44 +00:00
kshakir
3aedd0055e
Updated firehose clean bam pipeline to pull firehose info and push back firehose clean bam.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4088 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 20:38:42 +00:00
chartl
0028b884d8
Reformatting and tweaks to the end-to-end pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4066 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:29:48 +00:00
kshakir
618c69f8dc
More updates to the CleanBamFile pipeline.
...
Added a CommandLineFunction.jobDependencies that will explicitly force a function to wait for a file, even if the value isn't otherwise listed on an @Input.
More bug fixes and refactoring of functions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4048 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-17 14:59:42 +00:00
chartl
3a4977c75e
Re-add the 1KG trigger as a comp as well
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4045 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-16 18:19:47 +00:00
depristo
c85ab9db37
functional recalibrate script
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4034 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-14 16:01:37 +00:00
kshakir
307c8ca027
Created a new playground script for cleaning bams in Firehose.
...
Some refactoring of Queue extensions for reusability in scripts.
Putting the extensions into the Queue.jar after building them.
More updates to GATK walker arguments specifying @Input and @Output for Queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 23:52:24 +00:00
kshakir
542d394e09
Cleaning up Queue debugging output.
...
-l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run.
More documentation in the examples with a new even simpler CountReads example.
Took out unused option to build Queue GATK extensions separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-13 15:54:08 +00:00
kshakir
f39dce1082
Exposed CommandLineFunction defaults to the Queue.jar command line (see -help).
...
Added ability to skip up-to-date jobs, i.e. jobs whose outputs are newer than their inputs.
Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names.
Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile
Moved Hidden from the GATK to StingUtils.
Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7
Added Queue to javadoc and testing build targets.
Added first Queue unit test.
Another pass at avoiding cycles in the DAG thanks to all function I/O being files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 21:58:26 +00:00
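The up-to-date check mentioned in this commit can be read as a make-style timestamp comparison. A minimal sketch, assuming modification times are the only signal used; the actual rule inside Queue is not shown in this log, and `is_up_to_date` is a hypothetical name:

```python
import os


def is_up_to_date(inputs, outputs):
    """Return True when every output exists and none is older than any input."""
    if not outputs or not all(os.path.exists(path) for path in outputs):
        return False  # a missing output always forces the job to run
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    # A job with no inputs counts as up to date once its outputs exist.
    newest_input = max(
        (os.path.getmtime(path) for path in inputs), default=float("-inf")
    )
    return oldest_output >= newest_input
```

A job for which this returns true can be skipped; touching any input makes it stale again.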
depristo
cd2d051209
full path to Rscript
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3999 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:02:38 +00:00
depristo
9b432d0801
1kg script now works
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3998 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:01:18 +00:00
kshakir
4f51a02dea
Changed logging level to default at INFO instead of WARN.
...
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
chartl
5815348ebc
Switch to newer version of comp tracks (and make the trigger track a comp as well). Indel cleaning should override the interval list and only use the contig interval list; and also force jobs to go to long.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3941 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 20:05:27 +00:00
chartl
9132c98eec
Slightly smarter interval list handling (whole exome intervals are .interval_list, whole genome are .interval.list). Also use BTI with the Genomic Annotator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3904 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 22:04:02 +00:00
chartl
54d93f63d2
Hacky fix for LSF confusion -- submitted jobs check to see if their directory exists, despite depending on the job which creates said directory. Filter strings now have escaped quotes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3903 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 21:35:50 +00:00
chartl
0f9baa2e94
Ha ha ha ha ha
...
:(
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3902 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 20:48:35 +00:00
chartl
7a5ee485d2
Full pipeline now works through DAG creation. First draft; more work is needed to make it cleaner and to improve command-line input handling (and properties handling); but the DAG is rendered and looks good.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3898 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 19:36:17 +00:00
chartl
4d4cf6e1dc
Updates to calling pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3896 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 18:37:20 +00:00
chartl
62a9217a61
A brute-force exome/genome independent end-to-end cleaning/calling pipeline using Queue
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3894 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 13:17:14 +00:00
depristo
25a27b78bc
1KG Table 1 counting pipeline. Useful example
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3819 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:30:56 +00:00
depristo
b0fc42906e
Better DOT support and updated recalibration pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3811 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 20:54:51 +00:00
depristo
81eef0d993
DOT visualization with Queue. More sophisticated recalibration queue script with scatter/gather
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3799 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-15 22:32:48 +00:00
depristo
530a320f28
Intermediate commit of scatter/gather recalibration pipeline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3785 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 22:46:08 +00:00
kshakir
1d399aa2f3
Added a temporary gatkLoggingLevel field to the soon to be obsolete GatkFunction while finishing up the delayed generic gatk walker utility.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3757 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 03:27:32 +00:00
kshakir
7be8c35eb2
Workaround for scala trait erasing parameterized types:
...
- Requiring explicit @ClassType on parameterized fields in traits.
- Scatter / Gather functions are now abstract classes since @ClassType can't be used on parameterized fields with type parameters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3726 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:15:10 +00:00
rpoplin
87470d5fe5
Checking in a simplistic VR qscript file for posterity's sake
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3705 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 18:53:17 +00:00
kshakir
894ad354fa
Fixed typo in the name of the shell directory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3644 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 20:59:40 +00:00
kshakir
75c98c42b8
Started path of deprecation of Sting's @Argument by splitting the annotation into @Output and @Input. Anything that's not an @Output should be an @Input.
...
Checked in example qscripts that are basically todo integration tests.
Replaced use of queue @Input/@Output with Sting's new @Input/@Output. This means you'll now have to doc-ument the annotations.
More work on dependency resolution cycles being created in the graph during scatter/gather.
Filtering nulls to avoid NPE exceptions in scala's 'Collection'.hashCode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3643 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 20:51:13 +00:00