gatk-3.8

Commit Graph

Author	SHA1	Message	Date
kshakir	7157cb9090	While bkill'ing on the shutdown thread, Queue will no longer try to submit more jobs on the original thread. Updated pipeline output structure to current recommendations by Corin. Directories are now automatically created before the function runs. Fixed several bugs with scatter gather binding when the script author needs to change the directories. Fixed bug with tracking of log files for CloneFunctions. More error handling and logging of exceptions (good test environment while LSF was down this early AM!) Removed cleanup utility for scatter gather. SG Output structure has changed significantly. Will need to discuss and find a better approach for Queue programmatically deleting files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4504 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 17:01:36 +00:00
corin	5e0c4ecc21	Added DbSnp to VariantEval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4497 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 17:02:17 +00:00
kshakir	63e3848187	Added status email support with -statusTo. Will send emails on failure of an individual function or success/failure of the whole pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4496 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 15:58:52 +00:00
kshakir	5034ca18dc	...and forgot to sync up the changes to CommandLineFunction with CloneFunction. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4492 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 22:40:02 +00:00
kshakir	5ee12875fb	Emergency fix for Ryan: - Catching errors when LSF fails and retrying. - When LSF retries fail, catching the error, marking the job as failed, and no longer bkilling everything by exiting Queue. - Caching function fields by class instead of each instance of a function saving a list of its fields. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4490 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 22:22:01 +00:00
chartl	6368a46bab	Scala protected is more akin to Java private than Java protected. Not typing these defs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4470 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 19:36:23 +00:00
chartl	bffb8bb01f	The SVN repository is not for dumb analysis-specific scripts. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4460 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 14:04:53 +00:00
chartl	21ec44339d	Somewhat major update. Changes: - ProduceBeagleInputWalker + Now takes a validation ROD and a prior to give it, will use those genotypes in place of the variant genotypes if both are present + Takes a bootstrap argument -- can use some given %age of the validation sites + Optionally takes a bootstrap output argument -- re-prints the validation VCF, filtering those sites used as part of the bootstrap -BeagleOutputToVCFWalker + Now filters sites where the genotypes have been reverted to hom ref + Now calls in to the new VCUtils to calculate AC/AN -Queue + New pipeline libraries for easy qscript creation, still a work in progress, but this is a considerable prototype + full calling pipeline v2 uses the above libraries + minor changes to some of my own scripts + no more need for contig interval lists, these will be parsed out of your normal interval list when it is provided git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4459 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 13:30:28 +00:00
kshakir	e02f837659	Added the ability for Queue functions like mkdirs to override if they are done or not. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4458 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 06:39:55 +00:00
kshakir	7f25019f37	Inprocess functions by default now log what output files they are running for. On -run cleaning up .done and .fail files for jobs that will be run. Added detection to Firehose YAML generator shell script for (g)awk versions that ignore "\n" in patterns. Removed obsolete mergeText and splitIntervals shell scripts. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4452 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 19:08:02 +00:00
kshakir	db47230dd9	Wrapping ScatterGatherableFunctions with a facade instead of using the slower clone library. Will require keeping Clone's facade code in sync with CommandLineFunction but runs much faster. Shell invoking scripts so that even really long shell scripts make it through LSF. Using the truncated command line (up to 1000 characters) as the job name for use with bjobs. Switched the default from re-running everything to re-running only files that need to be regenerated. --skip_up_to_date replaced with --start_clean for those who want to regenerate everything. Updated logging to let users know when the scatter gather generator is running, which still takes a while but is orders of magnitude faster for large lists of functions. (40s for a 100 function graph exploding to a 2500 function graph) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4448 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 01:19:18 +00:00
kshakir	ca5db821ce	Added the ability to Queue to run scala functions inside the JVM. NOTE: Extend from InProcessFunction instead of CommandLineFunction to use this functionality. Queue now submits new LSF jobs only after previous functions have completed successfully. When the Queue process is shutdown (ex: via Control-C) sends a bkill command for any running jobs. Ported commands like creating directories and scatter/gather interval list to scala functions. Updates to LSF status tracking by porting the python to internally generated bash scripts. Temporarily disabled job name submission to LSF. Plus side is that the full command is now available in "bjobs -w". TODO: Put back jobName passing to LSF based on an option? Changed BaseTest to allow scala to access paths to references. Changed the extension generator to default the analysis name to the walker "name". git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4442 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 18:29:56 +00:00
chartl	28ac1d325e	Commit for Ryan git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4433 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-05 19:04:10 +00:00
corin	e340be34d8	upping mem limit since something was unhappy with the lower limit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4427 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-05 02:38:17 +00:00
kshakir	bb44044ce0	Fixed re-builds of queue so that previously compiled classes are included. Fixes redundant case of "ant queue test" vs. "ant test". Refactored temp directory utils. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4426 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-04 21:12:07 +00:00
chartl	7639692e5b	Sigh. Fix the source of even more UserErrors in the phone home directory: make sure to gunzip the beagle files before passing them into the conversion walker... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4399 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 03:28:36 +00:00
chartl	f34b4c6b82	Be smarter if the beagle output is set such that getParent() returns null. Up the memory limit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4389 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 12:48:47 +00:00
chartl	0142047da9	And a bugfix 3 seconds later. Don't tell java to use up to 20g while telling the farm to kill the job if it tries to exceed 4g. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4388 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 02:08:47 +00:00
chartl	06970ae039	A qscript that refines genotypes with beagle and merges them into one vcf (running currently on the recent chr20 production calls). This will be librarized soon; but if you need to do something like this, feel free to cannibalize. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4387 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 02:05:30 +00:00
chartl	2708e83198	For show (Queue works nicely): An analysis script that runs QC for the omni chip git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4380 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 15:04:17 +00:00
kiran	51fdf9d701	Default memory limit is now 4g (apparently necessary when testing on full 100-sample Autism_Daly dataset) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4359 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-27 05:43:08 +00:00
kiran	bcc09f5d8c	Simplifications: removed command-line arguments to control SNP cluster filter parameters. Infer the number of contigs to scatter indel cleaning from the contig list (which we should get rid of too). Changed the PY argument to just Y for specifying the path to the YAML file. Cleaned up command-line argument documentation. See http://iwww.broadinstitute.org/gsa/wiki/index.php/Queue-based_pipeline for a list of remaining issues. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4356 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-26 22:50:30 +00:00
kiran	9820a12fa5	Removed unnecessary dbSNP big-table dependency. Ti/Tv is now required. Consistent downsampling level for all programs. Spelling corrections. VariantEval now generates R-style output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4355 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-26 16:55:58 +00:00
kiran	145fb0df8b	Changed the wait job's dispatch queue from short (which doesn't exist anymore) to hour git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4354 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 23:36:49 +00:00
kiran	9bfbc3b784	Commented out changes to ADPR and VariantEval modules that are causing this script to not compile. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4353 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 15:12:10 +00:00
corin	3ec0e09edd	ADPR is now included in the full calling pipeline. The most up to date version of the ADPR is about to be committed and should be used with the script for now. The qscript now calls for two additional strings as inputs: the sequencing machines used and the sequencing protocol. In order for ADPR to finish successfully, a squid file for both the lane and sample level data needs to be produced, reformatted and named <projectBase>_lanes.txt or <projectBase>_samps.txt, respectively. These files need to be in the working directory. When database access is ready, this and the protocol and sequencer parameters of the r script will go away. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4345 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 19:28:43 +00:00
kshakir	0cc48d46ec	Escaping quotes in dot files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4344 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 17:13:12 +00:00
kshakir	67bcf3a7e4	Fixed VariantEval rod binding names. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4342 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 14:52:51 +00:00
chartl	c355afc320	Queue now does job tracking (replace -run with -status in the command line). Produces output that looks like: INFO 20:58:17,827 QCommandLine - Checking pipeline status INFO 20:58:23,234 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_MergeIndels [DONE] INFO 20:58:23,236 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_158.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,237 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_929.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,238 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_SNP_calls [NOT DONE] 5t/0d/0r/5p/0f INFO 20:58:23,239 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_HandFilter [NOT DONE] INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1122.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantRecalibrator [NOT DONE] INFO 20:58:23,241 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_913.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,242 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_2037.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,243 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantEval [NOT DONE] INFO 20:58:23,244 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster [NOT DONE] INFO 20:58:23,245 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_106.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,246 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster_and_Indel_filter [NOT DONE] INFO 20:58:23,247 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_ApplyVariantCuts [NOT DONE] INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_GenomicAnnotator [NOT DONE] INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1713.bam [DONE] 5t/5d/0r/0p/0f git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4340 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 00:59:09 +00:00
kshakir	20b38b38f3	Updated from SnakeYAML 1.6 to 1.7. Added a pipeline java bean and YAML utility to serialize java beans. Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format. Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference. More changes to come as this code gets tested out in the fullCallingPipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:47:49 +00:00
chartl	6dec042288	Re-enabling indel cleaning, explicitly calling fix mates in the case where indel cleaning is not scatter/gathered git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4324 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 20:37:49 +00:00
kshakir	f9707bb7bf	Fix for Matt: For Mac OS 10.6 temporary directories replace paths like '/var/folders/Ax/AxRUoz51Fh05fVe-j6C1Wk+++TI/-Tmp-/' with '/tmp/' so that google reflections 0.95RC2 still works on classes in the directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4316 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 22:02:07 +00:00
chartl	b24172c80f	Queue now utilizes .[file].done to allow skipping of previous jobs, if they have been completed. This is, unfortunately, reliant on a python script to do the post-execution touching of .done files. That is to say, proper resumability is live (but not extensively tested) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4312 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 00:16:53 +00:00
chartl	6f6d2eb31f	Told people this worked...forgot to commit! -c git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4306 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-18 03:46:00 +00:00
kshakir	bf69b5fa21	"!=" != "==" git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4297 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 19:15:37 +00:00
chartl	c1720cc8f5	Now compiles. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4295 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 18:49:53 +00:00
chartl	c581bd2d84	Minor modifications to fCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4294 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 18:29:24 +00:00
depristo	81c82ce134	Fix for Queue git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4268 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 15:18:08 +00:00
depristo	3c5b8730d5	More Queue scripts for analysis git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4260 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:04:10 +00:00
kshakir	fd5970fdd4	At chartl's superb suggestion, command line files are now all Files instead of old method of sometimes "has a File". Should be easier when reassigning them. No longer generating deprecated GATK arguments on the Queue extensions. Emitting deprecation warnings to Queue compile to help debugging issues. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-02 21:30:48 +00:00
depristo	ca503e5801	Queue scripts for recalibration and running nSample UG jobs pre and dynamic merging git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4186 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-01 20:23:37 +00:00
chartl	5e710050d6	minor change, bamFiles comes from the input list, not the script git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4170 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-31 16:03:35 +00:00
chartl	1a14dbee1e	Adding in .bam indexing; commit for Khalid git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4169 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-31 15:21:41 +00:00
chartl	2ffa98aea5	Ugh! varout --> out git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4157 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-29 02:34:41 +00:00
chartl	d7edce31a2	Commit of fCP for Khalid git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4156 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-29 02:24:25 +00:00
chartl	576ae30df1	A version of the full calling pipeline queue script that fully compiles without String/File/NamedFile type exceptions (e.g. expected String but got NamedFile/Expected NamedFile but got File). Pipeline itself is under testing with 5 bam files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4154 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-28 22:51:11 +00:00
chartl	c6441b585a	Actually hook up the new indel genotyper and merge analyses into DAG (aka "i forgot to add()") git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4149 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 18:00:50 +00:00
chartl	7908237b90	Full calling pipeline now calls indels through the indel genotyper, merges with combine variants, and filters on them. Since new genomic annotator is fast, it is no longer scatter-gathered. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4144 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 16:28:24 +00:00
kshakir	78946c4ffd	Allowing the Queue to run the GATK via -cp instead of only from -jar. Added an example of using a walker with Queue and a custom -classpath. Removed an unused import statement in NamedFileWrapper. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4143 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 16:25:59 +00:00
kshakir	0105e8d063	Updated Queue GATK generation to reflect -B and -I changes. To add support for "-I:tumor tumor.bam", the GATK argument import_file (-I) is now generated as a List of NamedFile objects. Could not get sugar working 100%. To activate sugar import the gatk package. This effectively adds a new method to java.io.File called toNamedFile. When adding a file to the list call countReads.import_file :+= myJavaFile.toNamedFile See scala/qscript/examples for actual examples. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4122 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 22:17:36 +00:00


107 Commits (8b2d387643604e0e96a042de2bd1c9c287cabd5e)