gatk-3.8

Commit Graph

Author	SHA1	Message	Date
delangel	fa0c476b82	Script for calling indels in all phase 1 samples - VQSR part still needs work but raw calling is done git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5052 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-22 14:07:10 +00:00
carneiro	a0731eaa81	updated NA12878 Trio gold standard data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5048 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:48:31 +00:00
depristo	94b64ec54a	Moving scala script into analysis directory git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5047 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:42:18 +00:00
depristo	b45566760e	intermediate checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5045 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:39:25 +00:00
rpoplin	b6497c404f	Moving Phase1Calling qscript over to using the cleaned, pre-BAQed bams git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5039 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 02:41:20 +00:00
carneiro	fc73569d62	Added NA12878 Trio dataset to the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5037 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 23:15:33 +00:00
kshakir	8855f080c2	For the fullCallingPipeline.q: - Reading the refseq table from the YAML if not specified on the command line. - Removed obsolete -bigMemQueue now that CombineVariants runs in 4g. - Added a -mountDir /broad/software option to work around adpr automount issues. - Merged the LSF preexec used for automount into the shell script used to execute tasks. - Using the LSF C Library to determine when jobs are complete instead of postexec. - Updated queue.sh to match the changes above. - Updated the FCPTest to match the changes above. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 22:34:43 +00:00
depristo	41c8552d0a	Added implements HasGenomeLocation to all relevant classes. It's now possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 12:54:03 +00:00
kshakir	4d611e53e7	Passing the ADPR R script to FCPTest. Changed the FCP.q to use an InProcessFunction work around the -runDir issue GSA-420. Tested the FCPTest using the following dotkits and "ant clean pipelinetest -Dpipeline.run=run": - R-2.11 - Oracle-full-client - .cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5029 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 06:08:45 +00:00
corin	50fcebb0c4	Incorporates tearsheet and plot production with database access into standard pipeline. Note that the following dotkit packages must be run before the adpr will be correctly generated: R-2.10, Oracle-full-client, cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 This also removes the unused titv argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5024 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 20:48:42 +00:00
rpoplin	55eb0387ac	Another relevant qscript. I use this one to do thousands of variant recalibration jobs to search for optimal parameters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5019 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 18:17:32 +00:00
chartl	a463dbcda1	Refactoring the qscript directory; oneoffs, playground, and core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5017 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 15:23:40 +00:00
rpoplin	7db9601c9d	Checking in the 1000G phase1 cleaning and calling scripts for posterity's sake, but also to show everyone what the current best practices for VQSR training look like. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5015 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 14:32:52 +00:00
rpoplin	457c59e737	Use the sites-only HapMap files in the Methods development pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5013 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-18 20:50:09 +00:00
carneiro	35a4f1e366	- Added VariantEval as an optional step in the pipeline. - Lifted to HapMap 3.3. - Lifted to dbSNP 132 where possible. - Added the CEU-Trio WEx (hg19) dataset. - Added some options to the pipeline. You can now use: -dataset WEX -dataset HiSeq ... to choose which datasets to run through the pipeline. You can now run without BAQ and indel mask: -noBAQ -noMASK Choose not to run the gold standard comparison analysis: -skipGoldStandard Activate the VariantEval walker analysis on the recalibrated VCF: -eval The default behavior is to run exactly like it used to, so this version shouldn't change the way you used to use the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5004 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-14 21:55:02 +00:00
carneiro	c4f9b262e5	removing the tech dev pipeline script from the repository to keep the methods development pipeline as the reference script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4992 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 18:15:55 +00:00
carneiro	9e93091e9a	-baqGOP now takes phred scaled scores instead of probabilities in the command line. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4982 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 00:06:38 +00:00
kshakir	8ba3a5a43f	Command lines for locally run Queue jobs no longer have to be escaped differently than bsub'ed jobs. GSA-410 Local job runs now can run command lines longer than 4096 on our linux machines. When determining if the help text and Queue extensions need to be rebuilt, use the .class files not the .java so that GATK oneoffs are picked up correctly. Added the most basic of all example QScripts for debugging, Hello World. Minor updates to copy/pasted LSF code to reduce ant javadoc warnings by a third. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4970 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 21:07:29 +00:00
kshakir	b34e2f733f	Removed stochasticity from IndelRealigner by random sampling using a seed based on the read list. Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set. Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found. Now building all scala code at the same time, just like all java code is compiled at the same time. Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile. Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles. Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified. Fixed a couple errors when the <javadoc> task is run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 22:03:36 +00:00
chartl	3e7802a3e0	Minor changes to a qscript and the GQ constants on PrivatePermutations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4956 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 18:26:21 +00:00
carneiro	5e9a8f9cb3	Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base. Adding the first version of the techdev pipeline (tdPipeline) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 22:25:08 +00:00
rpoplin	20f29e4690	In the Methods development pipeline the call confidence threshold must be lowered from the default value for lowpass calling. What a bone-headed mistake! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4941 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 20:30:55 +00:00
corin	6d809321d3	Updating combine variants memory limit and dcov default for the full calling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4907 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-24 03:06:50 +00:00
depristo	5265f943b0	phasing per sample. tmp checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4898 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 20:14:06 +00:00
corin	e7569cfe6f	Updated dbsnp version usage. Calling with 132, but still using 129 for eval to maintain consistent known/novel eval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4895 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 17:37:27 +00:00
chartl	2235245af0	PrivatePermutations generalized to compute transition counts and average probabilities (and thus was renamed). Changes in some pipelines to reflect the change. Bugfix in the batch merging pipeline (it would halt because the allele VCF for genotyping batches could become off-spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4894 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 15:16:15 +00:00
rpoplin	7185fcb47b	Committing my notes about the methods development pipeline so we stay synced up while I'm on vacation. Cheers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4891 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 21:14:20 +00:00
chartl	80770dc032	Expanded target pipeline complete. Stop trying to be clever about scatter-gather; wait until functional SG is built-in to Q. Til then, a lazy version of the fullCallingPipeline. Seems to take a long time to generate the graph though... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4888 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 00:56:16 +00:00
kshakir	758d14a261	Checking in scripts used for testing the linear index MAX_FEATURES_PER_BIN. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4887 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 21:25:36 +00:00
chartl	fc33901810	Graph structure must be known at compile time. Removing GroupIntervals until a future point where in-process-functions can predict their output based on inputs [though this is probably forever: the inputs may not exist at compile time!] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4886 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 21:22:58 +00:00
chartl	61d5daa65c	EXTREME interval processing. Still undergoing testing. + GroupIntervals allows user-defined scattering (e.g. take an interval list file, split it into k smaller interval list files by number of lines) + ExpandIntervals expands the intervals, either by widening them, or allowing the definition for nearby intervals (e.g. flanks starting 1bp before and after, ending 10bp after that) + IntersectIntervals takes n interval lists, writes 1 interval list that is the n-way intersection of all of them git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4885 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 19:42:50 +00:00
rpoplin	4ca1da1d07	Updating the NA12878.HiSeq bam file to be the correct bam file in the methods development qscript. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4879 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 14:53:10 +00:00
rpoplin	8fac346ac1	Misc cleanup in Methods Development Qscript git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4878 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 04:24:25 +00:00
rpoplin	34ab5b4889	Turning on BAQ in Methods Development pipeline. A new dataset is added: 363 EUR samples from the November 1000G release. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4877 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-19 21:13:25 +00:00
chartl	8118a439c0	Commit for Khalid git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4876 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 22:24:18 +00:00
rpoplin	15a33545f4	Updating Methods development pipeline qscript with the bam lists for all the data sets. It is ready for people to start running with it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4875 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 22:19:14 +00:00
corin	f0ab7b849a	Adding a window size variable to avoid indel genotyper error git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4873 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 04:19:54 +00:00
rpoplin	bdef4e775a	Initial checkin of methods development pipeline qscript. It allows the methods dev team to run an overnight job which calls and recalibrates a variety of data sets and allows for an end-to-end sanity check of final results for potential changes to the methods. It isn't meant to be used by anybody quite yet, but shows the general structure and flow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4871 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 22:14:02 +00:00
rpoplin	095fc1922a	By popular demand I'm adding the qscript I used to do the 660 bamfile 1000G calling for ASHG. It does cleaning, BAQing, and merging in 3mb chunks genome-wide then calls SNPs on those temporary bams. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4866 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 18:49:03 +00:00
depristo	32d5397c01	Experimental support for sided annotations. Currently not more/less valuable than two-tailed testing. Future experiments are needed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4864 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 15:08:31 +00:00
chartl	0d18bd1011	Now that addAll() is in the superclass, no longer need this definition (which, without override, prevents the script from compiling anyway) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4862 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 05:36:31 +00:00
chartl	3e75431bc8	Thanks to Mark: VCFInfoToTable removed in favor of a more flexible walker. Slight change to the argument structure of the walker to make it play more nicely with Queue: the field list parsing is pushed into the command line system (e.g. the variable is exposed as a List<String> and not a String, so Queue doesn't have to join a list into a string only to have it broken out again). This also allows the user to specify -F field1 -F field2 -F field3 if he/she so desires. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4842 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 03:33:36 +00:00
chartl	2217837845	Commit for Khalid -- should be a scala version of vcf2table but for some reason the run method isn't getting called. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4841 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 00:44:15 +00:00
chartl	f36861eeee	One more little bugfix -- the issue was not the grep command, but instead the NFS in the awk; I changed it to ++count in the last commit which was really responsible for the fix. Then this ultra-escaping semi-broke the grep again. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4831 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 20:36:14 +00:00
chartl	d34c5640d2	Bugfix for clf version of extract samples. Due to dynamic shell creation and bsubs and whatnot, the OR pipe for grep ("a\|b") needs to be super-escaped ("a\\\\\\\\\|b"). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4829 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 19:06:30 +00:00
chartl	f795b25c47	In-process versions of sample extraction and interval-list conversion for VCF files. Required an in-process-function branch of the queue library. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4827 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 17:36:53 +00:00
depristo	e219f6a4b5	Q script to run VQSR on a whole variety of common data sets. To be used as a basis for general methods development pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4826 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 16:55:52 +00:00
chartl	7bc2049031	Updates and bug fixes to private mutations qscript and pipeline libraries. Hand filter strings are now not busted (boo to having to escape quotes); convenience method added to VariantCalling to propagate standard trait data to a given GATK command line -- should be made more scala-esque in the future. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4824 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 04:55:13 +00:00
chartl	cf75caf653	java changes: VariantEvalWalker's logger is made public, so that variant eval modules can access it through the parent object. DesignFileGenerator comment lists how best to bind things to it, and the feature accessor is better refined to grab the genome loc. (old change) scala changes: convenience addAll( List[CommandLineFunction] ) added to QScript class (and thus removed from the fCPV2) useful command line functions added to a new library package for command line functions (these are fast simple VCF command lines) bug fixed in ProjectManagement for the class where there's only one batch to be batch-merged (not really part of the use-case, but an edge-condition that came up during pipeline testing) first draft of a private mutations pipeline which will be elaborated in future git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4823 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-12 05:10:45 +00:00
chartl	81290d238d	Restructuring my qscripts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4821 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-11 20:58:45 +00:00
kshakir	56433ebf6b	Switched from LSF command line wrappers to JNA wrappers around the C API. Side effects: - bsub command line is no longer fully printed out. - extraBsubArgs hack is now a callback function updateJobRun. Updated FullCallingPipelineTest to reflect latest changes to fullCallingPipeline.q. Added a pipeline that tests the UGv2 runtimes at different bam counts and memory limits. Updated VE packages that live in oneoffs to compile to oneoffs. Added a hack to replace the deprecated symbol environ in Mac OS X 10.5+ which is needed by LSF7 on Mac. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4816 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-10 04:36:06 +00:00
corin	27acede64d	Removing old arguments. We'll now be running with the defaults. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4811 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-09 18:58:56 +00:00
chartl	f8dd59c1d1	Tightening of the batch merging pipeline. Optimized to run on hour queue, so please: if you run this, crush 'hour' with it. Testing is forthcoming, but it merged 700 samples overnight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4805 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-08 14:36:23 +00:00
chartl	f4c43f013f	Due to the overhead for reading VCF files (>32g for 700 5MB VCF files), batched merging has to generate likelihoods in batches. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4796 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-06 18:23:54 +00:00
chartl	0944184832	Major refactoring of library and full calling pipeline (v2) structure. Arguments to the full calling qscript (and indeed, any qscript that wants them) are now specified via the PipelineArgumentCollection Libraries require a Pipeline object for instantiation -- eliminating their previous dependence on yaml files Functions added to PipelineUtils to build out the proper Pipeline object from the PipelineArgumentCollection, which now contains additional arguments to specify pipeline properties (name, ref, bams, dbsnp, interval list); which are mutually exclusive with the yaml file. Pipeline length reduced to a mere 62 lines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4790 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-05 02:33:54 +00:00
corin	bdc7516168	Taking out recalibrating for now, since having these files is confusing people and we've not gone to dbsnp 132 yet so cluster generation's broken with these command lines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4786 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-03 22:12:09 +00:00
chartl	220fb0c44a	Added a pipeline for merging batches. For now takes a file containing a list of VCFs, and a file containing a list of bams. Does not do anything smart (e.g. if you leave out some .bams or add some extra ones, you will not be warned). Heavy lifting done in (the beginnings of) a library for managing multi-batch or multi-project tasks. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4771 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-02 07:31:59 +00:00
chartl	9f03f09cc9	Changes to V2 pipeline and libraries. AB dropped. Cleaning enabled. Project name now properly propagated to intermediate files (instead of the string repr of the object). Indel mask is now expanded prior to filtering at indels. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4769 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-01 18:55:48 +00:00
chartl	06a0fb4489	Library-ized pipeline now functions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4759 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-30 21:34:59 +00:00
ebanks	4413208c45	Removing unnecessary and incorrect includes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4752 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-30 02:06:48 +00:00
corin	6b70cde0b9	Adding a forgotten quote mark git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4729 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-24 16:38:27 +00:00
corin	e15d18129c	Adding by sample metrics. Not sure why we didn't have this in here in the first place git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4723 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-23 21:36:03 +00:00
corin	fe28f8da9c	Removing Uniquify from main pipeline indel merge, since the pipeline isn't merging from samples with the same name anyway. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4721 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-23 17:25:22 +00:00
kiran	28805d17ca	Commenting out allele-balance for now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4715 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-22 16:48:08 +00:00
corin	8dca5bd861	Putting the annotation back in, both to the filters and to UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4709 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 21:02:15 +00:00
corin	da1fe5bb37	Removing the AB filter given that we don't have that in the VCF anymore git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4708 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 20:22:05 +00:00
hanna	302cc13735	Trying out Queue for the first time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4705 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 18:29:12 +00:00
corin	5466365575	Fixing a silly typo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4680 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 18:16:51 +00:00
corin	a64f693b20	Updated pipeline script to include dbSnp for UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4679 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 18:09:47 +00:00
kshakir	801c562909	Now actually checking in the integration test mentioned in the prior commit: compiles the full calling pipeline. Removed QScript usages of VariantRecalibrator's -reportDatFile, --report_dat_file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4668 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-14 04:27:10 +00:00
kshakir	673fa841a4	Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader. Removed obsolete usages of PackageUtils with updated PluginManager. Ported Queue interval utilities written in scala over to Sting's java IntervalUtils. Added a very basic integration test to ensure that the fullCallingPipeline.q compiles. Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test). While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1". Upgraded to scala 2.8.1 and updated calls to deprecated functions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 20:14:28 +00:00
chartl	c19f567424	Sometimes, inputs are really outputs in disguise. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4631 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-05 19:51:16 +00:00
chartl	0e40321a52	Brütall hack: make the bam list creator job wait for the interval creator job, so that there is an implicit dependency of UG on the interval list, by way of the bam list git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4628 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-04 20:43:11 +00:00
chartl	cb0b2f9811	My analysis script for private mutations. I'm committing it because it contains a number of specialized command line functions that could prove useful in the future. (For example: ConcatVCF and ExtractSample) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4626 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-04 19:57:27 +00:00
chartl	42e9987e69	Bug fix to GenotypeConcordance. AC metrics get instantiated based on number of eval samples; if Comp has more samples, we can see AC indices outside the bounds of the array. Bug fix to LiftoverVariants - no barfing at reference sites. AlleleFrequencyComparison - local changes added to make sure parsing works properly Added HammingDistance annotation. Mostly useless. But only mostly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4622 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-03 19:23:03 +00:00
kshakir	5cdd7a7ba4	There's no such thing as a sam index, so the GATK extension generator doesn't need to add an @Input for them. Updated a call to swapExt to specify the directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4586 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 20:39:03 +00:00
corin	6d7ed5781c	Added Dbsnp to Indel Realigner; added known indels rod-binding to realigner. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4576 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 22:22:28 +00:00
kshakir	8211cee0b2	Queue UI Improvements: - Forcing user to set the temp directory via -Djava.io.tmpdir to avoid filling up /tmp. - By default deleting job outputs tagged as intermediate. - Defaulting pipeline to scatter count 1 (no reads deleted). - Cleaning up temp classes even when scripting fails. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4573 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 19:49:08 +00:00
kshakir	80259b9e20	Changed fullCallingPipeline to output all contigs in the reference if scattering. When the cleaner interval scatter count is set to one, explicitly setting the intervals to Nil. TODO: Need to add an option that lets the user choose from the command line to scatter all contigs or just those in the intervals list. For now can get relatively the same behavior by setting the interval scatter count equal to the number of contigs+1, assuming the random contigs come at the end of the sequence dictionary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4565 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-24 03:01:06 +00:00
kshakir	e9c6f681a4	Instead of the pipeline's cleaner only writing BAMs with the target intervals, now pulling the list of contigs from the target intervals and outputting reads in those contigs. Added a brute force -retry <count> option to Queue for transient errors. Waiting up to 2 minutes for the LSF logs to appear before trying to display the errors from the logs. Updates to the local job runner error logging when a job fails. Refactored QGraph's settings as duplicate code was getting out of control. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4563 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 22:22:30 +00:00
kshakir	b954a5a4d5	- After removing special code for intervals, instead of being of type File they are generated as List[File]. Changed previous checkin that was appending to this list and instead assigning a singleton list. - More cleanup including removing the temporary classes and intermediate error files. Quieting any errors using Apache Commons IO 2.0. - Counting the contigs during the QScript generation instead of the end user having to pass a separate contig interval list. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4539 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 06:37:28 +00:00
kshakir	88a0d77433	Changed parsing engine to store the order the argument bindings based on their definition in the class, moving "-T" to the front of Queue command lines. Queue GATK generated .intervals is now a List(File) again removing special case handling in the generator. Instead of using @Scatter annotation, using ScatterFunction instance to determine if a job can be scattered. Implemented special VcfGatherFunction which only uses the header from the first file, even if the other files differ in their headers. Added a -deleteIntermediates to Queue to delete the outputs from intermediate commands after a successful run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4536 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 21:43:52 +00:00
kshakir	81479229e1	QScript authors can now tag functions as intermediate. Functions tagged as intermediate will be skipped unless another function in the graph needs their output. Re-logging the failed jobs and the path to their log files at the end of a run. Added a parameter -bigMemQueue for the fullCallingPipeline.q instead of hardcoding gsa (gsa was backed up and it was actually faster to run on week). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4520 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-18 22:11:14 +00:00
chartl	2bc5971ca1	Added - a tool to fix reference bases of a VCF. The OMNI had a couple of sites with incorrect reference bases (look to be legacy from other chips), and a few more that had ref and alt flipped. GAP should probably take care of it, but since I need results by monday, I'm doing it. Modified - SelectVariants: Hook up to VariantContextUtils to recalculate AC/AF/AN, which uses the accessor in VariantContext to do this. Somehow sites that were selected down to hom-ref genotypes only wound up getting positive AC. IMPORTANT I kind of need input here. The header of a file used for an integration test specifies AC as being an integer. Recalculating it casts it into an integer list (which it should be, as it allows for alternate alleles). However this appears to clash with what the jexl expression is looking for? For now, the integration test itself needed to be changed -- it's unclear what to do when the header specifies AC of being one class, but recalculating it casts to another class, and I'm not sure what to do. I'm committing my omni_qc pipeline because I'm almost certain 2 months down the road I'm going to wonder what the heck I did to generate my results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4511 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-17 03:18:01 +00:00
kshakir	7157cb9090	While bkill'ing on the shutdown thread Queue will no longer try to submit more jobs on the original thread. Updated pipeline output structure to current recommendations by Corin. Directories are now automatically created before the function runs. Fixed several bugs with scatter gather binding when the script author needs to change the directories. Fixed bug with tracking of log files for CloneFunctions. More error handling and logging of exceptions (good test environment while LSF was down this early AM!) Removed cleanup utility for scatter gather. SG Output structure has changed significantly. Will need to discuss and find a better approach for Queue programmatically deleting files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4504 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 17:01:36 +00:00
corin	5e0c4ecc21	Added DbSnp to VariantEval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4497 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 17:02:17 +00:00
chartl	bffb8bb01f	The SVN repository is not for dumb analysis-specific scripts. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4460 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 14:04:53 +00:00
chartl	21ec44339d	Somewhat major update. Changes: - ProduceBeagleInputWalker + Now takes a validation ROD and a prior to give it, will use those genotypes in place of the variant genotypes if both are present + Takes a bootstrap argument -- can use some given %age of the validation sites + Optionally takes a bootstrap output argument -- re-prints the validation VCF, filtering those sites used as part of the bootstrap -BeagleOutputToVCFWalker + Now filters sites where the genotypes have been reverted to hom ref + Now calls in to the new VCUtils to calculate AC/AN -Queue + New pipeline libraries for easy qscript creation, still a work in progress, but this is a considerable prototype + full calling pipeline v2 uses the above libraries + minor changes to some of my own scripts + no more need for contig interval lists, these will be parsed out of your normal interval list when it is provided git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4459 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 13:30:28 +00:00
kshakir	ca5db821ce	Added the ability to Queue to run scala functions inside the JVM. NOTE: Extend from InProcessFunction instead of CommandLineFunction to use this functionality. Queue now submits new LSF jobs only after previous functions have completed successfully. When the Queue process is shutdown (ex: via Control-C) sends a bkill command for any running jobs. Ported commands like creating directories and scatter/gather interval list to scala functions. Updates to LSF status tracking by porting the python to internally generated bash scripts. Temporarily disabled job name submission to LSF. Plus side is that the full command is now available in "bjobs -w". TODO: Put back jobName passing to LSF based on an option? Changed BaseTest to allow scala to access paths to references. Changed the extension generator to default the analysis name to the walker "name". git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4442 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 18:29:56 +00:00
chartl	28ac1d325e	Commit for Ryan git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4433 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-05 19:04:10 +00:00
corin	e340be34d8	upping mem limit since something was unhappy with the lower limit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4427 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-05 02:38:17 +00:00
chartl	7639692e5b	Sigh. Fix the source of even more UserErrors in the phone home directory: make sure to gunzip the beagle files before passing them into the conversion walker... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4399 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-01 03:28:36 +00:00
chartl	f34b4c6b82	Be smarter if the beagle output is set such that getParent() returns null. Up the memory limit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4389 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 12:48:47 +00:00
chartl	0142047da9	And a bugfix 3 seconds later. Don't tell java to use up to 20g while telling the farm to kill the job if it tries to exceed 4g. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4388 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 02:08:47 +00:00
chartl	06970ae039	A qscript that refines genotypes with beagle and merges them into one vcf (running currently on the recent chr20 production calls). This will be librarized soon; but if you need to do something like this, feel free to cannibalize. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4387 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-30 02:05:30 +00:00
chartl	2708e83198	For show (Queue works nicely): An analysis script that runs QC for the omni chip git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4380 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-29 15:04:17 +00:00
kiran	51fdf9d701	Default memory limit is now 4g (apparently necessary when testing on full 100-sample Autism_Daly dataset) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4359 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-27 05:43:08 +00:00
kiran	bcc09f5d8c	Simplifications: removed command-line arguments to control SNP cluster filter parameters. Infer the number of contigs to scatter indel cleaning from the contig list (which we should get rid of too). Changed the PY argument to just Y for specifying the path to the YAML file. Cleaned up command-line argument documentation. See http://iwww.broadinstitute.org/gsa/wiki/index.php/Queue-based_pipeline for a list of remaining issues. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4356 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-26 22:50:30 +00:00
kiran	9820a12fa5	Removed unnecessary dbSNP big-table dependency. Ti/Tv is now required. Consistent downsampling level for all programs. Spelling corrections. VariantEval now generates R-style output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4355 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-26 16:55:58 +00:00
kiran	9bfbc3b784	Commented out changes to ADPR and VariantEval modules that are causing this script to not compile. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4353 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-25 15:12:10 +00:00
corin	3ec0e09edd	ADPR is now included in the full calling pipeline. The most up to date version of the ADPR is about to be committed and should be used with the script for now. The qscript now calls for two additional strings as inputs: the sequencing machines used and the sequencing protocol. In order for ADPR to finish successfully, a squid file for both the lane and sample level data needs to be produced, reformatted and named <projectBase>_lanes.txt or <projectBase>_samps.txt, respectively. These files need to be in the working directory. When database access is ready, this and the protocol and sequencer parameters of the r script will go away. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4345 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 19:28:43 +00:00
kshakir	67bcf3a7e4	Fixed VariantEval rod binding names. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4342 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 14:52:51 +00:00
chartl	c355afc320	Queue now does job tracking (replace -run with -status in the command line). Produces output that looks like: INFO 20:58:17,827 QCommandLine - Checking pipeline status INFO 20:58:23,234 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_MergeIndels [DONE] INFO 20:58:23,236 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_158.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,237 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_929.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,238 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_SNP_calls [NOT DONE] 5t/0d/0r/5p/0f INFO 20:58:23,239 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_HandFilter [NOT DONE] INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1122.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,240 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantRecalibrator [NOT DONE] INFO 20:58:23,241 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_913.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,242 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_2037.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,243 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_VariantEval [NOT DONE] INFO 20:58:23,244 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster [NOT DONE] INFO 20:58:23,245 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_106.bam [DONE] 5t/5d/0r/0p/0f INFO 20:58:23,246 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_Cluster_and_Indel_filter [NOT DONE] INFO 20:58:23,247 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_ApplyVariantCuts [NOT DONE] INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - Height_Hirschhorn_NHGRI.uncleaned_GenomicAnnotator [NOT DONE] INFO 20:58:23,248 QGraph$$anonfun$formatStatus$1 - IndelGenotyper_1713.bam [DONE] 5t/5d/0r/0p/0f git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4340 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 00:59:09 +00:00
kshakir	20b38b38f3	Updated from SnakeYAML 1.6 to 1.7. Added a pipeline java bean and YAML utility to serialize java beans. Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format. Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference. More changes to come as this code gets tested out in the fullCallingPipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-22 19:47:49 +00:00
chartl	6dec042288	Re-enabling indel cleaning, explicitly calling fix mates in the case where indel cleaning is not scatter/gathered git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4324 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-21 20:37:49 +00:00
chartl	b24172c80f	Queue now utilizes .[file].done to allow skipping of previous jobs, if they have been completed. This is, unfortunately, reliant on a python script to do the post-execution touching of .done files. That is to say, proper resumability is live (but not extensively tested) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4312 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-20 00:16:53 +00:00
chartl	6f6d2eb31f	Told people this worked...forgot to commit! -c git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4306 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-18 03:46:00 +00:00
chartl	c1720cc8f5	Now compiles. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4295 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 18:49:53 +00:00
chartl	c581bd2d84	Minor modifications to fCP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4294 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-16 18:29:24 +00:00
depristo	3c5b8730d5	More Queue scripts for analysis git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4260 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:04:10 +00:00
kshakir	fd5970fdd4	At chartl's superb suggestion, command line files are now all Files instead of old method of sometimes "has a File". Should be easier when reassigning them. No longer generating deprecated GATK arguments on the Queue extensions. Emitting deprecation warnings to Queue compile to help debugging issues. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-02 21:30:48 +00:00
depristo	ca503e5801	Queue scripts for recalibration and running nSample UG jobs pre and dynamic merging git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4186 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-01 20:23:37 +00:00
chartl	5e710050d6	minor change, bamFiles comes from the input list, not the script git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4170 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-31 16:03:35 +00:00
chartl	1a14dbee1e	Adding in .bam indexing; commit for Khalid git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4169 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-31 15:21:41 +00:00
chartl	2ffa98aea5	Ugh! varout --> out git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4157 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-29 02:34:41 +00:00
chartl	d7edce31a2	Commit of fCP for Khalid git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4156 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-29 02:24:25 +00:00
chartl	576ae30df1	A version of the full calling pipeline queue script that fully compiles without String/File/NamedFile type exceptions (e.g. expected String but got NamedFile/Expected NamedFile but got File). Pipeline itself is under testing with 5 bam files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4154 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-28 22:51:11 +00:00
chartl	c6441b585a	Actually hook up the new indel genotyper and merge analyses into DAG (aka "i forgot to add()") git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4149 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 18:00:50 +00:00
chartl	7908237b90	Full calling pipeline now calls indels through the indel genotyper, merges with combine variants, and filters on them. Since new genomic annotator is fast, it is no longer scatter-gathered. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4144 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 16:28:24 +00:00
kshakir	78946c4ffd	Allowing the Queue to run the GATK via -cp instead of only from -jar. Added an example of using a walker with Queue and a custom -classpath. Removed an unused import statement in NamedFileWrapper. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4143 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 16:25:59 +00:00
kshakir	0105e8d063	Updated Queue GATK generation to reflect -B and -I changes. To add support for "-I:tumor tumor.bam", the GATK argument import_file (-I) is now generated as a List of NamedFile objects. Could not get sugar working 100%. To activate sugar import the gatk package. This effectively adds a new method to java.io.File called toNamedFile. When adding a file to the list call countReads.import_file :+= myJavaFile.toNamedFile See scala/qscript/examples for actual examples. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4122 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 22:17:36 +00:00
chartl	6eb1559c1d	End-to-end calling works again (changes to walker arguments, and changes to queue, affect its validity, so it often goes out-of-date before I try to use it again) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4116 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 18:52:44 +00:00
kshakir	3aedd0055e	Updated firehose clean bam pipeline to pull firehose info and push back firehose clean bam. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4088 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 20:38:42 +00:00
chartl	0028b884d8	Reformatting and tweaks to the end-to-end pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4066 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 20:29:48 +00:00
kshakir	618c69f8dc	More updates to the CleanBamFile pipeline. Added a CommandLineFunction.jobDependencies that will explicitly force a function to wait for a file, even if the value isn't otherwise listed on an @Input. More bug fixes and refactoring of functions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4048 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-17 14:59:42 +00:00
chartl	3a4977c75e	Re-add the 1KG trigger as a comp as well git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4045 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 18:19:47 +00:00
depristo	c85ab9db37	functional recalibrate script git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4034 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-14 16:01:37 +00:00
kshakir	307c8ca027	Created a new playground script for cleaning bams in Firehose. Some refactoring of Queue extensions for reusability in scripts. Putting the extensions into the Queue.jar after building them. More updates to GATK walker arguments specifying @Input and @Output for Queue. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 23:52:24 +00:00
kshakir	542d394e09	Cleaning up Queue debugging output. -l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run. More documentation in the examples with a new even simpler CountReads example. Took out unused option to build Queue GATK extensions separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 15:54:08 +00:00
kshakir	f39dce1082	Exposed CommandLineFunction defaults to the Queue.jar command line (see -help). Added ability to skip up-to-date jobs where the outputs are older than the inputs. Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names. Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile Moved Hidden from the GATK to StingUtils. Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7 Added Queue to javadoc and testing build targets. Added first Queue unit test. Another pass at avoiding cycles in the DAG thanks to all function I/O being files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 21:58:26 +00:00
depristo	cd2d051209	full path to Rscript git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3999 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 12:02:38 +00:00
depristo	9b432d0801	1kg script now works git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3998 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 12:01:18 +00:00
kshakir	4f51a02dea	Changed logging level to default at INFO instead of WARN. Changes to StingUtils command line for use in Queue, replacing Queue's use of property files. Updates to walkers used in existing QScripts to add @Input/@Output. RMD used in @Required/@Allows now has a new default equal to "any" type. New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions. Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.) Removed dependency on BroadCore by porting LSF job submitter to scala. Ivy now pulls down module dependencies from maven. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 16:42:48 +00:00
chartl	5815348ebc	Switch to newer version of comp tracks (and make the trigger track a comp as well). Indel cleaning should override the interval list and only use the contig interval list; and also force jobs to go to long. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3941 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 20:05:27 +00:00
chartl	9132c98eec	Slightly smarter interval list dealing (whole exome intervals are .interval_list, whole genome are .interval.list). Also use BTI with the Genomic Annotator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3904 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-29 22:04:02 +00:00
chartl	54d93f63d2	Hacky fix for LSF confusion -- submitted jobs check to see if their directory exists, despite depending on the job which creates said directory. Filter strings now have escaped quotes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3903 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-29 21:35:50 +00:00
chartl	0f9baa2e94	Ha ha ha ha ha :( git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3902 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-29 20:48:35 +00:00
chartl	7a5ee485d2	Full pipeline now works through DAG creation. First draft; more work to do to make it cleaner and better command-line input handling (and properties handling); but the DAG is rendered and looks good. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3898 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-29 19:36:17 +00:00
chartl	4d4cf6e1dc	Updates to calling pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3896 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-29 18:37:20 +00:00
chartl	62a9217a61	A brute-force exome/genome independent end-to-end cleaning/calling pipeline using Queue git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3894 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-29 13:17:14 +00:00
depristo	25a27b78bc	1KG Table 1 counting pipeline. Useful example git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3819 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-17 22:30:56 +00:00
depristo	b0fc42906e	Better DOT support and updated recalibration pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3811 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 20:54:51 +00:00
depristo	81eef0d993	DOT visualization with Queue. More sophisticated recalibration queue script with scatter/gather git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3799 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-15 22:32:48 +00:00
depristo	530a320f28	Intermediate commit of scatter/gather recalibration pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3785 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-13 22:46:08 +00:00
kshakir	1d399aa2f3	Added a temporary gatkLoggingLevel field to the soon to be obsolete GatkFunction while finishing up the delayed generic gatk walker utility. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3757 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-11 03:27:32 +00:00
kshakir	7be8c35eb2	Workaround for scala trait erasing parameterized types: - Requiring explicit @ClassType on parameterized fields in traits. - Scatter / Gather functions are now abstract classes since @ClassType can't be used on parameterized fields with type parameters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3726 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-07 03:15:10 +00:00
rpoplin	87470d5fe5	Checking in a simplistic VR qscript file for posterity's sake git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3705 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-01 18:53:17 +00:00
kshakir	894ad354fa	Fixed typo in the name of the shell directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3644 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-25 20:59:40 +00:00
kshakir	75c98c42b8	Started path of deprecation of Sting's @Argument by splitting the annotation into @Output and @Input. Anything that's not an @Output should be an @Input. Checked in example qscripts that are basically todo integration tests. Replaced use of queue @Input/@Output with Sting's new @Input/@Output. This means you'll now have to doc-ument the annotations. More work on dependency resolution cycles being created in the graph during scatter/gather. Filtering nulls to avoid NPE exceptions in scala's 'Collection'.hashCode. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3643 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-25 20:51:13 +00:00

249 Commits (7b452ea2b9ad2d2f3e8bbfcbfb0e818f0a87f18d)