gatk-3.8

Commit Graph

Author	SHA1	Message	Date
fromer	3c1a026c94	Updated script to properly bin DoC values so that down-sampling corresponds to range of DoC values obtainable git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5208 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-07 16:47:55 +00:00
depristo	c4707631e2	MethodsDevelopmentPipeline is now the test bed for large scale AWS_S3 logging. Can be disabled from command line if this is necessary git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5203 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-06 17:03:45 +00:00
fromer	8b8b4fced1	Removed explicit memoryLimit, so that memLimit given on the command-line will NOT be ignored... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5202 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-06 01:55:17 +00:00
kshakir	cc5d695bcf	Renamed the IPFL Test to IPFL PipelineTest so that it'll be picked up by the PipelineTests. HACK: Turned off JNA autoRead() in the jobInfoEnt LSF structure to try and dodge the SIGSEGV during strlen calls during bmods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5201 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-05 00:06:12 +00:00
depristo	fe4aa58d35	Removing unused class git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 22:22:28 +00:00
fromer	4cdc974c5f	Preliminary Qscript to run DoC for the purpose of CNV detection git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5194 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 21:25:59 +00:00
corin	cd6ace1b47	Includes UG version of indel genotyping rather than IGV2 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5191 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-04 20:25:46 +00:00
chartl	bfc6ef1753	A successful attempt at a queue integration test, ensuring that the InProcessFunction libraries are working as expected. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5190 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 21:30:35 +00:00
carneiro	358a400474	made ApplyVariantCut a default part of the pipeline, added the -noCut option if you don't want to use it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5189 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 19:29:36 +00:00
carneiro	7af003666d	added optional argument -cut to apply the variant cut to the ts recalibrated vcf. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5183 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 17:34:40 +00:00
chartl	5398cf620a	Bug fixes in the in process function (spoiled by python: was not closing my writers). SortByRef now works somewhat like the perl script does, rather than doing a memory-expensive sort. Adding a QTools qscript which is kinda clunky, and will be used mostly for integration tests of these IPFs, pending some better way to construct argument collections and function accessors at compile-time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5182 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 17:32:46 +00:00
chartl	a9d0921529	That variable name could only lead to trouble. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5180 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 05:03:48 +00:00
chartl	9515f94242	Committing a simple merge IPF for use with qscripts (currently use a long grep, awk, pipe command, which can be unsafe and is hard to extend). Tests for all these functions coming soon. Also, IntelliJ + intermittent VPN connection = botched repository. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5179 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-03 05:01:21 +00:00
carneiro	cf15819db5	updated to work with the new VariantEval. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5176 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 17:46:07 +00:00
rpoplin	47357b726e	Fixing import GenotypeCalculationModel since it doesn't exist anymore. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5175 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 15:39:43 +00:00
fromer	7605f0e6c1	Corrected input/output definitions for Queue git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5173 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 07:39:00 +00:00
fromer	3839fd1a25	Updated phasing pipeline to properly read samples from VCF and BAM files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5172 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-02 07:16:05 +00:00
fromer	798955b006	After discussing with Mark, revert to "Master merging" of phase information from VCFs. This has the advantage of creating minimal phased VCFs from RBP, from which phase info is merged into the original "master VCF". Also, updated Genotype.sameGenotype() to be simpler and NOT REVERSE the ignorePhase flag in comparing Allele lists/sets git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5167 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-01 19:50:15 +00:00
fromer	a89400b20c	Simple implementation to retrieve relevant BAM files for each sample git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5152 348d0f76-0448-11de-a6fe-93d51630548a	2011-02-01 00:03:03 +00:00
kshakir	e74f28ad89	If there's an LSF queue maximum time limit set and the user hasn't specified one for this job, pass on the queue defined maximum limit with the job. Updated LibBatIntegrationTest to use proper networked temp directory accessible by local machines and nodes. Disabling the FCPTest until the VE3 is incorporated into the fullCallingPipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5151 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 23:13:09 +00:00
fromer	f258363cfc	Minor bug fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5150 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 22:29:28 +00:00
fromer	742bd44728	Changed output file to be user-defined git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5149 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 22:15:26 +00:00
fromer	6c99dc4dab	Take (partial) ownership of phasing 1000G chr20 calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5147 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 21:49:41 +00:00
chartl	4d9bc84bd5	Initial commit of in-process helper functions for making the BCM more robust git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5144 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 19:18:31 +00:00
kshakir	d4f744a4d4	Checking if the interval files exist before using them to calculate the minimum scatter parts. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5143 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-31 18:07:34 +00:00
kshakir	57353294cc	Copying jobLimitSeconds to clones. Some cleanup and refactoring around copying values to clones. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5128 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-30 06:35:53 +00:00
kshakir	e19b5d17b4	Related to last checkin, need to create the directory when writing the yaml the first time after an ant clean. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5127 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-29 20:45:44 +00:00
kshakir	23578b7402	Pipeline tests will only start from scratch after "ant clean", making it faster to debug downstream issues when re-running "ant pipelinetest -Dpipeline.run=run". Updated the FCP, the test, and the ADPR to handle an issue with the ADPR locating the yaml generated by the FCPTest. Does not solve the ADPR error: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5126 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-29 19:44:03 +00:00
kshakir	b0a3c70f90	Updated paths to new bams. Metrics of the new bams have changed slightly but should still fall within test tolerances. Will reset metrics in a later checkin after confirming changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5125 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-29 10:55:26 +00:00
kshakir	4ee4fd47e9	Moved the test name and the job queue into the spec. Defaulting to the hour queue for running pipeline tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5122 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-29 00:07:25 +00:00
kshakir	2ef66af903	Moved the maximum number of intervals check from FCP to the Queue core so that scatter gather will no longer blow up if you specify a scatter count that is too high. Moved the BamListWriter from FCP to ListWriterFunction in the Queue core. Added an ExampleCountLoci QScript along with an example pipeline integration test which checks MD5s. Added a few more utility methods to PipelineTest including a currentGATK variable that points to the GATK jar. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5121 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-28 23:33:58 +00:00
corin	b25d131481	updated to work with the new tearsheet git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5113 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-28 18:49:11 +00:00
carneiro	cae4b9b0de	quick update with the correct CEU trio bam file and its final location. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5098 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-27 19:17:19 +00:00
ebanks	68729045ca	Always best to use the left-aligned version of the dbsnp vcf git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5091 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-26 20:21:50 +00:00
kshakir	df2e7bd355	Disabled FCPTest whilst we figure out where the C426 bams went. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5078 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-26 05:11:57 +00:00
kshakir	ce5b11317b	Moved some shutdown logic from the LSF job runner into the QGraph. Because of Java's type erasure JobManagers must provide runtime access to the runner class to shutdown. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5076 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-25 20:28:54 +00:00
kshakir	b3c9b9bfbe	+1 file that should have been with the last checkin. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5069 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-25 05:31:17 +00:00
kshakir	9923e05e0a	Moved MD5 utils from WalkerTest to BaseTest for use by PipelineTests. Moved VariantEval validation from FCPTest to PipelineTest. Cleaned up some duplicate code for writing temp files during tests. Moved FCPTest to playground namespace to match move for FCP.q. Added a basic HelloWorldPipelineTest for the HelloWorld QScript. Moved duplicated error handling from JobRunners into the FunctionEdge. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5068 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-25 04:11:49 +00:00
kshakir	76ee57639d	Updated FCPTest to match changes to UG in r5058. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5066 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-24 19:30:02 +00:00
delangel	fa0c476b82	Script for calling indels in all phase 1 samples - VQSR part still needs work but raw calling is done git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5052 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-22 14:07:10 +00:00
carneiro	a0731eaa81	updated NA12878 Trio gold standard data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5048 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:48:31 +00:00
depristo	94b64ec54a	Moving scala script into analysis directory git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5047 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:42:18 +00:00
depristo	b45566760e	intermediate checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5045 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 18:39:25 +00:00
kshakir	6fbd18c759	Cleaning up obsolete code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5044 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 16:27:35 +00:00
kshakir	8d46cf3604	Testing a configuration change for build system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5043 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 14:44:41 +00:00
rpoplin	b6497c404f	Moving Phase1Calling qscript over to using the cleaned, pre-BAQed bams git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5039 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-21 02:41:20 +00:00
carneiro	fc73569d62	Added NA12878 Trio dataset to the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5037 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 23:15:33 +00:00
kshakir	8855f080c2	For the fullCallingPipeline.q: - Reading the refseq table from the YAML if not specified on the command line. - Removed obsolete -bigMemQueue now that CombineVariants runs in 4g. - Added a -mountDir /broad/software option to work around adpr automount issues. - Merged the LSF preexec used for automount into the shell script used to execute tasks. - Using the LSF C Library to determine when jobs are complete instead of postexec. - Updated queue.sh to match the changes above. - Updated the FCPTest to match the changes above. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5036 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 22:34:43 +00:00
depristo	41c8552d0a	Added implements HasGenomeLocation to all relevant classes. It's now possible to write generic code for working with objects that support the getLocation() function in HasGenomeLocation. Please, if you have an object that has a location, implement this interface and start using / writing generic functions to sort, compare, etc. these objects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5031 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 12:54:03 +00:00
kshakir	4d611e53e7	Passing the ADPR R script to FCPTest. Changed the FCP.q to use an InProcessFunction to work around the -runDir issue GSA-420. Tested the FCPTest using the following dotkits and "ant clean pipelinetest -Dpipeline.run=run": - R-2.11 - Oracle-full-client - .cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5029 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-20 06:08:45 +00:00
kshakir	acc2f1c9fe	Updated FCPTest to use the new path to fullCallingPipeline.q changed in r5017. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5027 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 21:43:43 +00:00
corin	2824e8224c	removes unused titv argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5025 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 20:49:12 +00:00
corin	50fcebb0c4	Incorporates tearsheet and plot production with database access into standard pipeline. Note that the following dotkit packages must be run before the adpr will be correctly generated: R-2.10, Oracle-full-client, cx-oracle-5.0.2-python-2.6.5-oracle-full-client-11.1 This also removes the unused titv argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5024 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 20:48:42 +00:00
rpoplin	55eb0387ac	Another relevant qscript. I use this one to do thousands of variant recalibration jobs to search for optimal parameters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5019 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 18:17:32 +00:00
chartl	a463dbcda1	Refactoring the qscript directory; oneoffs, playground, and core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5017 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 15:23:40 +00:00
rpoplin	7db9601c9d	Checking in the 1000G phase1 cleaning and calling scripts for posterity's sake, but also to show everyone what the current best practices for VQSR training looks like. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5015 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-19 14:32:52 +00:00
rpoplin	457c59e737	Use the sites-only HapMap files in the Methods development pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5013 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-18 20:50:09 +00:00
rpoplin	00453919d2	VQSR now only uses the valid polymorphic sites for training and truth sensitivity calculations. Any number of tracks whose ROD binding begins with the name truth can be used as truth sensitivity tracks. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5012 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-18 20:48:19 +00:00
carneiro	35a4f1e366	- Added VariantEval as an optional step in the pipeline. - Lifted to HapMap 3.3. - Lifted to dbSNP 132 where possible. - Added the CEU-Trio WEx(hg19) dataset. - Added some options to the pipeline. You can now use -dataset WEX -dataset HiSeq ... to choose which datasets to run through the pipeline. You can now run without BAQ and indel mask: -noBAQ -noMASK. Choose not to run the gold standard comparison analysis: -skipGoldStandard. Activate the VariantEval walker analysis on the Recalibrated vcf: -eval. The default behavior is to run exactly like it used to, so this version shouldn't change the way you used to use the pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5004 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-14 21:55:02 +00:00
kshakir	2355a55067	Added a QFunction.jobLimitSeconds for experimentation, currently only used as the equivalent of bsub -W. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5002 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-14 19:45:03 +00:00
kshakir	2163420942	Updated to reflect QD changes in r4984 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4994 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 20:10:21 +00:00
carneiro	c4f9b262e5	removing the tech dev pipeline script from the repository to keep the methods development pipeline as the reference script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4992 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 18:15:55 +00:00
carneiro	9e93091e9a	-baqGOP now takes phred scaled scores instead of probabilities in the command line. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4982 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-13 00:06:38 +00:00
kshakir	8ba3a5a43f	Command lines for locally run Queue jobs no longer have to be escaped differently than bsub'ed jobs. GSA-410 Local job runs now can run command lines longer than 4096 on our linux machines. When determining if the help text and Queue extensions need to be rebuilt, use the .class files not the .java so that GATK oneoffs are picked up correctly. Added the most basic of all example QScripts for debugging, Hello World. Minor updates to copy/pasted LSF code to reduce ant javadoc warnings by a third. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4970 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-10 21:07:29 +00:00
kshakir	b34e2f733f	Removed stochasticity from IndelRealigner by random sampling using a seed based on the read list. Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set. Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found. Now building all scala code at the same time, just like all java code is compiled at the same time. Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile. Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles. Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified. Fixed a couple errors when the <javadoc> task is run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 22:03:36 +00:00
chartl	3e7802a3e0	Minor changes to a qscript and the GQ constants on PrivatePermutations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4956 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-07 18:26:21 +00:00
carneiro	5e9a8f9cb3	Implemented a new argument (-DQS --defaultQualityScore) that allows GATK to deal with BAM files missing quality scores. If a value is specified, all reads are filled with the default quality score. Appropriate exception is thrown if -DQS is not provided and BAM file doesn't have quality scores for every base. Adding the first version of the techdev pipeline (tdPipeline) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4943 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 22:25:08 +00:00
rpoplin	20f29e4690	In the Methods development pipeline the call confidence threshold must be lowered from the default value for lowpass calling. What a bone-headed mistake! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4941 348d0f76-0448-11de-a6fe-93d51630548a	2011-01-05 20:30:55 +00:00
corin	6d809321d3	Updating combine variants memory limit and dcov default for the full calling pipeline. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4907 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-24 03:06:50 +00:00
depristo	5265f943b0	phasing per sample. tmp checkin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4898 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 20:14:06 +00:00
corin	e7569cfe6f	Updated dbsnp version usage. Calling with 132, but still using 129 for eval to maintain consistent known/novel eval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4895 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 17:37:27 +00:00
chartl	2235245af0	PrivatePermutations generalized to compute transition counts and average probabilities (and thus was renamed). Changes in some pipelines to reflect the change. Bugfix in the batch merging pipeline (it would halt because the allele VCF for genotyping batches could become off-spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4894 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 15:16:15 +00:00
rpoplin	7185fcb47b	Committing my notes about the methods development pipeline so we stay synced up while I'm on vacation. Cheers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4891 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 21:14:20 +00:00
chartl	80770dc032	Expanded target pipeline complete. Stop trying to be clever about scatter-gather; wait until functional SG is built-in to Q. Til then, a lazy version of the fullCallingPipeline. Seems to take a long time to generate the graph though... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4888 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 00:56:16 +00:00
kshakir	758d14a261	Checking in scripts used for testing the linear index MAX_FEATURES_PER_BIN. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4887 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 21:25:36 +00:00
chartl	fc33901810	Graph structure must be known at compile time. Removing GroupIntervals until a future point where in-process-functions can predict their output based on inputs [though this is probably forever: the inputs may not exist at compile time!] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4886 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 21:22:58 +00:00
chartl	61d5daa65c	EXTREME interval processing. Still undergoing testing. + GroupIntervals allows user-defined scattering (e.g. take an interval list file, split it into k smaller interval list files by number of lines) + ExpandIntervals expands the intervals, either by widening them, or allowing the definition for nearby intervals (e.g. flanks starting 1bp before and after, ending 10bp after that) + IntersectIntervals takes n interval lists, writes 1 interval list that is the n-way intersection of all of them git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4885 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 19:42:50 +00:00
rpoplin	4ca1da1d07	Updating the NA12878.HiSeq bam file to be the correct bam file in the methods development qscript. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4879 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 14:53:10 +00:00
rpoplin	8fac346ac1	Misc cleanup in Methods Development Qscript git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4878 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 04:24:25 +00:00
rpoplin	34ab5b4889	Turning on BAQ in Methods Development pipeline. A new dataset is added: 363 EUR samples from the November 1000G release. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4877 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-19 21:13:25 +00:00
chartl	8118a439c0	Commit for Khalid git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4876 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 22:24:18 +00:00
rpoplin	15a33545f4	Updating Methods development pipeline qscript with the bam lists for all the data sets. It is ready for people to start running with it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4875 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 22:19:14 +00:00
chartl	6db86ae0c6	PAC sets the refseq file to optional. Additional ipf for expanding interval lists. Either there's heavy latency on the toro -- or it's not working properly yet (e.g. the system print in the same scope as the file print outputs a line, but no file shows up on the system) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4874 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 19:04:52 +00:00
corin	f0ab7b849a	Adding a window size variable to avoid indel genotyper error git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4873 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-18 04:19:54 +00:00
rpoplin	bdef4e775a	Initial checkin of methods development pipeline qscript. It allows the methods dev team to run an overnight job which calls and recalibrates a variety of data sets and allows for an end-to-end sanity check of final results for potential changes to the methods. It isn't meant to be used by anybody quite yet, but shows the general structure and flow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4871 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 22:14:02 +00:00
kshakir	6f29a9dbb4	Updated pipeline test to match Eric's changes in r4846. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4868 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 19:52:09 +00:00
rpoplin	095fc1922a	By popular demand I'm adding the qscript I used to do the 660 bamfile 1000G calling for ASHG. It does cleaning, BAQing, and merging in 3mb chunks genome-wide then calls SNPs on those temporary bams. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4866 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 18:49:03 +00:00
depristo	32d5397c01	Experimental support for sided annotations. Currently not more/less valuable than two-tailed testing. Future experiments are needed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4864 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 15:08:31 +00:00
chartl	0d18bd1011	Now that addAll() is in the superclass, no longer need this definition (which, without override, prevents the script from compiling anyway) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4862 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 05:36:31 +00:00
chartl	fd1d817d45	Cryptic implementation of base-string entropy. I suspect this scales ~linearly with length, so I may choose to normalize in the future. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4861 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 22:25:05 +00:00
kshakir	3a6d1dbcef	Fixed a class initializer crash on shutdown when the graph has nothing to run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4860 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 18:56:55 +00:00
chartl	08710fc71e	A successor to the Design File Generator and GCContent walkers; allows for refseq/other metadata annotation of intervals, and calculates reference GC content and entropy of the interval. Compiles, but as yet untested and incomplete (but my repositories are kinda messed up so i'm committing this to blow 'em away and re checkout) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4857 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 18:14:03 +00:00
chartl	3e75431bc8	Thanks to mark: VCFInfoToTable removed in favor of a more flexible walker. Slight change to the argument structure of the walker to make it play more nicely with Queue: the field list parsing is pushed into the command line system (e.g. the variable is exposed as a List<String> and not a String, so Queue doesn't have to join a list into a string only to have it broken out again. This also allows the user to specify -F field1 -F field2 -F field3 if he/she so desires. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4842 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 03:33:36 +00:00
chartl	2217837845	Commit for Khalid -- should be a scala version of vcf2table but for some reason the run method isn't getting called. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4841 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 00:44:15 +00:00
chartl	f36861eeee	One more little bugfix -- the issue was not the grep command, but instead the NFS in the awk; i changed it to ++count in the last commit which was really responsible for the fix. Then this ultra-escaping semi-broke the grep again. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4831 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 20:36:14 +00:00
chartl	d34c5640d2	Bugfix for clf version of extract samples. Due to dynamic shell creation and bsubs and whatnot, the OR pipe for grep ("a\|b") needs to be super-escaped ("a\\\\\\\\\|b"). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4829 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 19:06:30 +00:00
chartl	f795b25c47	In-process versions of sample extraction and interval-list conversion for VCF files. Required an in-process-function branch of the queue library. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4827 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 17:36:53 +00:00
depristo	e219f6a4b5	Q script to run VQSR on a whole variety of common data sets. To be used as a basis for general methods development pipeline git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4826 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 16:55:52 +00:00
chartl	7bc2049031	Updates and bug fixes to private mutations qscript and pipeline libraries. Hand filter strings are now not busted (boo to having to escape quotes); convenience method added to VariantCalling to propagate standard trait data to a given GATK command line -- should be made more scala-esque in the future. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4824 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 04:55:13 +00:00
chartl	cf75caf653	java changes: VariantEvalWalker's logger is made public, so that variant eval modules can access it through the parent object. DesignFileGenerator comment lists how best to bind things to it, and the feature accessor is better refined to grab the genome loc. (old change) scala changes: convenience addAll( List[CommandLineFunction] ) added to QScript class (and thus removed from the fCPV2) useful command line functions added to a new library package for command line functions (these are fast simple VCF command lines) bug fixed in ProjectManagement for the class where there's only one batch to be batch-merged (not really part of the use-case, but an edge-condition that came up during pipeline testing) first draft of a private mutations pipeline which will be elaborated in future git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4823 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-12 05:10:45 +00:00
chartl	81290d238d	Restructuring my qscripts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4821 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-11 20:58:45 +00:00
kshakir	56433ebf6b	Switched from LSF command line wrappers to JNA wrappers around the C API. Side effects: - bsub command line is no longer fully printed out. - extraBsubArgs hack is now a callback function updateJobRun. Updated FullCallingPipelineTest to reflect latest changes to fullCallingPipeline.q. Added a pipeline that tests the UGv2 runtimes at different bam counts and memory limits. Updated VE packages that live in oneoffs to compile to oneoffs. Added a hack to replace the deprecated symbol environ in Mac OS X 10.5+ which is needed by LSF7 on Mac. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4816 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-10 04:36:06 +00:00
corin	27acede64d	Removing old arguments. We'll now be running with the defaults. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4811 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-09 18:58:56 +00:00
chartl	f8dd59c1d1	Tightening of the batch merging pipeline. Optimized to run on hour queue, so please: if you run this, crush 'hour' with it. Testing is forthcoming, but it merged 700 samples overnight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4805 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-08 14:36:23 +00:00
chartl	02de9a9764	With multi-sample genotyping must come scatter+gather. Also Khalid informed me of the .group(size) method, so removing my useless (but pretty) code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4797 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-06 20:12:23 +00:00
chartl	f4c43f013f	Due to the overhead for reading VCF files (>32g for 700 5MB VCF files), batched merging has to generate likelihoods in batches. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4796 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-06 18:23:54 +00:00
chartl	0944184832	Major refactoring of library and full calling pipeline (v2) structure. Arguments to the full calling qscript (and indeed, any qscript that wants them) are now specified via the PipelineArgumentCollection Libraries require a Pipeline object for instantiation -- eliminating their previous dependence on yaml files Functions added to PipelineUtils to build out the proper Pipeline object from the PipelineArgumentCollection, which now contains additional arguments to specify pipeline properties (name, ref, bams, dbsnp, interval list); which are mutually exclusive with the yaml file. Pipeline length reduced to a mere 62 lines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4790 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-05 02:33:54 +00:00
corin	bdc7516168	Taking out recalibrating for now, since having these files is confusing people and we've not gone to dbsnp 132 yet so cluster generation's broken with these command lines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4786 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-03 22:12:09 +00:00
kshakir	c7dbf66d41	Added a javaMemoryLimit option for cases where the java -Xmx memory should be lower than the bsub memory limit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4778 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-02 22:38:06 +00:00
chartl	670ae814b3	Get rid of files from the grep string git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4773 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-02 18:39:59 +00:00
chartl	220fb0c44a	Added a pipeline for merging batches. For now takes a file containing a list of VCFs, and a file containing a list of bams. Does not do anything smart (e.g. if you leave out some .bams or add some extra ones, you will not be warned). Heavy lifting done in (the beginnings of) a library for managing multi-batch or multi-project tasks. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4771 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-02 07:31:59 +00:00
chartl	9f03f09cc9	Changes to V2 pipeline and libraries. AB dropped. Cleaning enabled. Project name now properly propagated to intermediate files (instead of the string repr of the object). Indel mask is now expanded prior to filtering at indels. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4769 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-01 18:55:48 +00:00
chartl	06a0fb4489	Library-ized pipeline now functions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4759 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-30 21:34:59 +00:00
kshakir	e21a66d876	Updated the Queue GATK generator and packaging to include more dependencies for fullCallingPipeline.q. Set the -bigMemQueue in the FullCallingPipelineTest to GSA to avoid waiting for the week queue when it is busy. Fixed the package definition of PipelineTest so that scalac won't recompile it every time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4755 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-30 15:29:40 +00:00
ebanks	4413208c45	Removing unnecessary and incorrect includes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4752 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-30 02:06:48 +00:00
ebanks	d89e17ec8c	Fare thee well, UGv1. Here come the days UGv2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4747 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-29 21:51:19 +00:00
kshakir	6f8cd97673	Added a ten sample 1000G whole exome test along with SimpleMetricsBySample to the pipeline validation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4737 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-26 23:17:23 +00:00
corin	6b70cde0b9	Adding a forgotten quote mark git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4729 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-24 16:38:27 +00:00
corin	e15d18129c	Adding by sample metrics. Not sure why we didn't have this in here in the first place git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4723 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-23 21:36:03 +00:00
corin	fe28f8da9c	Removing Uniquify from main pipeline indel merge, since the pipeline isn't merging from samples with the same name anyway. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4721 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-23 17:25:22 +00:00
kshakir	787e5d85e9	Added the ability to test pipelines in dry or live mode via 'ant pipelinetest' and 'ant pipelinetest -Dpipeline.run=run'. Added an initial test for genotyping chr20 on ten 1000G bams. Since tribble needs logging support too, for now setting the logging level and appending the console logger to the root logger, not just to "org.broadinstitute.sting". Updated IntervalUtilsUnitTest to output to a temp directory and not the SVN controlled testdata directory. Added refseq tables and dbsnps to validation data in BaseTest. Now waiting up to two minutes for gather parts to propagate over NFS before attempting to merge the files. Setting scatter/gather directories relative to the -run directory instead of the current directory that queue is running. Fixed a bug where escaping test expressions didn't handle delimiters at the beginning or end of the String. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4717 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-22 22:59:42 +00:00
kiran	28805d17ca	Commenting out allele-balance for now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4715 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-22 16:48:08 +00:00
corin	8dca5bd861	Putting the annotation back in, both to the filters and to UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4709 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 21:02:15 +00:00
corin	da1fe5bb37	Removing the AB filter given that we don't have that in the VCF anymore git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4708 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 20:22:05 +00:00
kshakir	79725f2d9c	Excluding the QFunction log files from the set of files to delete on completion. When a QGraph is empty displaying a warning instead of crashing with a JGraph internal assertion error. Cleaned up code using the Log4J root logger and explicitly talking to a logger for Sting. When integration tests are run detecting that the logger has already been setup so that messages aren't logged twice. Updated from Ivy 2.2.0-rc1 to 2.2.0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4707 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 20:22:01 +00:00
hanna	302cc13735	Trying out Queue for the first time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4705 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 18:29:12 +00:00
corin	5466365575	Fixing a silly typo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4680 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 18:16:51 +00:00
corin	a64f693b20	Updated pipeline script to include dbSnp for UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4679 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 18:09:47 +00:00
kshakir	302e8f0239	Fixed bug where the command directory was not being set to an absolute path, leading LSF to write some .done files to /tmp. No longer using the command directory for temporary .done files, and instead using the user specified temporary directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4678 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 17:59:39 +00:00
kshakir	801c562909	Now actually checking in the integration test mentioned in the prior commit: compiles the full calling pipeline. Removed QScript usages of VariantRecalibrator's -reportDatFile, --report_dat_file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4668 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-14 04:27:10 +00:00
kshakir	673fa841a4	Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader. Removed obsolete usages of PackageUtils with updated PluginManager. Ported Queue interval utilities written in scala over to Sting's java IntervalUtils. Added a very basic integration test to ensure that the fullCallingPipeline.q compiles. Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test). While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1". Upgraded to scala 2.8.1 and updated calls to deprecated functions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 20:14:28 +00:00
kshakir	f35d1aa43f	Moving all file cleanup to IOUtils for easier debugging. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4646 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-10 21:00:58 +00:00
hanna	8e36a07bea	Convert GenomeLocParser into an instance variable. This change is required for anything that needs to be simultaneously aware of multiple references, eg Queue's interval sharding code, liftover support, distributed GATK etc. GenomeLocParser instances must now be used to create/parse GenomeLocs. GenomeLocParser instances are available in walkers by calling either -getToolkit().getGenomeLocParser() or -refContext.getGenomeLocParser() This is an intermediate change; GenomeLocParser will eventually be merged with the reference, but we're not clear exactly how to do that yet. This will become clearer when contig aliasing is implemented. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-10 17:59:50 +00:00
chartl	c19f567424	Sometimes, inputs are really outputs in disguise. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4631 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-05 19:51:16 +00:00
chartl	0e40321a52	Brütall hack: make the bam list creator job wait for the interval creator job, so that there is an implicit dependency of UG on the interval list, by way of the bam list git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4628 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-04 20:43:11 +00:00
chartl	cb0b2f9811	My analysis script for private mutations. I'm committing it because it contains a number of specialized command line functions that could prove useful in the future. (For example: ConcatVCF and ExtractSample) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4626 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-04 19:57:27 +00:00
chartl	42e9987e69	Bug fix to GenotypeConcordance. AC metrics get instantiated based on number of eval samples; if Comp has more samples, we can see AC indices outside the bounds of the array. Bug fix to LiftoverVariants - no barfing at reference sites. AlleleFrequencyComparison - local changes added to make sure parsing works properly Added HammingDistance annotation. Mostly useless. But only mostly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4622 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-03 19:23:03 +00:00
hanna	861ee3e37a	Changing testing framework from junit -> testng, for its enhanced configurability. Initial test to see how Bamboo will respond. More detailed email to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4609 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-01 21:31:44 +00:00
kshakir	d768c6558d	Now that the user is required to set the java temp directory, it is safer for the LsfJobRunner to write to the java temp directory instead of the command directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4593 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-28 15:00:21 +00:00
kshakir	5cdd7a7ba4	There's no such thing as a sam index, so the GATK extension generator doesn't need to add an @Input for them. Updated a call to swapExt to specify the directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4586 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 20:39:03 +00:00
hanna	4c23b1fe9c	Get rid of the static cache of ArgumentTypeDescriptors by making them an integral part of the parsing engine. Hugely lowers our memory footprint in integrationtests, but not yet enough to run Mark's new parallelized VariantEvalIntegrationTests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4585 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 19:44:55 +00:00
corin	6d7ed5781c	Added Dbsnp to Indel Realigner; added known indels rod-binding to realigner. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4576 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 22:22:28 +00:00
kshakir	8211cee0b2	Queue UI Improvements: - Forcing user to set the temp directory via -Djava.io.tmpdir to avoid filling up /tmp. - By default deleting job outputs tagged as intermediate. - Defaulting pipeline to scatter count 1 (no reads deleted). - Cleaning up temp classes even when scripting fails. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4573 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 19:49:08 +00:00
kshakir	80259b9e20	Changed fullCallingPipeline to output all contigs in the reference if scattering. When the cleaner interval scatter count is set to one explicitly setting the intervals to Nil. TODO: Need to add an option that lets the user choose from the command line to scatter all contigs or just those in the intervals list. For now can get relatively the same behavior by setting the interval scatter count equal to the number of contigs+1, assuming the random contigs come at the end of the sequence dictionary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4565 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-24 03:01:06 +00:00
kshakir	e9c6f681a4	Instead of the pipeline's cleaner only writing BAMs with the target intervals, now pulling the list of contigs from the target intervals and outputting reads in those contigs. Added a brute force -retry <count> option to Queue for transient errors. Waiting up to 2 minutes for the LSF logs to appear before trying to display the errors from the logs. Updates to the local job runner error logging when a job fails. Refactored QGraph's settings as duplicate code was getting out of control. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4563 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 22:22:30 +00:00
kshakir	b954a5a4d5	- After removing special code for intervals, instead of being of type File they are generated as List[File]. Changed previous checkin that was appending to this list and instead assigning a singleton list. - More cleanup including removing the temporary classes and intermediate error files. Quieting any errors using Apache Commons IO 2.0. - Counting the contigs during the QScript generation instead of the end user having to pass a separate contig interval list. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4539 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 06:37:28 +00:00
kshakir	88a0d77433	Changed parsing engine to store the order the argument bindings based on their definition in the class, moving "-T" to the front of Queue command lines. Queue GATK generated .intervals is now a List(File) again removing special case handling in the generator. Instead of using @Scatter annotation, using ScatterFunction instance to determine if a job can be scattered. Implemented special VcfGatherFunction which only uses the header from the first file, even if the other files differ in their headers. Added a -deleteIntermediates to Queue to delete the outputs from intermediate commands after a successful run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4536 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 21:43:52 +00:00
kshakir	81479229e1	QScript authors can now tag functions as intermediate. Functions tagged as intermediate will be skipped unless another function in the graph needs their output. Re-logging the failed jobs and the path to their log files at the end of a run. Added a parameter -bigMemQueue for the fullCallingPipeline.q instead of hardcoding gsa (gsa was backed up and it was actually faster to run on week). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4520 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-18 22:11:14 +00:00
chartl	2bc5971ca1	Added - a tool to fix reference bases of a VCF. The OMNI had a couple of sites with incorrect reference bases (look to be legacy from other chips), and a few more that had ref and alt flipped. GAP should probably take care of it, but since I need results by monday, I'm doing it. Modified - SelectVariants: Hook up to VariantContextUtils to recalculate AC/AF/AN, which uses the accessor in VariantContext to do this. Somehow sites that were selected down to hom-ref genotypes only wound up getting positive AC. IMPORTANT I kind of need input here. The header of a file used for an integration test specifies AC as being an integer. Recalculating it casts it into an integer list (which it should be, as it allows for alternate alleles). However this appears to clash with what the jexl expression is looking for? For now, the integration test itself needed to be changed -- it's unclear what to do when the header specifies AC of being one class, but recalculating it casts to another class, and I'm not sure what to do. I'm committing my omni_qc pipeline because I'm almost certain 2 months down the road I'm going to wonder what the heck I did to generate my results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4511 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-17 03:18:01 +00:00
kshakir	9dc2e931b6	Saving the order functions are added to in the QScript. Using the order during submission of ready jobs (but not currently dryrun) and during -status. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4508 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 20:00:35 +00:00

357 Commits (7b452ea2b9ad2d2f3e8bbfcbfb0e818f0a87f18d)