gatk-3.8

Author	SHA1	Message	Date
Mauricio Carneiro	945cf03889	IntelliJ ate my import!	2012-01-23 21:46:45 -05:00
Mauricio Carneiro	2bb9525e7f	Don't set base qualities if fastQ is provided * Pacbio Processing pipeline now works with the new fastQ files outputted by the Pacbio instrument	2012-01-23 17:57:29 -05:00
Khalid Shakir	c18beadbdb	Device files like /dev/null are now tracked as special by Queue and are not used to generate .out file paths, scattered into a temporary directory, gathered, deleted, etc. Attempted workaround for xdr_resourceInfoReq unsatisfied link during loading of libbat.so.	2012-01-23 16:17:04 -05:00
Ryan Poplin	75f87db468	Replacing Mills file with new gold standard indel set in the resource bundle for release with v1.5	2012-01-17 15:02:45 -05:00
Mauricio Carneiro	5bf960deb8	adding dbsnp to indel VQSR	2012-01-10 12:38:49 -05:00
Mauricio Carneiro	6f2abd76df	Updating the MDCP with the new indel gold standard from Ryan.	2012-01-09 15:31:18 -05:00
Khalid Shakir	5793625592	No more "Q-<pid>@<host>". Generated log file names now use the first output + ".out" (ex. my.vcf.out) or the name of the first QScript plus the order the function was added (ex. MyScript-1.out). The same function added twice with the same outputs will now have the same default logs, meaning the 2nd instance of the function won't be added to the graph twice. QScript accessor to QSettings to specify a default runName and other default function settings. Because log files are no longer pseudo-random, their presence can be used to tell if a job without other file outputs is "done". For now still using the log's .done file in addition to original outputs. Gathered log files concatenate all log files together into the stdout. InProcessFunctions now have PrintStreams for stdout and stderr. Updated ivy to use commons-io 2.1 for copying logs to the stdout PrintStream. Removed snakeyaml. During graph tracking of outputs, the Index files, and now BAM MD5s, are tracked with the gathering of the original file. In Queue-generated wrappers for the GATK the Index and MD5s used for tracking are switched to private scope. Added more detailed output when running with -l DEBUG. Simplified graphviz visualization for additional debugging. Switched usage of the scala class 'List' to the trait 'Seq' (think java.util.ArrayList vs. using the interface java.util.List). Minor cleanup to build including sending ant gsalib to R's default libloc.	2012-01-08 12:11:55 -05:00
Mauricio Carneiro	f6a18aea63	Updated MDCP with INDEL best practices * chose 90.0 indel cut target for most datasets (this is arbitrary).	2012-01-06 17:21:59 -05:00
Khalid Shakir	7486696c07	When using bam list mode in HSP, deriving the VCF name from the bam list instead of requiring an additional parameter. Creating a single temporary directory per ant test run instead of putting temp files across all runs in the same directory. Updated various tests for above items and other small fixes.	2011-12-16 18:09:25 -05:00
Mauricio Carneiro	663184ee9d	Added test mode to PPP * in test mode, no @PG tags are output to the final bam file * updated pipeline test to use -test mode. * MD5s updated accordingly	2011-12-12 18:29:06 -05:00
Mauricio Carneiro	a3c3d72313	Added test mode to DPP * in test mode, no @PG tags are output to the final bam file * updated pipeline test to use -test mode. * MD5s are now dependent on BWA version	2011-12-12 18:29:06 -05:00
Mauricio Carneiro	cca8a18608	PPP pipeline test * added a pipeline test to the Pacbio Processing Pipeline. * updated exampleBAM with more complete RG information so we can use it in a wider variety of pipeline tests * added exampleDBSNP.vcf file with only chromosome 1 in the range of the exampleFASTA.fasta reference for pipeline tests	2011-12-11 17:32:21 -05:00
Mauricio Carneiro	21ac3b59d7	Merged bug fix from Stable into Unstable	2011-12-09 16:51:46 -05:00
Mauricio Carneiro	13905c00b3	Updating PacbioProcessingPipeline to new Queue standards	2011-12-09 16:51:02 -05:00
David Roazen	d014c7faf9	Queue now properly escapes all shell arguments in generated shell scripts. This has implications for both Qscript authors and CommandLineFunction authors. Qscript authors: You no longer need to (and in fact must not) manually escape String values to avoid interpretation by the shell when setting up Walker parameters. Queue will safely escape all of your Strings for you so that they'll be interpreted literally. E.g., Old way: filterSNPs.filterExpression = List("\"QD<2.0\"", "\"MQ<40.0\"", "\"HaplotypeScore>13.0\"") New way: filterSNPs.filterExpression = List("QD<2.0", "MQ<40.0", "HaplotypeScore>13.0") CommandLineFunction authors: If you're writing a one-off CommandLineFunction in a Qscript and don't really care about quoting issues, just keep doing things the direct, simple way: def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out) If you're writing a CommandLineFunction that will become part of Queue and will be used by other QScripts, however, it's advisable to do things the newer, safer way, i.e.: When you construct your commandLine, you should do so ONLY using the API methods required(), optional(), conditional(), and repeat(). These will manage quoting and whitespace separation for you, so you shouldn't insert quotes/extraneous whitespace in your Strings. By default you get both (quoting and whitespace separation), but you can disable either of these via parameters. E.g., override def commandLine = super.commandLine + required("eff") + conditional(verbose, "-v") + optional("-c", config) + required("-i", "vcf") + required("-o", "vcf") + required(genomeVersion) + required(inVcf) + required(">", escape=false) + // This will be shell-interpreted required(outVcf) I've ported the Picard/Samtools/SnpEff CommandLineFunction classes to the new system, so you'll get free shell escaping when you use those in Qscripts just like with walkers.	2011-12-01 18:13:44 -05:00
Mauricio Carneiro	dbd8c25787	No more R resources in the DPP. Updating the DPP to conform with Analyze Covariates changes.	2011-10-28 16:57:01 -04:00
Mauricio Carneiro	86305a5dcf	Adjusting the memory limits of the MDCP. The indel caller needs more than 3G for large datasets.	2011-10-21 17:41:52 -04:00
Mauricio Carneiro	c9d8b22092	Added BWASW support to the pipeline. The Data Processing Pipeline can now use BWASW for realigning the reads. Useful for Ion Torrent data.	2011-10-20 18:36:28 -04:00
Mauricio Carneiro	093cd95c5d	Merged bug fix from Stable into Unstable	2011-10-20 17:03:22 -04:00
Mauricio Carneiro	d7367c152a	Fixing 'revert' when not realigning. RevertSam was reverting the alignment information and that was screwing up the pipeline if you didn't want to run it with BWA. Fixed.	2011-10-20 17:01:54 -04:00
Mauricio Carneiro	ed402588cc	Adding the "gold standard NA12878" target	2011-10-20 16:19:13 -04:00
Mauricio Carneiro	0939d16a8d	String not empty bug. Apparently var X: String = _ is not the same as var X: String = "". :(	2011-10-13 13:22:05 -04:00
Mauricio Carneiro	66b5646f95	Adding hidden options to the DPP. Options controlling the default platform parameter to Count Covariates and the number of scatter-gather jobs to generate are now available as hidden parameters.	2011-10-11 13:56:00 -04:00
Mark DePristo	a91509e7dd	Shouldn't be public	2011-10-05 15:22:57 -07:00
Mauricio Carneiro	d3cc25454c	Updating the MDCP	2011-09-22 11:27:40 -04:00
Mauricio Carneiro	623c49765d	NO BAQ ON EXOMES! says the boss.	2011-09-22 11:13:40 -04:00
Ryan Poplin	5d0f284305	Fixing exome specific arguments to the VQSR in the methods development calling pipeline	2011-09-21 20:26:28 -04:00
Mauricio Carneiro	758ecf2d43	Bringing latest updates of ReduceReads to the master repository	2011-09-20 16:35:09 -04:00
Mauricio Carneiro	08ffb18b96	Renaming datasets in the MDCP. Making dataset names and files generated by the MDCP more uniform.	2011-09-20 11:02:51 -04:00
Eric Banks	ba150570f3	Updating to use new rod system syntax plus name change for CountRODs	2011-09-19 13:30:32 -04:00
Eric Banks	85626e7a5d	We no longer want people to use the August 2010 Dindel calls for indel realignment but instead Guillermo's new whole genome bi-allelic indel calls; updating the bundle accordingly. Also, there was some confusion by the 1000G data processing folks as to exactly what these indel files are, so I've renamed them so that it's clear. Wiki updated too.	2011-09-19 12:24:05 -04:00
Khalid Shakir	33967a4e0c	Fixed issue reported by chartl where cloned functions lost tags on @Inputs. Updated ExampleUnifiedGenotyper.scala with new syntax.	2011-09-16 12:46:07 -04:00
Ryan Poplin	981b78ea50	Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.	2011-09-12 12:17:43 -04:00
Mauricio Carneiro	7f9000382e	Making indel calls default in the MDCP. You can turn off indel calling by using -noIndels.	2011-09-09 14:09:26 -04:00
Mauricio Carneiro	ee9d599558	Just cleaning up. Clean up old commented code from the data processing pipeline.	2011-09-07 13:32:40 -04:00
Mauricio Carneiro	28d782b4c7	Allowing multiple dbsnp and indel files in the DPP	2011-09-02 13:38:47 -04:00
Mauricio Carneiro	ad4ea0b80b	Merged bug fix from Stable into Unstable	2011-09-01 18:14:45 -04:00
Mauricio Carneiro	e253f6f05d	Fixing typo in DPP. Platform and library were exchanged when rebuilding the read group information.	2011-09-01 18:13:52 -04:00
Mauricio Carneiro	d2a33beff7	Added WGS/WEX b37-decoy CEU trio datasets	2011-09-01 13:14:40 -04:00
Mauricio Carneiro	16caca0822	BLASR BAMs and new BWA parameters. Added the functions to turn a BLASR-generated BAM file into a usable BAM file. Modified the BWA parameters according to test results from the NA12878 pb2k dataset.	2011-08-24 17:04:07 -04:00
Mauricio Carneiro	dc8398e165	fixing bai output for indel cleaning.	2011-08-24 15:58:34 -04:00
Mauricio Carneiro	cd12f7f286	Fixed list dependency. Instead of creating a bam list file, I dynamically create a scala list and pass it as parameters. This way the intermediate bam files don't get deleted before they should.	2011-08-24 11:12:46 -04:00
Mauricio Carneiro	219252a566	Adapting to the new RodBinding framework	2011-08-24 11:12:46 -04:00
Mauricio Carneiro	136f0eb685	Creating sample-bam list instead of joining. This should save us at least one day in the trio decoy processing.	2011-08-22 18:03:39 -04:00
Mauricio Carneiro	04d8bcaf19	Fixed bai removal on picard tools. BAM index files were not being deleted because Picard replaces the file extension with .bai instead of appending to it.	2011-08-22 18:03:39 -04:00
Mauricio Carneiro	caebc88e9a	Consensus mode and new RodBinding framework. The DPP was not using the parameter correctly. It didn't matter for the default option (which is the only one we have been testing) but it would not work for knowns only or Smith-Waterman. It is fixed now. It now complies with the new rod binding framework.	2011-08-22 18:03:39 -04:00
Ryan Poplin	f93a554b01	updating exome specific parameters in MDCP	2011-08-21 10:25:36 -04:00
Ryan Poplin	b008676878	fixing the previous fix	2011-08-20 21:21:55 -04:00
Ryan Poplin	539e157ecd	Fixing misc parameters in MDCP. The pipeline now does VariantEval of output by default. Fix for NaN vqslod values in VQSR	2011-08-20 11:28:48 -04:00
Ryan Poplin	ddb5045e14	Updating the methods development calling pipeline for the new rod binding syntax and the new best practices.	2011-08-19 19:29:51 -04:00
Mauricio Carneiro	b0ff5b1ff7	a better name for the pacbio processing pipeline	2011-08-10 16:16:53 -04:00
Mauricio Carneiro	481630da00	BWA parameters added	2011-08-09 17:05:24 -04:00
Mauricio Carneiro	22d2563823	added BWA SW alignment. The pipeline now accepts fasta/fastq files and aligns them using BWA SW, adds default base qualities, creates read groups and performs BQSR.	2011-08-09 17:05:24 -04:00
Mauricio Carneiro	bd1cf4c7bc	Pacbio Pipeline. Added the base quality "filling" step to allow the pipeline to handle raw pacbio BAM files. This is the first step towards a generic pacbio data processing pipeline.	2011-08-09 17:05:24 -04:00
Ryan Poplin	8072bd9831	Updating resource bundle generation qscript for changeover to git	2011-08-08 12:35:39 -04:00
Mauricio Carneiro	2fd101135c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2011-08-08 10:49:43 -04:00
Mauricio Carneiro	4d6cb33612	removing temporary bam index. The clean bai file was left behind after the data processing pipeline was done.	2011-08-08 10:49:28 -04:00
Ryan Poplin	21dc9a5543	Adding mills/devine indel dataset to the resource bundle	2011-08-04 12:31:28 -04:00
Mauricio Carneiro	aff681e407	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2011-08-04 11:05:25 -04:00
Mauricio Carneiro	23ec5b94cf	fixed a missing check for null. There was a missed check for the case when you don't provide an indels vcf for the cleaner.	2011-08-04 09:50:02 -04:00
Mauricio Carneiro	8981367307	Updating memory usage for picard programs	2011-08-03 15:48:28 -04:00
Khalid Shakir	a587f38808	Fixed example unified genotyper pipeline to wrap filter expressions with quotes and use rod binding name "variant" instead of "vcf".	2011-08-03 02:21:01 -04:00
Mauricio Carneiro	2d94037ad0	Remove temporary index files (*.bai). Some temporary index files were not being removed.	2011-07-30 02:05:22 -04:00
Mauricio Carneiro	dcf21f379a	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2011-07-23 12:59:53 -04:00
Mauricio Carneiro	f0a6dd27a1	Renaming the plot output directory names.	2011-07-23 12:59:37 -04:00
Mauricio Carneiro	4f78025b0b	Merged bug fix from Stable into Unstable	2011-07-22 14:42:04 -04:00
Mauricio Carneiro	4080e2cd88	* Added the decoy reference to the bundle under the b37 resources. * Updated the -svn argument to -ver since we don't use svn anymore (also updated the wiki).	2011-07-22 14:41:22 -04:00
Mauricio Carneiro	9ad5c7dfa4	Resolving simple conflicts in the data processing pipeline. Conflicts: public/scala/qscript/org/broadinstitute/sting/queue/qscripts/DataProcessingPipeline.scala	2011-07-19 08:05:11 -04:00
Mauricio Carneiro	7688bda1a6	better progress report for the DPP	2011-07-18 23:39:47 -04:00
Mauricio Carneiro	2b465ab43b	* added optional 'no validation' for the Data Processing pipeline. * some simplifications on the picard classes	2011-07-18 23:30:31 -04:00
Mauricio Carneiro	4cf7a2af23	Removed broad specific default paths so people from outside the broad can use it.	2011-07-18 23:25:21 -04:00
Mauricio Carneiro	5cb5a4ec75	Merged bug fix from Stable into Unstable	2011-07-16 00:23:59 -04:00
Mauricio Carneiro	dd92a14b40	Made extra indel VCF optional but DBSNP mandatory.	2011-07-16 00:23:35 -04:00
Mauricio Carneiro	2fa5dbb0fe	Merged bug fix from Stable into Unstable	2011-07-16 00:15:19 -04:00
Mauricio Carneiro	ed55182a4c	Removing Broad specific paths from parameters and making them required. This should make it unambiguous for people inside and outside the Broad to use the DataProcessingPipeline (as per request in the GetSatisfaction)	2011-07-16 00:09:00 -04:00
Mauricio Carneiro	43bd45fcad	Merged bug fix from Stable into Unstable	2011-07-15 19:40:02 -04:00
Mauricio Carneiro	fd1df31ef0	changing the output directory names for Analyze Covariates	2011-07-15 19:39:42 -04:00
Mauricio Carneiro	aa30f416a3	Resolving conflicts Conflicts: private/scala/qscript/depristo/ExomePostQCEval.scala private/scala/qscript/depristo/PostCallingQC.scala private/scala/qscript/org/broadinstitute/sting/queue/qscripts/archive/ExomePostQCEval.scala	2011-07-15 16:21:42 -04:00
Mauricio Carneiro	7b7d40d5d9	A better name for the qscript utilities. Throw here every method you find yourself repeatedly implementing in your qscripts! Refactoring appropriately.	2011-07-15 14:34:50 -04:00
Mauricio Carneiro	a670d6420a	Refactoring Qscript utils into queue general utils package.	2011-07-15 14:31:43 -04:00
Mauricio Carneiro	f19862a643	Fixing conflicts.	2011-07-14 17:13:31 -04:00
Mauricio Carneiro	43c6a8565b	looks better now.	2011-07-14 17:10:44 -04:00
Mauricio Carneiro	09ffe277ae	Added a qscripts util package with some utility functions commonly shared across queue scripts. Refactored some of my public scripts to use it in an effort to make queue scripts more reusable and "supportable".	2011-07-14 17:09:35 -04:00
Mauricio Carneiro	4f8230c750	Merged bug fix from Stable into Unstable	2011-07-14 16:44:57 -04:00
Mauricio Carneiro	9f5180ab05	Recalibrates a list of bam files allowing multiple bams to be recalibrated out of a single 'mother' queue job.	2011-07-14 16:42:17 -04:00
Mauricio Carneiro	df996a1a73	more progress report for the Data Processing Pipeline. Bam lists can now have empty lines, comments and whitespaces anywhere.	2011-07-13 14:53:58 -04:00
Mauricio Carneiro	ff4e31c554	Changing the file names as per Kris's request.	2011-07-13 12:59:18 -04:00
Mauricio Carneiro	5298e3a942	Making the outputDir optional. Default = ./	2011-07-05 16:30:41 -04:00
Mauricio Carneiro	7d3dfdfdf2	Updating the MDCP to use the classpath for the GATK jar, removing -gatk parameter.	2011-07-05 16:30:10 -04:00
Mauricio Carneiro	b0fb63e20a	moving the example scala scripts to the qscripts package.	2011-07-01 16:14:59 -04:00
Mauricio Carneiro	d19351f71a	Added capability of running multiple bam files in the same directory.	2011-07-01 16:02:28 -04:00
Mauricio Carneiro	197b7141c1	Added an optional argument -bt <num_threads> for BWA to run multithreaded.	2011-06-30 14:41:57 -04:00
Mauricio Carneiro	f4463d38ca	BWA requires paired-end reads to be sorted by read name when operating over BAM files, but Picard sorts by coordinate, so when we use BWA on paired-end reads, the pipeline now re-sorts the BAM in read name order, realigns it, then sorts it in coordinate order.	2011-06-30 14:29:21 -04:00
Mauricio Carneiro	efd99c3c11	new home for the core qscripts	2011-06-30 11:32:06 -04:00

144 Commits (cdfd07f9eb4e2ca18b2b6b10d00797cd3a156ebd)