Commit Graph

330 Commits (cdfd07f9eb4e2ca18b2b6b10d00797cd3a156ebd)

Author SHA1 Message Date
Mauricio Carneiro 758ecf2d43 Bringing latest updates of ReduceReads to the master repository 2011-09-20 16:35:09 -04:00
Mauricio Carneiro 08ffb18b96 Renaming datasets in the MDCP
Making dataset names and files generated by the MDCP more uniform.
2011-09-20 11:02:51 -04:00
Eric Banks ba150570f3 Updating to use new rod system syntax plus name change for CountRODs 2011-09-19 13:30:32 -04:00
Eric Banks 095f75ff7d Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-19 12:24:12 -04:00
Eric Banks 85626e7a5d We no longer want people to use the August 2010 Dindel calls for indel realignment but instead Guillermo's new whole genome bi-allelic indel calls; updating the bundle accordingly. Also, there was some confusion by the 1000G data processing folks as to exactly what these indel files are, so I've renamed them so that it's clear. Wiki updated too. 2011-09-19 12:24:05 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Khalid Shakir 33967a4e0c Fixed issue reported by chartl where cloned functions lost tags on @Inputs.
Updated ExampleUnifiedGenotyper.scala with new syntax.
2011-09-16 12:46:07 -04:00
Ryan Poplin 981b78ea50 Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts. 2011-09-12 12:17:43 -04:00
Mauricio Carneiro 7f9000382e Making indel calls default in the MDCP
You can turn off indel calling by using -noIndels.
2011-09-09 14:09:26 -04:00
Mark DePristo 06cb20f2a5 Intermediate commit cleaning up scatter intervals
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Khalid Shakir 510d5e7730 Merged bug fix from Stable into Unstable 2011-09-09 01:34:55 -04:00
Khalid Shakir 367bbee25a Fixed typo when printing the contents or last N lines of a file. Thanks to larryns. 2011-09-09 01:33:25 -04:00
Mauricio Carneiro ee9d599558 Just cleaning up
clean up old commented code from the data processing pipeline.
2011-09-07 13:32:40 -04:00
Mauricio Carneiro 28d782b4c7 Allowing multiple dbsnp and indel files in the DPP 2011-09-02 13:38:47 -04:00
Mauricio Carneiro ad4ea0b80b Merged bug fix from Stable into Unstable 2011-09-01 18:14:45 -04:00
Mauricio Carneiro e253f6f05d Fixing typo in DPP
platform and library were exchanged when rebuilding the read group information
2011-09-01 18:13:52 -04:00
Mauricio Carneiro d2a33beff7 Added WGS/WEX b37-decoy CEU trio datasets 2011-09-01 13:14:40 -04:00
Mark DePristo 61633c95a8 Default jobreport is now jobPrefix, so you see logs like Q-2508.jobreport.txt 2011-08-28 19:19:45 -04:00
Mark DePristo b38de1fa35 Now captures the exechost in the job report
-- Works for in process, shell, and LSF runners
-- Cleanup of debugging output
2011-08-28 12:05:56 -04:00
Mark DePristo e37a638e09 Fix for disallowed characters in GATKReportTable
-- Illegal characters are automatically replaced with _
2011-08-26 13:24:06 -04:00
Mark DePristo 0cb1605df0 Clean documentation for JobRunInfo 2011-08-26 09:22:58 -04:00
Mark DePristo 415d5d5301 LSF long times are in seconds, convert to milliseconds to meet standard 2011-08-26 09:18:28 -04:00
Mark DePristo eef1ac415a Merge branch 'master' into rodTesting
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java
2011-08-26 00:35:41 -04:00
Mark DePristo e03dfdb0ab Automatic iteration field addition works properly. 2011-08-25 16:59:02 -04:00
Mark DePristo e01273ca7c Queue now writes out queueJobReport.pdf
-- General purpose RScript executor in java (please use when invoking RScripts)
-- Removed groupName.  This is now analysisName
-- Explicitly added capability to enable/disable individual QFunction
2011-08-25 16:57:11 -04:00
Mark DePristo 0f4be2c4a4 Argument to disable queueJobReport entirely
-- Minor improvements to RodPerformanceGoals
2011-08-25 13:32:03 -04:00
Mark DePristo d65faf509c Default output name for Queue JobReport is queue_jobreport.gatkreport.txt 2011-08-25 13:15:20 -04:00
Mark DePristo a7d6946b22 Refactored QJobReport and QFunction, which is now automatically tracked
-- All QFunctions, including sg ones, are tracked
-- Removed memory information
2011-08-25 13:13:55 -04:00
Mauricio Carneiro 16caca0822 BLASR BAMs and new BWA parameters
*Added the functions to turn a BLASR generated BAM file into a usable BAM file.
*Modified the bwa parameters according to test results from NA12878 pb2k dataset.
2011-08-24 17:04:07 -04:00
Mauricio Carneiro e3f5d7067a Added ReorderSam queue binding 2011-08-24 17:03:11 -04:00
Mark DePristo 08fb21f127 Removing hostname 2011-08-24 16:45:50 -04:00
Mauricio Carneiro dc8398e165 fixing bai output for indel cleaning. 2011-08-24 15:58:34 -04:00
Mark DePristo 06e30a81d1 Fixes throughout for getting job information
-- no more hostname -- it's just not going to be important
2011-08-24 15:30:09 -04:00
Mark DePristo 4918519a58 No more NPE in getRuntime() when you Ctrl-C out of Queue 2011-08-24 14:14:01 -04:00
Mark DePristo 16d8360592 QJobReport is now the official capability name 2011-08-24 13:59:14 -04:00
Mark DePristo d047c19ad1 Writes output to file 2011-08-24 13:52:05 -04:00
Mark DePristo 3ae68e2397 JobLogging trait now writes out GATKReport log of jobs 2011-08-24 13:36:39 -04:00
Mauricio Carneiro cd12f7f286 Fixed list dependency
Instead of creating a bam list file, I dynamically create a scala list and pass as parameters. This way the intermediate bam files don't get deleted before they should.
2011-08-24 11:12:46 -04:00
Mauricio Carneiro 219252a566 Adapting to the new RodBinding framework 2011-08-24 11:12:46 -04:00
Mark DePristo b8bc03bb42 JobRunInfo improvements
-- dry-run now adds some info, for testing
-- InProcessRunner adds some, but not all, of the information we want
2011-08-23 17:11:22 -04:00
Mark DePristo 31ec6e316c First implementation of JobRunInfo
-- onExecutionDone(Map(QFunction, JobRunInfo)) is the new signature, so that you can walk over your jobs and inspect their success/failure and runtime characteristics
2011-08-23 16:51:54 -04:00
Mark DePristo a9ba945595 onExecutionDone(jobs, successFlag) added to QScript.
-- This function is called when the QScript ends, so scripts can override this function if they want to run some code after all of the jobs have completed
2011-08-23 10:09:51 -04:00
Mauricio Carneiro 136f0eb685 Creating sample-bam list instead of joining
This should save us at least one day in the trio decoy processing.
2011-08-22 18:03:39 -04:00
Mauricio Carneiro 04d8bcaf19 Fixed bai removal on picard tools
BAM index files were not being deleted because picard replaces the name of the file with bai instead of appending to it.
2011-08-22 18:03:39 -04:00
Mauricio Carneiro 8aed151a71 Created RevertSam queue class
Class for the picard tool RevertSam with all the options for queue scripts.
2011-08-22 18:03:39 -04:00
Mauricio Carneiro caebc88e9a Consensus mode and new RodBinding framework.
The DPP was not using the parameter correctly. It didn't matter for the default option (which is the only one we have been testing) but it would not work for knowns only or smith waterman. It is fixed now.

It now complies with the new rod binding framework.
2011-08-22 18:03:39 -04:00
Khalid Shakir c4c90c8826 Updates to JobRunners from the Queue developer community and from running the WholeGenomePipeline:
- Ability to pass a different resident memory reservation and limits. Useful for large pileups of low pass genome data that sometimes need high -Xmx6g but usually don't exceed 2-3g in actual heap size.
- Fixed jobPriority to work for all job runners. Now must be an integer between 0 and 100, even for GridEngine, and will be mapped to the correct values.
- Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8
- Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.
2011-08-22 15:13:27 -04:00
Ryan Poplin f93a554b01 updating exome specific parameters in MDCP 2011-08-21 10:25:36 -04:00
Ryan Poplin b008676878 fixing the previous fix 2011-08-20 21:21:55 -04:00
Ryan Poplin 539e157ecd Fixing misc parameters in MDCP. The pipeline now does VariantEval of output by default. Fix for NaN vqslod values in VQSR 2011-08-20 11:28:48 -04:00
Ryan Poplin ddb5045e14 Updating the methods development calling pipeline for the new rod binding syntax and the new best practices. 2011-08-19 19:29:51 -04:00
Mauricio Carneiro 46051c36c6 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-10 16:57:34 -04:00
Mauricio Carneiro b0ff5b1ff7 a better name for the pacbio processing pipeline 2011-08-10 16:16:53 -04:00
Mark DePristo 9e53fd6880 Fixed VCFGatherFunction to not provide incorrect rod_priority_list
-- simply don't provide one, since you are just 'cating' the files together and genotypes never overlap
2011-08-10 07:28:35 -04:00
Mauricio Carneiro 481630da00 BWA parameters added 2011-08-09 17:05:24 -04:00
Mauricio Carneiro 22d2563823 added BWA SW alignment
The pipeline now accepts fasta/fastq files and aligns them using BWA SW, adds default basequalities, creates read groups and performs BQSR.
2011-08-09 17:05:24 -04:00
Mauricio Carneiro bd1cf4c7bc Pacbio Pipeline
Added the base quality "filling" step to allow the pipeline to handle raw pacbio BAM files. This is the first step towards a generic pacbio data processing pipeline.
2011-08-09 17:05:24 -04:00
Eric Banks 5a3c99b7b9 Fixing 'variants' change in qscript 2011-08-09 12:30:46 -04:00
Khalid Shakir cb28875c2a Updated rod binding syntax usage on CombineVariants from .rodBind to .variants. 2011-08-09 00:46:39 -04:00
Mark DePristo f8a56bc64b Merge branch 'master' into rodRefactor 2011-08-08 16:58:18 -04:00
Mark DePristo 383bb6f0e0 Merge branch 'master' into rodRefactor 2011-08-08 15:25:55 -04:00
Ryan Poplin 99e3a72343 Merged bug fix from Stable into Unstable 2011-08-08 12:36:17 -04:00
Ryan Poplin 8072bd9831 Updating resource bundle generation qscript for changeover to git 2011-08-08 12:35:39 -04:00
Mauricio Carneiro 0db46d0648 Merged bug fix from Stable into Unstable 2011-08-08 10:50:09 -04:00
Mauricio Carneiro 2fd101135c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-08 10:49:43 -04:00
Mauricio Carneiro 4d6cb33612 removing temporary bam index
The clean bai file was left behind after the data processing pipeline was done
2011-08-08 10:49:28 -04:00
Mark DePristo e5fde0d16b Merge branch 'master' into rodRefactor 2011-08-08 10:08:43 -04:00
Ryan Poplin 6693407bd8 Merged bug fix from Stable into Unstable 2011-08-07 17:39:03 -04:00
Ryan Poplin 738e94efcb Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-07 17:36:45 -04:00
Khalid Shakir f534c2e7bb Merged bug fix from Stable into Unstable 2011-08-06 10:43:52 -04:00
Khalid Shakir eaa2f16d83 When a job finishes successfully in the ShellJobRunner, mark it as DONE instead of FAILED. 2011-08-06 10:42:04 -04:00
Mark DePristo 58a60d4901 Merge branch 'master' into rodRefactor 2011-08-04 12:48:56 -04:00
Ryan Poplin 21dc9a5543 Adding mills/devine indel dataset to the resource bundle 2011-08-04 12:31:28 -04:00
Mauricio Carneiro 0739b7f75b Merged bug fix from Stable into Unstable 2011-08-04 11:07:25 -04:00
Mauricio Carneiro aff681e407 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-04 11:05:25 -04:00
Mauricio Carneiro fa97bd8ac1 Merged bug fix from Stable into Unstable 2011-08-04 09:52:10 -04:00
Mauricio Carneiro 23ec5b94cf fixed a missing check for null
There was a missed check for the case when you don't provide an indels vcf for the cleaner.
2011-08-04 09:50:02 -04:00
Mauricio Carneiro 8981367307 Updating memory usage for picard programs 2011-08-03 15:48:28 -04:00
Mark DePristo 79e4a8f6d3 Merge
Conflicts:
	private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java
	public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-03 15:09:47 -04:00
Khalid Shakir 3e043a633c Merged bug fix from Stable into Unstable 2011-08-03 02:23:16 -04:00
Khalid Shakir a587f38808 Fixed example unified genotyper pipeline to wrap filter expressions with quotes and use rod binding name "variant" instead of "vcf". 2011-08-03 02:21:01 -04:00
Khalid Shakir 5dcac7b064 GATKReport v0.2:
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
2011-08-03 00:24:47 -04:00
Mark DePristo 03741fb640 Merge branch 'master' into rodRefactor
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/annotator/VariantAnnotatorEngine.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerIntegrationTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerPerformanceTest.java
	public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-02 14:21:58 -04:00
Mark DePristo 4f8d830960 Updated to reflect new parse() function 2011-07-30 15:34:20 -04:00
Mauricio Carneiro 2d94037ad0 Remove temporary index files (*.bai)
some temporary index files were not being removed.
2011-07-30 02:05:22 -04:00
Mauricio Carneiro dcf21f379a Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-07-23 12:59:53 -04:00
Mauricio Carneiro f0a6dd27a1 Renaming the plot output directory names. 2011-07-23 12:59:37 -04:00
Mauricio Carneiro 4f78025b0b Merged bug fix from Stable into Unstable 2011-07-22 14:42:04 -04:00
Mauricio Carneiro 4080e2cd88 * Added the decoy reference to the bundle under the b37 resources.
* Updated the -svn argument to -ver since we don't use svn anymore (also updated the wiki).
2011-07-22 14:41:22 -04:00
Khalid Shakir 59eb1f4663 Memory limits changed from Int to Double.
Updated LSF calls to read memory units from config along with tweaks to select hosts.
Moved some common code from GridEngine and LSF to super classes.
2011-07-21 22:57:18 -04:00
Mauricio Carneiro 9ad5c7dfa4 Resolving simple conflicts in the data processing pipeline.
Conflicts:
	public/scala/qscript/org/broadinstitute/sting/queue/qscripts/DataProcessingPipeline.scala
2011-07-19 08:05:11 -04:00
Mauricio Carneiro 7688bda1a6 better progress report for the DPP 2011-07-18 23:39:47 -04:00
Mauricio Carneiro 2b465ab43b * added optional 'no validation' for the Data Processing pipeline.
* some simplifications on the picard classes
2011-07-18 23:30:31 -04:00
Mauricio Carneiro 4cf7a2af23 Removed broad specific default paths so people from outside the broad can use it. 2011-07-18 23:25:21 -04:00
Mark DePristo 449bf1b539 Testdata for diffObjects.
PipelineTest updated to point to MD5DB.java
2011-07-18 10:47:03 -04:00
Mauricio Carneiro ecc8726f63 Merged bug fix from Stable into Unstable 2011-07-17 18:10:18 -04:00
Mauricio Carneiro 1af76736b9 Guarantees that the list of files will always be in the same order. 2011-07-17 11:41:34 -04:00
Mauricio Carneiro 5cb5a4ec75 Merged bug fix from Stable into Unstable 2011-07-16 00:23:59 -04:00
Mauricio Carneiro dd92a14b40 Made extra indel VCF optional but DBSNP mandatory. 2011-07-16 00:23:35 -04:00
Mauricio Carneiro 2fa5dbb0fe Merged bug fix from Stable into Unstable 2011-07-16 00:15:19 -04:00
Mauricio Carneiro ed55182a4c Removing Broad specific paths from parameters and making them required. This should make it unambiguous for people inside and outside the Broad to use the DataProcessingPipeline (as per request in the GetSatisfaction) 2011-07-16 00:09:00 -04:00
Mauricio Carneiro 43bd45fcad Merged bug fix from Stable into Unstable 2011-07-15 19:40:02 -04:00
Mauricio Carneiro fd1df31ef0 changing the output directory names for Analyze Covariates 2011-07-15 19:39:42 -04:00
Mauricio Carneiro aa30f416a3 Resolving conflicts
Conflicts:
	private/scala/qscript/depristo/ExomePostQCEval.scala
	private/scala/qscript/depristo/PostCallingQC.scala
	private/scala/qscript/org/broadinstitute/sting/queue/qscripts/archive/ExomePostQCEval.scala
2011-07-15 16:21:42 -04:00
Mauricio Carneiro 224d373997 No need to double overload the file constructor 2011-07-15 15:19:10 -04:00
Mauricio Carneiro 7b7d40d5d9 A better name for the qscript utilities. Throw here every method you find yourself repeatedly implementing in your qscripts!
Refactoring appropriately.
2011-07-15 14:34:50 -04:00
Mauricio Carneiro a670d6420a Refactoring Qscript utils into queue general utils package. 2011-07-15 14:31:43 -04:00
Mauricio Carneiro f19862a643 Fixing conflicts. 2011-07-14 17:13:31 -04:00
Mauricio Carneiro 43c6a8565b looks better now. 2011-07-14 17:10:44 -04:00
Mauricio Carneiro 09ffe277ae Added a qscripts util package with some utility functions commonly shared across queue scripts. Refactored some of my public scripts to use it in an effort to make queue scripts more reusable and "supportable". 2011-07-14 17:09:35 -04:00
Mauricio Carneiro 4f8230c750 Merged bug fix from Stable into Unstable 2011-07-14 16:44:57 -04:00
Mauricio Carneiro 9f5180ab05 Recalibrates a list of bam files allowing multiple bams to be recalibrated out of a single 'mother' queue job. 2011-07-14 16:42:17 -04:00
Mauricio Carneiro df996a1a73 more progress report for the Data Processing Pipeline.
Bam lists can now have empty lines, comments and whitespaces anywhere.
2011-07-13 14:53:58 -04:00
Mauricio Carneiro e2f2917bd2 Merged bug fix from Stable into Unstable 2011-07-13 13:00:55 -04:00
Mauricio Carneiro ff4e31c554 Changing the file names as per Kris request. 2011-07-13 12:59:18 -04:00
Khalid Shakir e93052a51e When generating the QGraph, don't regenerate if there aren't scatter/gather jobs.
Fixed a display issue with the number of milliseconds that Queue has tried to contact LSF.
2011-07-11 19:17:58 -04:00
Mauricio Carneiro 5298e3a942 Making the outputDir optional. Default = ./ 2011-07-05 16:30:41 -04:00
Mauricio Carneiro 7d3dfdfdf2 Updating the MDCP to use the classpath for the GATK jar, removing -gatk parameter. 2011-07-05 16:30:10 -04:00
Khalid Shakir b6bc64a0c8 Cleanup of the utils.broad package.
Using Picard IoUtils on sample names.
2011-07-01 20:47:03 -04:00
David Roazen 546e7777fa Re-fixing paths in pipeline tests after example qscripts got moved. 2011-07-01 16:39:10 -04:00
Mauricio Carneiro b0fb63e20a moving the example scala scripts to the qscripts package. 2011-07-01 16:14:59 -04:00
Mauricio Carneiro d19351f71a Added capability of running multiple bam files in the same directory. 2011-07-01 16:02:28 -04:00
David Roazen 11d4af0e75 Path-related fixes to the private queue pipeline tests. 2011-07-01 13:41:34 -04:00
David Roazen 9644f104c4 Fixes to the queue pipeline tests to account for the new directory structure. 2011-07-01 13:13:24 -04:00
Mauricio Carneiro 64048a67e8 cleaning up ghost scala scripts. Deleting clearly unused one and moving others to qscripts.archive 2011-06-30 15:20:43 -04:00
Mauricio Carneiro 197b7141c1 Added an optional argument -bt <num_threads> for BWA to run multithreaded. 2011-06-30 14:41:57 -04:00
Mauricio Carneiro f4463d38ca BWA requires paired-end reads to be sorted by read name when operating over BAM files, but Picard sorts by coordinate, so when we use BWA on paired-end reads, the pipeline now re-sorts the BAM in read name order, realigns it, then sorts it in coordinate order. 2011-06-30 14:29:21 -04:00
Mauricio Carneiro efd99c3c11 new home for the core qscripts 2011-06-30 11:32:06 -04:00
Mauricio Carneiro 1085df8b7b Making the BQSR pipeline publicly available and supported.
This is for all the Pacbio validation that is going on right now in the cancer group. They are all using this script, and I'm happy to support it.
2011-06-29 16:05:32 -04:00
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00