Mauricio Carneiro
758ecf2d43
Bringing latest updates of ReduceReads to the master repository
2011-09-20 16:35:09 -04:00
Mauricio Carneiro
08ffb18b96
Renaming datasets in the MDCP
...
Making dataset names and files generated by the MDCP more uniform.
2011-09-20 11:02:51 -04:00
Eric Banks
ba150570f3
Updating to use new rod system syntax plus name change for CountRODs
2011-09-19 13:30:32 -04:00
Eric Banks
095f75ff7d
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-19 12:24:12 -04:00
Eric Banks
85626e7a5d
We no longer want people to use the August 2010 Dindel calls for indel realignment but instead Guillermo's new whole genome bi-allelic indel calls; updating the bundle accordingly. Also, there was some confusion by the 1000G data processing folks as to exactly what these indel files are, so I've renamed them so that it's clear. Wiki updated too.
2011-09-19 12:24:05 -04:00
Mark DePristo
6ea57bf036
Merge branch 'master' into sgintervals
2011-09-19 09:50:19 -04:00
Khalid Shakir
33967a4e0c
Fixed issue reported by chartl where cloned functions lost tags on @Inputs.
...
Updated ExampleUnifiedGenotyper.scala with new syntax.
2011-09-16 12:46:07 -04:00
Ryan Poplin
981b78ea50
Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.
2011-09-12 12:17:43 -04:00
Mauricio Carneiro
7f9000382e
Making indel calls default in the MDCP
...
You can turn off indel calling by using -noIndels.
2011-09-09 14:09:26 -04:00
Mark DePristo
06cb20f2a5
Intermediate commit cleaning up scatter intervals
...
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Khalid Shakir
510d5e7730
Merged bug fix from Stable into Unstable
2011-09-09 01:34:55 -04:00
Khalid Shakir
367bbee25a
Fixed typo when printing the contents or last N lines of a file. Thanks to larryns.
2011-09-09 01:33:25 -04:00
Mauricio Carneiro
ee9d599558
Just cleaning up
...
Clean up old commented-out code from the data processing pipeline.
2011-09-07 13:32:40 -04:00
Mauricio Carneiro
28d782b4c7
Allowing multiple dbsnp and indel files in the DPP
2011-09-02 13:38:47 -04:00
Mauricio Carneiro
ad4ea0b80b
Merged bug fix from Stable into Unstable
2011-09-01 18:14:45 -04:00
Mauricio Carneiro
e253f6f05d
Fixing typo in DPP
...
platform and library were exchanged when rebuilding the read group information
2011-09-01 18:13:52 -04:00
Mauricio Carneiro
d2a33beff7
Added WGS/WEX b37-decoy CEU trio datasets
2011-09-01 13:14:40 -04:00
Mark DePristo
61633c95a8
Default jobreport is now jobPrefix, so you see logs like Q-2508.jobreport.txt
2011-08-28 19:19:45 -04:00
Mark DePristo
b38de1fa35
Now captures the exechost in the job report
...
-- Works for in process, shell, and LSF runners
-- Cleanup of debugging output
2011-08-28 12:05:56 -04:00
Mark DePristo
e37a638e09
Fix for disallowed characters in GATKReportTable
...
-- Illegal characters are automatically replaced with _
2011-08-26 13:24:06 -04:00
Mark DePristo
0cb1605df0
Clean documentation for JobRunInfo
2011-08-26 09:22:58 -04:00
Mark DePristo
415d5d5301
LSF long times are in seconds, convert to milliseconds to meet standard
2011-08-26 09:18:28 -04:00
Mark DePristo
eef1ac415a
Merge branch 'master' into rodTesting
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java
2011-08-26 00:35:41 -04:00
Mark DePristo
e03dfdb0ab
Automatic iteration field addition works properly.
2011-08-25 16:59:02 -04:00
Mark DePristo
e01273ca7c
Queue now writes out queueJobReport.pdf
...
-- General purpose RScript executor in java (please use when invoking RScripts)
-- Removed groupName. This is now analysisName
-- Explicitly added capability to enable/disable individual QFunction
2011-08-25 16:57:11 -04:00
Mark DePristo
0f4be2c4a4
Argument to disable queueJobReport entirely
...
-- Minor improvements to RodPerformanceGoals
2011-08-25 13:32:03 -04:00
Mark DePristo
d65faf509c
Default output name for Queue JobReport is queue_jobreport.gatkreport.txt
2011-08-25 13:15:20 -04:00
Mark DePristo
a7d6946b22
Refactored QJobReport and QFunction, which is now automatically tracked
...
-- All QFunctions, including sg ones, are tracked
-- Removed memory information
2011-08-25 13:13:55 -04:00
Mauricio Carneiro
16caca0822
BLASR BAMs and new BWA parameters
...
* Added the functions to turn a BLASR generated BAM file into a usable BAM file.
* Modified the bwa parameters according to test results from NA12878 pb2k dataset.
2011-08-24 17:04:07 -04:00
Mauricio Carneiro
e3f5d7067a
Added ReorderSam queue binding
2011-08-24 17:03:11 -04:00
Mark DePristo
08fb21f127
Removing hostname
2011-08-24 16:45:50 -04:00
Mauricio Carneiro
dc8398e165
fixing bai output for indel cleaning.
2011-08-24 15:58:34 -04:00
Mark DePristo
06e30a81d1
Fixes throughout for getting job information
...
-- no more hostname -- it's just not going to be important
2011-08-24 15:30:09 -04:00
Mark DePristo
4918519a58
No more NPE in getRuntime() when you ctrl-c out of Queue
2011-08-24 14:14:01 -04:00
Mark DePristo
16d8360592
QJobReport is now the official capability name
2011-08-24 13:59:14 -04:00
Mark DePristo
d047c19ad1
Writes output to file
2011-08-24 13:52:05 -04:00
Mark DePristo
3ae68e2397
JobLogging trait now writes out GATKReport log of jobs
2011-08-24 13:36:39 -04:00
Mauricio Carneiro
cd12f7f286
Fixed list dependency
...
Instead of creating a bam list file, I dynamically create a scala list and pass as parameters. This way the intermediate bam files don't get deleted before they should.
2011-08-24 11:12:46 -04:00
Mauricio Carneiro
219252a566
Adapting to the new RodBinding framework
2011-08-24 11:12:46 -04:00
Mark DePristo
b8bc03bb42
JobRunInfo improvements
...
-- dry-run now adds some info, for testing
-- InProcessRunner adds some, but not all, of the information we want
2011-08-23 17:11:22 -04:00
Mark DePristo
31ec6e316c
First implementation of JobRunInfo
...
-- onExecutionDone(Map(QFunction, JobRunInfo)) is the new signature, so that you can walk over your jobs and inspect their success/failure and runtime characteristics
2011-08-23 16:51:54 -04:00
Mark DePristo
a9ba945595
onExecutionDone(jobs, successFlag) added to QScript.
...
-- This function is called when the Qscript ends, so scripts can overload this function if they want to run some code after all of the jobs have completed
2011-08-23 10:09:51 -04:00
Mauricio Carneiro
136f0eb685
Creating sample-bam list instead of joining
...
This should save us at least one day in the trio decoy processing.
2011-08-22 18:03:39 -04:00
Mauricio Carneiro
04d8bcaf19
Fixed bai removal on picard tools
...
BAM index files were not being deleted because picard replaces the name of the file with bai instead of appending to it.
2011-08-22 18:03:39 -04:00
Mauricio Carneiro
8aed151a71
Created RevertSam queue class
...
Class for the picard tool RevertSam with all the options for queue scripts.
2011-08-22 18:03:39 -04:00
Mauricio Carneiro
caebc88e9a
Consensus mode and new RodBinding framework.
...
The DPP was not using the parameter correctly. It didn't matter for the default option (which is the only one we have been testing) but it would not work for knowns only or smith waterman. It is fixed now.
It now complies with the new rod binding framework.
2011-08-22 18:03:39 -04:00
Khalid Shakir
c4c90c8826
Updates to JobRunners from the Queue developer community and from running the WholeGenomePipeline:
...
- Ability to pass a different resident memory reservation and limits. Useful for large pileups of low pass genome data that sometimes need high -Xmx6g but usually don't exceed 2-3g in actual heap size.
- Fixed jobPriority to work for all job runners. Now must be an integer between 0 and 100, even for GridEngine, and will be mapped to the correct values.
- Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8
- Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.
2011-08-22 15:13:27 -04:00
Ryan Poplin
f93a554b01
updating exome specific parameters in MDCP
2011-08-21 10:25:36 -04:00
Ryan Poplin
b008676878
fixing the previous fix
2011-08-20 21:21:55 -04:00
Ryan Poplin
539e157ecd
Fixing misc parameters in MDCP. The pipeline now does VariantEval of output by default. Fix for NaN vqslod values in VQSR
2011-08-20 11:28:48 -04:00
Ryan Poplin
ddb5045e14
Updating the methods development calling pipeline for the new rod binding syntax and the new best practices.
2011-08-19 19:29:51 -04:00
Mauricio Carneiro
46051c36c6
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-10 16:57:34 -04:00
Mauricio Carneiro
b0ff5b1ff7
a better name for the pacbio processing pipeline
2011-08-10 16:16:53 -04:00
Mark DePristo
9e53fd6880
Fixed VCFGatherFunction to not provide incorrect rod_priority_list
...
-- simply don't provide one, since you are just 'cating' the files together and genotypes never overlap
2011-08-10 07:28:35 -04:00
Mauricio Carneiro
481630da00
BWA parameters added
2011-08-09 17:05:24 -04:00
Mauricio Carneiro
22d2563823
added BWA SW alignment
...
The pipeline now accepts fasta/fastq files and aligns them using BWA SW, adds default base qualities, creates read groups and performs BQSR.
2011-08-09 17:05:24 -04:00
Mauricio Carneiro
bd1cf4c7bc
Pacbio Pipeline
...
Added the base quality "filling" step to allow the pipeline to handle raw pacbio BAM files. This is the first step towards a generic pacbio data processing pipeline.
2011-08-09 17:05:24 -04:00
Eric Banks
5a3c99b7b9
Fixing 'variants' change in qscript
2011-08-09 12:30:46 -04:00
Khalid Shakir
cb28875c2a
Updated rod binding syntax usage on CombineVariants from .rodBind to .variants.
2011-08-09 00:46:39 -04:00
Mark DePristo
f8a56bc64b
Merge branch 'master' into rodRefactor
2011-08-08 16:58:18 -04:00
Mark DePristo
383bb6f0e0
Merge branch 'master' into rodRefactor
2011-08-08 15:25:55 -04:00
Ryan Poplin
99e3a72343
Merged bug fix from Stable into Unstable
2011-08-08 12:36:17 -04:00
Ryan Poplin
8072bd9831
Updating resource bundle generation qscript for changeover to git
2011-08-08 12:35:39 -04:00
Mauricio Carneiro
0db46d0648
Merged bug fix from Stable into Unstable
2011-08-08 10:50:09 -04:00
Mauricio Carneiro
2fd101135c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-08-08 10:49:43 -04:00
Mauricio Carneiro
4d6cb33612
removing temporary bam index
...
The clean bai file was left behind after the data processing pipeline was done
2011-08-08 10:49:28 -04:00
Mark DePristo
e5fde0d16b
Merge branch 'master' into rodRefactor
2011-08-08 10:08:43 -04:00
Ryan Poplin
6693407bd8
Merged bug fix from Stable into Unstable
2011-08-07 17:39:03 -04:00
Ryan Poplin
738e94efcb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-08-07 17:36:45 -04:00
Khalid Shakir
f534c2e7bb
Merged bug fix from Stable into Unstable
2011-08-06 10:43:52 -04:00
Khalid Shakir
eaa2f16d83
When a job finishes successfully in the ShellJobRunner, mark it as DONE instead of FAILED.
2011-08-06 10:42:04 -04:00
Mark DePristo
58a60d4901
Merge branch 'master' into rodRefactor
2011-08-04 12:48:56 -04:00
Ryan Poplin
21dc9a5543
Adding mills/devine indel dataset to the resource bundle
2011-08-04 12:31:28 -04:00
Mauricio Carneiro
0739b7f75b
Merged bug fix from Stable into Unstable
2011-08-04 11:07:25 -04:00
Mauricio Carneiro
aff681e407
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-08-04 11:05:25 -04:00
Mauricio Carneiro
fa97bd8ac1
Merged bug fix from Stable into Unstable
2011-08-04 09:52:10 -04:00
Mauricio Carneiro
23ec5b94cf
fixed a missing check for null
...
There was a missed check for the case when you don't provide an indels vcf for the cleaner.
2011-08-04 09:50:02 -04:00
Mauricio Carneiro
8981367307
Updating memory usage for picard programs
2011-08-03 15:48:28 -04:00
Mark DePristo
79e4a8f6d3
Merge
...
Conflicts:
private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java
public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java
public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java
public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-03 15:09:47 -04:00
Khalid Shakir
3e043a633c
Merged bug fix from Stable into Unstable
2011-08-03 02:23:16 -04:00
Khalid Shakir
a587f38808
Fixed example unified genotyper pipeline to wrap filter expressions with quotes and use rod binding name "variant" instead of "vcf".
2011-08-03 02:21:01 -04:00
Khalid Shakir
5dcac7b064
GATKReport v0.2:
...
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
2011-08-03 00:24:47 -04:00
Mark DePristo
03741fb640
Merge branch 'master' into rodRefactor
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/annotator/VariantAnnotatorEngine.java
public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerIntegrationTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerPerformanceTest.java
public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-02 14:21:58 -04:00
Mark DePristo
4f8d830960
Updated to reflect new parse() function
2011-07-30 15:34:20 -04:00
Mauricio Carneiro
2d94037ad0
Remove temporary index files (*.bai)
...
some temporary index files were not being removed.
2011-07-30 02:05:22 -04:00
Mauricio Carneiro
dcf21f379a
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-07-23 12:59:53 -04:00
Mauricio Carneiro
f0a6dd27a1
Renaming the plot output directory names.
2011-07-23 12:59:37 -04:00
Mauricio Carneiro
4f78025b0b
Merged bug fix from Stable into Unstable
2011-07-22 14:42:04 -04:00
Mauricio Carneiro
4080e2cd88
* Added the decoy reference to the bundle under the b37 resources.
...
* Updated the -svn argument to -ver since we don't use svn anymore (also updated the wiki).
2011-07-22 14:41:22 -04:00
Khalid Shakir
59eb1f4663
Memory limits changed from Int to Double.
...
Updated LSF calls to read memory units from config along with tweaks to select hosts.
Moved some common code from GridEngine and LSF to super classes.
2011-07-21 22:57:18 -04:00
Mauricio Carneiro
9ad5c7dfa4
Resolving simple conflicts in the data processing pipeline.
...
Conflicts:
public/scala/qscript/org/broadinstitute/sting/queue/qscripts/DataProcessingPipeline.scala
2011-07-19 08:05:11 -04:00
Mauricio Carneiro
7688bda1a6
better progress report for the DPP
2011-07-18 23:39:47 -04:00
Mauricio Carneiro
2b465ab43b
* added optional 'no validation' for the Data Processing pipeline.
...
* some simplifications on the picard classes
2011-07-18 23:30:31 -04:00
Mauricio Carneiro
4cf7a2af23
Removed broad specific default paths so people from outside the broad can use it.
2011-07-18 23:25:21 -04:00
Mark DePristo
449bf1b539
Testdata for diffObjects.
...
PipelineTest updated to point to MD5DB.java
2011-07-18 10:47:03 -04:00
Mauricio Carneiro
ecc8726f63
Merged bug fix from Stable into Unstable
2011-07-17 18:10:18 -04:00
Mauricio Carneiro
1af76736b9
Guarantees that the list of files will always be in the same order.
2011-07-17 11:41:34 -04:00
Mauricio Carneiro
5cb5a4ec75
Merged bug fix from Stable into Unstable
2011-07-16 00:23:59 -04:00
Mauricio Carneiro
dd92a14b40
Made extra indel VCF optional but DBSNP mandatory.
2011-07-16 00:23:35 -04:00
Mauricio Carneiro
2fa5dbb0fe
Merged bug fix from Stable into Unstable
2011-07-16 00:15:19 -04:00
Mauricio Carneiro
ed55182a4c
Removing Broad specific paths from parameters and making them required. This should make it unambiguous for people inside and outside the Broad to use the DataProcessingPipeline (as per request in the GetSatisfaction)
2011-07-16 00:09:00 -04:00
Mauricio Carneiro
43bd45fcad
Merged bug fix from Stable into Unstable
2011-07-15 19:40:02 -04:00
Mauricio Carneiro
fd1df31ef0
changing the output directory names for Analyze Covariates
2011-07-15 19:39:42 -04:00
Mauricio Carneiro
aa30f416a3
Resolving conflicts
...
Conflicts:
private/scala/qscript/depristo/ExomePostQCEval.scala
private/scala/qscript/depristo/PostCallingQC.scala
private/scala/qscript/org/broadinstitute/sting/queue/qscripts/archive/ExomePostQCEval.scala
2011-07-15 16:21:42 -04:00
Mauricio Carneiro
224d373997
No need to double overload the file constructor
2011-07-15 15:19:10 -04:00
Mauricio Carneiro
7b7d40d5d9
A better name for the qscript utilities. Throw here every method you find yourself repeatedly implementing in your qscripts!
...
Refactoring appropriately.
2011-07-15 14:34:50 -04:00
Mauricio Carneiro
a670d6420a
Refactoring Qscript utils into queue general utils package.
2011-07-15 14:31:43 -04:00
Mauricio Carneiro
f19862a643
Fixing conflicts.
2011-07-14 17:13:31 -04:00
Mauricio Carneiro
43c6a8565b
looks better now.
2011-07-14 17:10:44 -04:00
Mauricio Carneiro
09ffe277ae
Added a qscripts util package with some utility functions commonly shared across queue scripts. Refactored some of my public scripts to use it in an effort to make queue scripts more reusable and "supportable".
2011-07-14 17:09:35 -04:00
Mauricio Carneiro
4f8230c750
Merged bug fix from Stable into Unstable
2011-07-14 16:44:57 -04:00
Mauricio Carneiro
9f5180ab05
Recalibrates a list of bam files allowing multiple bams to be recalibrated out of a single 'mother' queue job.
2011-07-14 16:42:17 -04:00
Mauricio Carneiro
df996a1a73
more progress report for the Data Processing Pipeline.
...
BAM lists can now have empty lines, comments and whitespace anywhere.
2011-07-13 14:53:58 -04:00
Mauricio Carneiro
e2f2917bd2
Merged bug fix from Stable into Unstable
2011-07-13 13:00:55 -04:00
Mauricio Carneiro
ff4e31c554
Changing the file names as per Kris's request.
2011-07-13 12:59:18 -04:00
Khalid Shakir
e93052a51e
When generating the QGraph, don't regenerate if there aren't scatter/gather jobs.
...
Fixed a display issue with the number of milliseconds that Queue has tried to contact LSF.
2011-07-11 19:17:58 -04:00
Mauricio Carneiro
5298e3a942
Making the outputDir optional. Default = ./
2011-07-05 16:30:41 -04:00
Mauricio Carneiro
7d3dfdfdf2
Updating the MDCP to use the classpath for the GATK jar, removing -gatk parameter.
2011-07-05 16:30:10 -04:00
Khalid Shakir
b6bc64a0c8
Cleanup of the utils.broad package.
...
Using Picard IoUtils on sample names.
2011-07-01 20:47:03 -04:00
David Roazen
546e7777fa
Re-fixing paths in pipeline tests after example qscripts got moved.
2011-07-01 16:39:10 -04:00
Mauricio Carneiro
b0fb63e20a
moving the example scala scripts to the qscripts package.
2011-07-01 16:14:59 -04:00
Mauricio Carneiro
d19351f71a
Added capability of running multiple bam files in the same directory.
2011-07-01 16:02:28 -04:00
David Roazen
11d4af0e75
Path-related fixes to the private queue pipeline tests.
2011-07-01 13:41:34 -04:00
David Roazen
9644f104c4
Fixes to the queue pipeline tests to account for the new directory structure.
2011-07-01 13:13:24 -04:00
Mauricio Carneiro
64048a67e8
cleaning up ghost scala scripts. Deleting clearly unused ones and moving others to qscripts.archive
2011-06-30 15:20:43 -04:00
Mauricio Carneiro
197b7141c1
Added an optional argument -bt <num_threads> for BWA to run multithreaded.
2011-06-30 14:41:57 -04:00
Mauricio Carneiro
f4463d38ca
BWA requires paired-end reads to be sorted by read name when operating over BAM files, but Picard sorts by coordinate. So when we use BWA on paired-end reads, the pipeline now re-sorts the BAM in read name order, realigns it, then sorts it in coordinate order.
2011-06-30 14:29:21 -04:00
Mauricio Carneiro
efd99c3c11
new home for the core qscripts
2011-06-30 11:32:06 -04:00
Mauricio Carneiro
1085df8b7b
Making the BQSR pipeline publicly available and supported.
...
This is for all the Pacbio validation that is going on right now in the cancer group. They are all using this script, and I'm happy to support it.
2011-06-29 16:05:32 -04:00
David Roazen
3c9497788e
Reorganized the codebase beneath top-level public and private directories,
...
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00