Mauricio Carneiro
945cf03889
IntelliJ ate my import!
2012-01-23 21:46:45 -05:00
Mauricio Carneiro
2bb9525e7f
Don't set base qualities if fastQ is provided
...
* Pacbio Processing pipeline now works with the new fastQ files outputted by the Pacbio instrument
2012-01-23 17:57:29 -05:00
Khalid Shakir
c18beadbdb
Device files like /dev/null are now tracked as special by Queue and are not used to generate .out file paths, scattered into a temporary directory, gathered, deleted, etc.
...
Attempted workaround for xdr_resourceInfoReq unsatisfied link during loading of libbat.so.
2012-01-23 16:17:04 -05:00
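A minimal sketch of the "special file" idea from c18beadbdb. It assumes that a special file is simply any absolute path under /dev. The object and method names are hypothetical and are not Queue's actual API.

```scala
import java.io.File

// Hypothetical helper, not Queue's real code: decides whether an output path is a
// device file (e.g. /dev/null) that should be excluded from .out log naming,
// scatter/gather and deletion. Assumption: "special" means "lives under /dev".
object SpecialFiles {
  def isSpecialDevice(file: File): Boolean = {
    // Compare absolute paths so that "/devices/x" or relative paths are not misclassified.
    val path = file.getAbsolutePath
    path == "/dev" || path.startsWith("/dev/")
  }

  // Keep only the outputs that are real files Queue should manage.
  def managedOutputs(outputs: Seq[File]): Seq[File] =
    outputs.filterNot(isSpecialDevice)
}
```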
Ryan Poplin
75f87db468
Replacing Mills file with new gold standard indel set in the resource bundle for release with v1.5
2012-01-17 15:02:45 -05:00
Mauricio Carneiro
5bf960deb8
adding dbsnp to indel VQSR
2012-01-10 12:38:49 -05:00
Mauricio Carneiro
6f2abd76df
Updating the MDCP with the new indel gold standard from Ryan.
2012-01-09 15:31:18 -05:00
Khalid Shakir
5793625592
No more "Q-<pid>@<host>". Generated log file names now use the first output + ".out" (ex. my.vcf.out) or the name of the first QScript plus the order the function was added (ex. MyScript-1.out). The same function added twice with the same outputs will now have the same default logs, meaning the 2nd instance of the function won't be added to the graph twice.
...
QScript accessor to QSettings to specify a default runName and other default function settings.
Because log files are no longer pseudo-random, their presence can be used to tell if a job without other file outputs is "done". For now still using the log's .done file in addition to original outputs.
Gathered log files concatenate all log files together into the stdout.
InProcessFunctions now have PrintStreams for stdout and stderr.
Updated ivy to use commons-io 2.1 for copying logs to the stdout PrintStream. Removed snakeyaml.
During graph tracking of outputs, the Index files (and now BAM MD5s) are tracked together with the gathering of the original file.
In Queue-generated wrappers for the GATK, the Index and MD5s used for tracking are switched to private scope.
Added more detailed output when running with -l DEBUG.
Simplified graphviz visualization for additional debugging.
Switched usage of the scala class 'List' to the trait 'Seq' (think java.util.ArrayList vs. using the interface java.util.List)
Minor cleanup to build including sending ant gsalib to R's default libloc.
2012-01-08 12:11:55 -05:00
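The default log naming rule from 5793625592 can be sketched as a pure function. The names below are hypothetical and do not reproduce Queue's internal classes. The sketch takes a `Seq` rather than a `List`, in line with the switch described in the same commit.

```scala
import java.io.File

// Hypothetical sketch of the default log naming rule; not Queue's actual code.
object LogNames {
  /**
   * @param outputs    the function's output files, in declaration order
   * @param scriptName name of the first QScript (e.g. "MyScript")
   * @param addOrder   1-based order in which the function was added to the graph
   */
  def defaultLogName(outputs: Seq[File], scriptName: String, addOrder: Int): String =
    outputs.headOption match {
      case Some(first) => first.getPath + ".out"            // my.vcf -> my.vcf.out
      case None        => s"${scriptName}-${addOrder}.out"  // MyScript-1.out
    }
}
```

Because the name is a deterministic function of the first output, adding the same function twice with the same outputs yields the same log name. That is what allows the second instance to be recognized instead of being added to the graph again.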
Mauricio Carneiro
f6a18aea63
Updated MDCP with INDEL best practices
...
* chose 90.0 indel cut target for most datasets (this is arbitrary).
2012-01-06 17:21:59 -05:00
Khalid Shakir
7486696c07
In bam list mode, HSP now derives the VCF name from the bam list instead of requiring an additional parameter.
...
Creating a single temporary directory per ant test run instead of putting temp files from all runs in the same directory.
Updated various tests for above items and other small fixes.
2011-12-16 18:09:25 -05:00
Mauricio Carneiro
663184ee9d
Added test mode to PPP
...
* in test mode, no @PG tags are output to the final bam file
* updated pipeline test to use -test mode.
* MD5s updated accordingly
2011-12-12 18:29:06 -05:00
Mauricio Carneiro
a3c3d72313
Added test mode to DPP
...
* in test mode, no @PG tags are output to the final bam file
* updated pipeline test to use -test mode.
* MD5s are now dependent on BWA version
2011-12-12 18:29:06 -05:00
Mauricio Carneiro
cca8a18608
PPP pipeline test
...
* added a pipeline test to the Pacbio Processing Pipeline.
* updated exampleBAM with more complete RG information so we can use it in a wider variety of pipeline tests
* added exampleDBSNP.vcf file with only chromosome 1 in the range of the exampleFASTA.fasta reference for pipeline tests
2011-12-11 17:32:21 -05:00
Mauricio Carneiro
21ac3b59d7
Merged bug fix from Stable into Unstable
2011-12-09 16:51:46 -05:00
Mauricio Carneiro
13905c00b3
Updating PacbioProcessingPipeline to new Queue standards
2011-12-09 16:51:02 -05:00
David Roazen
d014c7faf9
Queue now properly escapes all shell arguments in generated shell scripts
...
This has implications for both Qscript authors and CommandLineFunction authors.
Qscript authors:
You no longer need to (and in fact must not) manually escape String values to
avoid interpretation by the shell when setting up Walker parameters. Queue will
safely escape all of your Strings for you so that they'll be interpreted literally. E.g.,
Old way:
filterSNPs.filterExpression = List("\"QD<2.0\"", "\"MQ<40.0\"", "\"HaplotypeScore>13.0\"")
New way:
filterSNPs.filterExpression = List("QD<2.0", "MQ<40.0", "HaplotypeScore>13.0")
CommandLineFunction authors:
If you're writing a one-off CommandLineFunction in a Qscript and don't really
care about quoting issues, just keep doing things the direct, simple way:
def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out)
If you're writing a CommandLineFunction that will become part of Queue and
will be used by other QScripts, however, it's advisable to do things the
newer, safer way, i.e.:
When you construct your commandLine, you should do so ONLY using the API methods
required(), optional(), conditional(), and repeat(). These will manage quoting
and whitespace separation for you, so you shouldn't insert quotes/extraneous
whitespace in your Strings. By default you get both (quoting and whitespace
separation), but you can disable either of these via parameters. E.g.,
override def commandLine = super.commandLine +
required("eff") +
conditional(verbose, "-v") +
optional("-c", config) +
required("-i", "vcf") +
required("-o", "vcf") +
required(genomeVersion) +
required(inVcf) +
required(">", escape=false) + // This will be shell-interpreted
required(outVcf)
I've ported the Picard/Samtools/SnpEff CommandLineFunction classes to the new
system, so you'll get free shell escaping when you use those in Qscripts just
like with walkers.
2011-12-01 18:13:44 -05:00
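To illustrate why manual quoting is no longer needed, here is a minimal sketch of standard POSIX-shell single-quote escaping. This is an assumption about the general technique. Queue's actual escaping code may differ, and `ShellEscape` is a hypothetical name.

```scala
// Hypothetical helper illustrating POSIX single-quote escaping; not Queue's code.
// Inside single quotes the shell treats every character literally. An embedded
// single quote is written as '\'' (close quote, escaped quote, reopen quote).
object ShellEscape {
  def escape(arg: String): String =
    "'" + arg.replace("'", "'\\''") + "'"

  // Build a command line where each argument is escaped and space-separated.
  def commandLine(args: Seq[String]): String =
    args.map(escape).mkString(" ")
}
```

With this scheme `QD<2.0` reaches the program unchanged. An argument that was already wrapped by hand, such as `"\"QD<2.0\""`, would instead arrive with literal double-quote characters, which is why the old style must be dropped.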
Mauricio Carneiro
dbd8c25787
No more R resources in the DPP
...
updating the DPP to conform with Analyze Covariates changes.
2011-10-28 16:57:01 -04:00
Mauricio Carneiro
86305a5dcf
Adjusting the memory limits of the MDCP
...
Indel caller needs more than 3G for large datasets.
2011-10-21 17:41:52 -04:00
Mauricio Carneiro
c9d8b22092
Added BWASW support to the pipeline
...
Data Processing Pipeline can now use BWASW for realigning the reads. Useful for Ion Torrent data.
2011-10-20 18:36:28 -04:00
Mauricio Carneiro
093cd95c5d
Merged bug fix from Stable into Unstable
2011-10-20 17:03:22 -04:00
Mauricio Carneiro
d7367c152a
Fixing 'revert' when not realigning
...
RevertSam was reverting the alignment information and that was screwing up the pipeline if you didn't want to run it with BWA. Fixed.
2011-10-20 17:01:54 -04:00
Mauricio Carneiro
ed402588cc
Adding the "gold standard NA12878" target
2011-10-20 16:19:13 -04:00
Mauricio Carneiro
0939d16a8d
String not empty bug
...
Apparently var X: String = _ is not the same as var X: String = "". :(
2011-10-13 13:22:05 -04:00
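The difference behind 0939d16a8d: in a class body, `= _` initializes a field to its type's default value, which for any reference type, including `String`, is `null` rather than `""`. A check such as `x != ""` is therefore true for a field that was never set. The class and helper names below are illustrative only.

```scala
// Illustrative only: shows the two initializations side by side.
class Settings {
  var unset: String = _   // default value for a reference type: null
  var empty: String = ""  // an actual empty string
}

object StringDefaults {
  // A null-safe emptiness check that treats both cases the same way.
  def isBlank(s: String): Boolean = s == null || s.isEmpty
}
```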
Mauricio Carneiro
66b5646f95
Adding hidden options to the DPP
...
The default platform parameter for Count Covariates and the number of scatter-gather jobs to generate can now be controlled through hidden parameters.
2011-10-11 13:56:00 -04:00
Mark DePristo
a91509e7dd
Shouldn't be public
2011-10-05 15:22:57 -07:00
Mauricio Carneiro
d3cc25454c
Updating the MDCP
2011-09-22 11:27:40 -04:00
Mauricio Carneiro
623c49765d
NO BAQ ON EXOMES!
...
says the boss.
2011-09-22 11:13:40 -04:00
Ryan Poplin
5d0f284305
Fixing exome specific arguments to the VQSR in the methods development calling pipeline
2011-09-21 20:26:28 -04:00
Mauricio Carneiro
758ecf2d43
Bringing latest updates of ReduceReads to the master repository
2011-09-20 16:35:09 -04:00
Mauricio Carneiro
08ffb18b96
Renaming datasets in the MDCP
...
Making dataset names and files generated by the MDCP more uniform.
2011-09-20 11:02:51 -04:00
Eric Banks
ba150570f3
Updating to use new rod system syntax plus name change for CountRODs
2011-09-19 13:30:32 -04:00
Eric Banks
85626e7a5d
We no longer want people to use the August 2010 Dindel calls for indel realignment but instead Guillermo's new whole genome bi-allelic indel calls; updating the bundle accordingly. Also, there was some confusion by the 1000G data processing folks as to exactly what these indel files are, so I've renamed them so that it's clear. Wiki updated too.
2011-09-19 12:24:05 -04:00
Khalid Shakir
33967a4e0c
Fixed issue reported by chartl where cloned functions lost tags on @Inputs.
...
Updated ExampleUnifiedGenotyper.scala with new syntax.
2011-09-16 12:46:07 -04:00
Ryan Poplin
981b78ea50
Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.
2011-09-12 12:17:43 -04:00
Mauricio Carneiro
7f9000382e
Making indel calls default in the MDCP
...
You can turn off indel calling by using -noIndels.
2011-09-09 14:09:26 -04:00
Mauricio Carneiro
ee9d599558
Just cleaning up
...
clean up old commented-out code from the data processing pipeline.
2011-09-07 13:32:40 -04:00
Mauricio Carneiro
28d782b4c7
Allowing multiple dbsnp and indel files in the DPP
2011-09-02 13:38:47 -04:00
Mauricio Carneiro
ad4ea0b80b
Merged bug fix from Stable into Unstable
2011-09-01 18:14:45 -04:00
Mauricio Carneiro
e253f6f05d
Fixing typo in DPP
...
platform and library were exchanged when rebuilding the read group information
2011-09-01 18:13:52 -04:00
Mauricio Carneiro
d2a33beff7
Added WGS/WEX b37-decoy CEU trio datasets
2011-09-01 13:14:40 -04:00
Mauricio Carneiro
16caca0822
BLASR BAMs and new BWA parameters
...
* Added the functions to turn a BLASR-generated BAM file into a usable BAM file.
* Modified the bwa parameters according to test results from NA12878 pb2k dataset.
2011-08-24 17:04:07 -04:00
Mauricio Carneiro
dc8398e165
fixing bai output for indel cleaning.
2011-08-24 15:58:34 -04:00
Mauricio Carneiro
cd12f7f286
Fixed list dependency
...
Instead of creating a bam list file, I dynamically create a scala list and pass as parameters. This way the intermediate bam files don't get deleted before they should.
2011-08-24 11:12:46 -04:00
Mauricio Carneiro
219252a566
Adapting to the new RodBinding framework
2011-08-24 11:12:46 -04:00
Mauricio Carneiro
136f0eb685
Creating sample-bam list instead of joining
...
This should save us at least one day in the trio decoy processing.
2011-08-22 18:03:39 -04:00
Mauricio Carneiro
04d8bcaf19
Fixed bai removal on picard tools
...
BAM index files were not being deleted because Picard replaces the .bam extension with .bai instead of appending .bai to the file name.
2011-08-22 18:03:39 -04:00
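The two index naming conventions behind 04d8bcaf19 can be sketched as follows. The helper names are hypothetical. The appended `.bam.bai` form is the one written by, for example, `samtools index`, while the replaced `.bai` form is Picard's behaviour as described in the commit.

```scala
import java.io.File

// Hypothetical helpers for locating a BAM's index during cleanup.
object BamIndexNames {
  // Appending convention: sample.bam -> sample.bam.bai
  def appendedIndex(bam: File): File = new File(bam.getPath + ".bai")

  // Picard's convention: sample.bam -> sample.bai
  // (stripSuffix leaves the path unchanged if it does not end in .bam)
  def replacedIndex(bam: File): File = new File(bam.getPath.stripSuffix(".bam") + ".bai")

  // Cleanup can check both candidates, since either tool may have written the index.
  def indexCandidates(bam: File): Seq[File] = Seq(appendedIndex(bam), replacedIndex(bam))
}
```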
Mauricio Carneiro
caebc88e9a
Consensus mode and new RodBinding framework.
...
The DPP was not using the parameter correctly. It didn't matter for the default option (which is the only one we have been testing) but it would not work for knowns only or smith waterman. It is fixed now.
It now complies with the new rod binding framework.
2011-08-22 18:03:39 -04:00
Ryan Poplin
f93a554b01
updating exome specific parameters in MDCP
2011-08-21 10:25:36 -04:00
Ryan Poplin
b008676878
fixing the previous fix
2011-08-20 21:21:55 -04:00
Ryan Poplin
539e157ecd
Fixing misc parameters in MDCP. The pipeline now does VariantEval of output by default. Fix for NaN vqslod values in VQSR
2011-08-20 11:28:48 -04:00
Ryan Poplin
ddb5045e14
Updating the methods development calling pipeline for the new rod binding syntax and the new best practices.
2011-08-19 19:29:51 -04:00
Mauricio Carneiro
b0ff5b1ff7
a better name for the pacbio processing pipeline
2011-08-10 16:16:53 -04:00
Mauricio Carneiro
481630da00
BWA parameters added
2011-08-09 17:05:24 -04:00
Mauricio Carneiro
22d2563823
added BWA SW alignment
...
The pipeline now accepts fasta/fastq files and aligns them using BWA SW, adds default base qualities, creates read groups and performs BQSR.
2011-08-09 17:05:24 -04:00
Mauricio Carneiro
bd1cf4c7bc
Pacbio Pipeline
...
Added the base quality "filling" step to allow the pipeline to handle raw pacbio BAM files. This is the first step towards a generic pacbio data processing pipeline.
2011-08-09 17:05:24 -04:00
Ryan Poplin
8072bd9831
Updating resource bundle generation qscript for changeover to git
2011-08-08 12:35:39 -04:00
Mauricio Carneiro
2fd101135c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-08-08 10:49:43 -04:00
Mauricio Carneiro
4d6cb33612
removing temporary bam index
...
The clean bai file was left behind after the data processing pipeline was done
2011-08-08 10:49:28 -04:00
Ryan Poplin
21dc9a5543
Adding mills/devine indel dataset to the resource bundle
2011-08-04 12:31:28 -04:00
Mauricio Carneiro
aff681e407
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-08-04 11:05:25 -04:00
Mauricio Carneiro
23ec5b94cf
fixed a missing check for null
...
There was a missed check for the case when you don't provide an indels vcf for the cleaner.
2011-08-04 09:50:02 -04:00
Mauricio Carneiro
8981367307
Updating memory usage for picard programs
2011-08-03 15:48:28 -04:00
Khalid Shakir
a587f38808
Fixed example unified genotyper pipeline to wrap filter expressions with quotes and use rod binding name "variant" instead of "vcf".
2011-08-03 02:21:01 -04:00
Mauricio Carneiro
2d94037ad0
Remove temporary index files (*.bai)
...
some temporary index files were not being removed.
2011-07-30 02:05:22 -04:00
Mauricio Carneiro
dcf21f379a
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-07-23 12:59:53 -04:00
Mauricio Carneiro
f0a6dd27a1
Renaming the plot output directory names.
2011-07-23 12:59:37 -04:00
Mauricio Carneiro
4f78025b0b
Merged bug fix from Stable into Unstable
2011-07-22 14:42:04 -04:00
Mauricio Carneiro
4080e2cd88
* Added the decoy reference to the bundle under the b37 resources.
...
* Updated the -svn argument to -ver since we don't use svn anymore (also updated the wiki).
2011-07-22 14:41:22 -04:00
Mauricio Carneiro
9ad5c7dfa4
Resolving simple conflicts in the data processing pipeline.
...
Conflicts:
public/scala/qscript/org/broadinstitute/sting/queue/qscripts/DataProcessingPipeline.scala
2011-07-19 08:05:11 -04:00
Mauricio Carneiro
7688bda1a6
better progress report for the DPP
2011-07-18 23:39:47 -04:00
Mauricio Carneiro
2b465ab43b
* added optional 'no validation' for the Data Processing pipeline.
...
* some simplifications on the picard classes
2011-07-18 23:30:31 -04:00
Mauricio Carneiro
4cf7a2af23
Removed broad specific default paths so people from outside the broad can use it.
2011-07-18 23:25:21 -04:00
Mauricio Carneiro
5cb5a4ec75
Merged bug fix from Stable into Unstable
2011-07-16 00:23:59 -04:00
Mauricio Carneiro
dd92a14b40
Made extra indel VCF optional but DBSNP mandatory.
2011-07-16 00:23:35 -04:00
Mauricio Carneiro
2fa5dbb0fe
Merged bug fix from Stable into Unstable
2011-07-16 00:15:19 -04:00
Mauricio Carneiro
ed55182a4c
Removing Broad-specific paths from parameters and making them required. This should make it unambiguous for people inside and outside the Broad to use the DataProcessingPipeline (as requested on GetSatisfaction).
2011-07-16 00:09:00 -04:00
Mauricio Carneiro
43bd45fcad
Merged bug fix from Stable into Unstable
2011-07-15 19:40:02 -04:00
Mauricio Carneiro
fd1df31ef0
changing the output directory names for Analyze Covariates
2011-07-15 19:39:42 -04:00
Mauricio Carneiro
aa30f416a3
Resolving conflicts
...
Conflicts:
private/scala/qscript/depristo/ExomePostQCEval.scala
private/scala/qscript/depristo/PostCallingQC.scala
private/scala/qscript/org/broadinstitute/sting/queue/qscripts/archive/ExomePostQCEval.scala
2011-07-15 16:21:42 -04:00
Mauricio Carneiro
7b7d40d5d9
A better name for the qscript utilities. Throw here every method you find yourself repeatedly implementing in your qscripts!
...
Refactoring appropriately.
2011-07-15 14:34:50 -04:00
Mauricio Carneiro
a670d6420a
Refactoring Qscript utils into queue general utils package.
2011-07-15 14:31:43 -04:00
Mauricio Carneiro
f19862a643
Fixing conflicts.
2011-07-14 17:13:31 -04:00
Mauricio Carneiro
43c6a8565b
looks better now.
2011-07-14 17:10:44 -04:00
Mauricio Carneiro
09ffe277ae
Added a qscripts util package with some utility functions commonly shared across queue scripts. Refactored some of my public scripts to use it in an effort to make queue scripts more reusable and "supportable".
2011-07-14 17:09:35 -04:00
Mauricio Carneiro
4f8230c750
Merged bug fix from Stable into Unstable
2011-07-14 16:44:57 -04:00
Mauricio Carneiro
9f5180ab05
Recalibrates a list of bam files allowing multiple bams to be recalibrated out of a single 'mother' queue job.
2011-07-14 16:42:17 -04:00
Mauricio Carneiro
df996a1a73
more progress report for the Data Processing Pipeline.
...
Bam lists can now have empty lines, comments and whitespace anywhere.
2011-07-13 14:53:58 -04:00
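A minimal sketch of a tolerant BAM list reader in the spirit of df996a1a73. It assumes one path per line, that comments are whole lines starting with `#`, and that whitespace only needs trimming from the ends of each line. `BamList` is a hypothetical name, not the pipeline's actual code.

```scala
import java.io.File
import scala.io.Source

// Hypothetical BAM list parser: trims each line and skips blank and '#' comment lines.
object BamList {
  def parseLines(lines: Iterator[String]): Seq[File] =
    lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .map(path => new File(path))
      .toList // force evaluation before the underlying source is closed

  def read(listFile: File): Seq[File] = {
    val source = Source.fromFile(listFile)
    try parseLines(source.getLines())
    finally source.close()
  }
}
```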
Mauricio Carneiro
ff4e31c554
Changing the file names as per Kris's request.
2011-07-13 12:59:18 -04:00
Mauricio Carneiro
5298e3a942
Making the outputDir optional. Default = ./
2011-07-05 16:30:41 -04:00
Mauricio Carneiro
7d3dfdfdf2
Updating the MDCP to use the classpath for the GATK jar, removing -gatk parameter.
2011-07-05 16:30:10 -04:00
Mauricio Carneiro
b0fb63e20a
moving the example scala scripts to the qscripts package.
2011-07-01 16:14:59 -04:00
Mauricio Carneiro
d19351f71a
Added capability of running multiple bam files in the same directory.
2011-07-01 16:02:28 -04:00
Mauricio Carneiro
197b7141c1
Added an optional argument -bt <num_threads> for BWA to run multithreaded.
2011-06-30 14:41:57 -04:00
Mauricio Carneiro
f4463d38ca
BWA requires paired-end reads to be sorted by read name when operating over BAM files, but Picard sorts by coordinate. So when BWA is used on paired-end reads, the pipeline now re-sorts the BAM in read-name order, realigns it, and then sorts it back into coordinate order.
2011-06-30 14:29:21 -04:00
Mauricio Carneiro
efd99c3c11
new home for the core qscripts
2011-06-30 11:32:06 -04:00