Commit Graph

159 Commits (f22ab033f6de11053a33bb7bbfa2e2e856d5ee57)

Author SHA1 Message Date
Phillip Dexheimer 296bcc7fb1 Changed name of jobs submitted to cluster job runners
-- Added 'jobRunnerJobName' definition to QFunction, which defaults to the value of shortDescription
-- Edited Lsf and Drmaa JobRunners to use this string instead of description for naming jobs in the scheduler

Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2013-11-12 14:34:56 -05:00
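The naming change above can be sketched as a trait member with a default. This is a minimal illustration, not Queue's code: `NamedFunction` and `JobNamingSketch.schedulerJobName` are hypothetical stand-ins for QFunction and the LSF/DRMAA runners, and only the naming members are modeled.

```scala
// Hypothetical, simplified stand-in for Queue's QFunction: only the naming members are modeled.
trait NamedFunction {
  def description: String
  def shortDescription: String
  // Defaults to shortDescription; an individual function may override it.
  def jobRunnerJobName: String = shortDescription
}

object JobNamingSketch {
  // Hypothetical helper: after the change, the job runners name scheduler jobs
  // with jobRunnerJobName instead of description.
  def schedulerJobName(f: NamedFunction): String = f.jobRunnerJobName
}
```

Putting the default in the trait means existing functions get a shorter scheduler name without any change, while a function that needs a specific name can still override it.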
Louis Bergelson 9498950b1c Adding a more specific error message when one of the scripts doesn't exist.
--Previously it gave a cryptic message:
----IO error while decoding blarg.script with UTF-8
----Please try specifying another one using the -encoding option
2013-10-21 14:57:42 -04:00
Mauricio Carneiro efbfdb64fe QScript to downsample and analyze an exome BAM
This script downsamples an exome BAM several times and produces a coverage distribution
analysis (of bases that pass filters) as well as HaplotypeCaller calls, with an NA12878
Knowledge Base assessment comparing them against multi-sample calling
with the UG.

This script was used for the "downsampling the exome" presentation
2013-10-10 14:37:33 -04:00
Louis Bergelson c05208ecec Resolving warnings
--specifying exception types in cases where none was already specified
----mostly changed to catch Exception instead of Throwable
----EmailMessage has a point where it should only be expecting a RetryException but was catching everything

--changing build.xml so that it prints scala feature warning details

--added the imports needed to remove feature warnings

--updating a newly deprecated enum declaration to match the new syntax
2013-09-23 12:42:22 -04:00
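The Scala 2 compiler warns that an untyped catch pattern such as `case e =>` catches every Throwable, including fatal JVM errors. A minimal sketch of the narrower style, assuming a hypothetical `RetryException` stand-in (the real EmailMessage code is not shown here) and a hypothetical `sendWithRetry` helper:

```scala
// Hypothetical stand-in for the exception EmailMessage expects; not the real Queue class.
class RetryException(msg: String) extends Exception(msg)

object CatchSketch {
  // Retries only on RetryException; any other exception propagates to the caller
  // instead of being silently swallowed, as a catch-all pattern would do.
  def sendWithRetry(attempts: Int)(send: () => Unit): Boolean = {
    var remaining = attempts
    var sent = false
    while (!sent && remaining > 0) {
      try {
        send()
        sent = true
      } catch {
        case _: RetryException => remaining -= 1
      }
    }
    sent
  }
}
```

Naming the expected type documents intent and lets unexpected failures surface instead of being retried.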
Louis Bergelson b32ad99d3f Changing from scala 2.9.2 to 2.10.2.
--modified ivy dependencies
--modified scala classpath in build.xml to include scala-reflect

--changed imports to point to the new scala.reflect.internal.util package

--set the bootclasspath in QScriptManager as well as the classpath variable.

--removing Set[File] <-> Set[String] conversions
----Set is invariant now and the conversions broke
--removing unit tests for Set[File] <-> Set[String] conversions
2013-09-23 12:42:22 -04:00
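Scala's immutable `Set[A]` is invariant in `A`, so a `Set[File]` is never accepted where a `Set[String]` is expected. A minimal sketch of explicit conversions that could replace implicit ones; `SetConversionSketch`, `toPaths` and `toFiles` are hypothetical names, not the removed Queue code:

```scala
import java.io.File

object SetConversionSketch {
  // Explicit element-wise mapping: the result type is stated, so no implicit
  // view between Set[File] and Set[String] is needed.
  def toPaths(files: Set[File]): Set[String] = files.map(_.getPath)
  def toFiles(paths: Set[String]): Set[File] = paths.map(p => new File(p))
}
```

Call sites then convert explicitly, which makes each conversion visible in the code.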
Eric Banks e1174a582d Merge pull request #379 from broadinstitute/mc_dpp_updates_part2
Including SplitByRG in the FullProcessingPipeline
2013-08-19 18:42:12 -07:00
Michael McCowan c3a933ce84 Adaptations to accommodate Tribble API changes, consisting mostly of the following:
* Refactoring implementations of readHeader(LineReader) -> readActualHeader(LineIterator), including nullary implementations where applicable.
* Galvanizing of generic types.
* Test fixups, mostly to pass around LineIterators instead of LineReaders.
* New rev of tribble, which incorporates a fix that addresses a problem with TribbleIndexedFeatureReader reading a header twice in some instances.
* New rev of sam, to make AbstractIterator visible (was moved from picard -> sam in Tribble API refactor).
2013-08-19 15:52:47 -04:00
Mauricio Carneiro e991307eb5 Including SplitByRG in the FullProcessingPipeline
Why wasn't it there before, you ask?
------------------------------------

Before, I was running it separately (by hand), but now it's integrated in
the FullProcessingPipeline.

Integration was a pain because of Queue's limitation of only allowing 1
@Output file. This forced me to write the ugliest piece of code of my
life, but it's working and it's processing the YRI from scratch using
that right now. So I'm happy... somewhat.

Other changes to the pipeline
-----------------------------

   * Add --filter_bases_not_stored to the IndelRealigner step -- sometimes BAM files have reads with no bases stored in the unmapped section (no idea why), and this disrupts the pipeline.
   * Change adaptor marking parameter to "dual indexed" instead of "paired-end" -- for PCR Free data.
2013-08-18 00:51:32 -04:00
Mauricio Carneiro 765f5450ac Updated Full Processing Pipeline
    * add interleaved fastq option to sam2fastq
    * add optional adapter trimming path
    * add "skip_revert" option to skip reverting the bams (sometimes useful -- hidden parameter)
    * add a walker that reads in one bam file and outputs N bam files, one for each read group in the original bam. This is a very important step in any BAM reprocessing pipeline.

I am using this new pipeline to process the CEU and YRI PCR Free WGS
trios.
2013-08-13 23:35:32 -04:00
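The split-by-read-group walker described above partitions one input into one output per read group. A minimal in-memory sketch, assuming a hypothetical `Read` record in place of real SAM records and a `Vector` in place of each output BAM:

```scala
// Hypothetical minimal read record; the real walker operates on SAM records and writes BAM files.
final case class Read(name: String, readGroup: String)

object SplitByReadGroupSketch {
  // One output per read group, preserving input order within each group.
  def split(reads: Seq[Read]): Map[String, Vector[Read]] =
    reads.foldLeft(Map.empty[String, Vector[Read]]) { (acc, r) =>
      acc.updated(r.readGroup, acc.getOrElse(r.readGroup, Vector.empty[Read]) :+ r)
    }
}
```

A fold is used rather than `groupBy` so that the per-group order of the input is kept explicitly.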
lbergelson af36c7ce9a Update QScript.scala
Relaxing addAll parameter type from Seq to Traversable to make it slightly more flexible.
2013-08-02 14:09:26 -04:00
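Relaxing a parameter from `Seq` to a broader collection type lets callers pass unordered collections such as `Set`s. Since Scala 2.13, `Traversable` is only a deprecated alias for `Iterable`, so this sketch uses `Iterable`; `FunctionList` is a hypothetical stand-in, not the QScript class:

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a QScript's list of functions; not the real Queue class.
class FunctionList[A] {
  private val functions = ListBuffer.empty[A]
  def add(f: A): Unit = functions += f
  // Accepting Iterable instead of Seq also admits Sets and other unordered collections.
  def addAll(fs: Iterable[A]): Unit = fs.foreach(add)
  def size: Int = functions.size
}
```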
David Roazen c3d59d890d Update licenses for new PbsEngine* classes 2013-07-01 15:50:20 -04:00
Francesco acf90ca027 corrected number of arguments passed to PbsEngineJobRunner when requesting multiple cores
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2013-07-01 15:08:15 -04:00
Francesco 948b2fca20 added PbsEngine plugin into engine folders, to be called in Queue with -jobRunner PbsEngine; the plugin was written by modifying the existing GridEngine plugin, which served as a template
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2013-07-01 15:08:14 -04:00
Guillermo del Angel f6025d25ae Feature requested by Reich lab and Paavo lab in Leipzig for ancient DNA processing:
-- When doing cross-species comparisons and studying population history and ancient DNA data, SOME measure of confidence that doesn't depend on the reference base is needed at every single site, even in a naive per-site SNP mode. Old versions of GATK emitted GQ and some PL values at reference sites, but these were wrong. This commit addresses this need by adding a new UG command-line argument, -allSitePLs, that, if enabled, will:
a) Emit all 3 ALT snp alleles in the ALT column.
b) Emit all corresponding 10 PL values.
It's up to the user to process these PL values downstream to make sense of them. Note that, in order to follow the VCF spec, the QUAL field in a reference call when there are non-null ALT alleles present will be zero, so QUAL will be useless and filtering will need to be done based on other fields.
-- Tweaks and fixes to processing pipelines for Reich lab.
2013-06-17 13:21:09 -04:00
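The "10 PL values" follow from the allele count: REF plus 3 ALT alleles gives n = 4, and a diploid sample has n(n+1)/2 = 10 unordered genotypes. The VCF specification orders the likelihood for genotype j/k (j ≤ k) at index k(k+1)/2 + j. A small sketch of both rules (`DiploidGenotypes` is a hypothetical helper, not GATK code):

```scala
object DiploidGenotypes {
  // Number of unordered diploid genotypes over n alleles: n(n+1)/2.
  def count(nAlleles: Int): Int = nAlleles * (nAlleles + 1) / 2

  // VCF ordering of the likelihood for genotype j/k with j <= k.
  def index(j: Int, k: Int): Int = k * (k + 1) / 2 + j

  // Genotypes listed in VCF likelihood order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...
  def ordering(nAlleles: Int): Seq[(Int, Int)] =
    for { k <- 0 until nAlleles; j <- 0 to k } yield (j, k)
}
```

With all four alleles emitted, `count(4)` gives the ten PL entries per sample.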
Guillermo del Angel c9d3c67a9b Small Queue/scala improvements, and committing pipeline scripts developed for ancient DNA processing for posterity:
-- Picard extension so Queue scripts can use FastqToSam
-- Single-sample BAM processing: merge/trim reads + BWA + IR + MD + BQSR. Mostly identical to standard pipeline,
except for the adaptor trimming/merging which is critical for short-insert libraries.
-- Single-sample calling (experimental, work in progress): standard UG run but outputting at all sites, meant for
deep whole genomes.

New scripts
2013-04-08 11:52:13 -04:00
Geraldine Van der Auwera f972963918 Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs)
GATK-73 updated docs for bqsr args
GATK-9 differentiate CountRODs from CountRODsByRef
GATK-76 generate GATKDoc for CatVariants
GATK-4 made resource arg required
GATK-10 added -o, some docs to CountMales; some docs to CountLoci
GATK-11 fixed by MC's -o change; straightened out the docs.
GATK-77 fixed references to wiki
GATK-76 Added Ami's doc block
GATK-14 Added note that these annotations can only be used with VariantAnnotator
GATK-15 specified required=false for two arguments
GATK-23 Added documentation block
GATK-33 Added documentation
GATK-34 Added documentation
GATK-32 Corrected arg name and docstring in DiffObjects
GATK-32 Added note to DO doc about reference (required but unused)
GATK-29 Added doc block to CountIntervals
GATK-31 Added @Output PrintStream to enable -o
GATK-35 Touched up docs
GATK-36 Touched up docs, specified verbosity is optional
GATK-60 Corrected GContent annot module location in gatkdocs
GATK-68 touched up docs and arg docstrings
GATK-16 Added note of caution about calling RODRequiringAnnotations as a group
GATK-61 Added run requirements (num samples, min genotype quality)
Tweaked template and generic doc block formatting (h2 to h3 titles)
GATK-62 Added a caveat to HR annot
Made experimental annotation hidden
GATK-75 Added setup info regarding BWA
GATK-22 Clarified some argument requirements
GATK-48 Clarified -G doc comments
GATK-67 Added arg requirement
GATK-58 Added annotation and usage docs
GSATDG-96 Corrected doc
Updated MD5 for DiffObjectsIntegrationTests (only change is link in table title)
2013-03-12 10:57:14 -04:00
Tad Jordan eb847fa102 Message "script failed" moved to the correct place in the code
GSA-719 fixed
2013-02-04 15:37:23 -05:00
Mauricio Carneiro e7c9e3639e Making metrics a required parameter in MarkDuplicates
As requested by user (forum)
2013-01-25 17:49:49 -05:00
Khalid Shakir c58e02a3bd Added a QFunction.jobLocalDir for optionally tracking a node-local directory that may have faster intermediate storage, with SGF ensuring that, if the directory happens to be on the same machine, it gets a clone-specific sub-directory to avoid collisions. 2013-01-25 14:28:04 -05:00
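The collision-avoidance idea can be sketched as deriving a distinct sub-directory per scatter clone from one shared node-local directory. The `clone_<index>` naming scheme and `JobLocalDirSketch` are hypothetical; the actual directory names used by SGF are not given here:

```scala
import java.io.File

object JobLocalDirSketch {
  // Hypothetical naming: each clone gets its own child of the shared node-local
  // directory, so clones running on the same machine never share a directory.
  def cloneLocalDir(jobLocalDir: File, cloneIndex: Int): File =
    new File(jobLocalDir, s"clone_$cloneIndex")
}
```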
Ami Levy-Moonshine 0fb7b73107 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2013-01-18 15:03:42 -05:00
Ami Levy-Moonshine 826c29827b change the default VCF gatherer of the GATK (not just the UG) 2013-01-18 15:03:12 -05:00
Khalid Shakir 4ffb43079f Re-committing the following changes from Dec 18:
Refactored interval-specific arguments out of GATKArgumentCollection into IntervalArgumentCollection such that it can be used in other CommandLinePrograms.
Updated SelectHeaders to print out full interval arguments.
Added RemoteFile.createUrl(Date expiration) to enable creation of presigned URLs for download over http: or file:.
2013-01-16 12:43:15 -05:00
Ami Levy-Moonshine 352cb831d0 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2013-01-10 21:27:06 -05:00
Ami Levy-Moonshine fac0bce916 add RunCoveredByNSamplesSites; changes in CoveredByNSamplesSites so it can work in parallel; also, move it to diagnostics 2013-01-10 21:26:49 -05:00
Mauricio Carneiro ea8c8573d2 Fixing ParseLicense script for scala syntax
- Scala allows package objects in its syntax, so the script needs to be aware of that and not add "*/" every time it sees it.

GSATDG-5
2013-01-10 18:24:24 -05:00
Mauricio Carneiro e5913e50b2 Updating licenses for all scala files
GSATDG-5
2013-01-10 17:46:10 -05:00
Ami Levy-Moonshine b5faf00fce same commit as yesterday, since I moved to the new computer 2013-01-07 13:55:29 -05:00
Ami Levy-Moonshine 81eef3aa37 merge development branches of log-less HMM and FastGatherer to master 2013-01-06 23:01:58 -05:00
Ami Levy-Moonshine fe427cdd77 add a few Queue scripts and the CatVariantsGatherer scala class 2012-12-26 13:06:36 -05:00
David Roazen 07b369ca7e Move VCF/BCF2/VariantContext to new standalone org.broadinstitute.variant package
This is an intermediate commit so that there is a record of these changes in our
commit history. Next step is to isolate the test classes as well, and then move
the entire package to the Picard repository and replace it with a jar in our repo.

-Removed all dependencies on org.broadinstitute.sting (still need to do the test classes,
though)

-Had to split some of the utility classes into "GATK-specific" vs generic methods
(eg., GATKVCFUtils vs. VCFUtils)

-Placement of some methods and choice of exception classes to replace the StingExceptions
and UserExceptions may need to be tweaked until everyone is happy, but this can be
done after the move.
2012-12-19 10:25:22 -05:00
Mauricio Carneiro 6d22f4f737 Bringing latest performance updates from the GATK to CMI 2012-12-05 21:40:03 -05:00
kshakir 61bde6210b Restored RemoteFile push and pull in base QScript. 2012-12-04 12:34:07 -05:00
Joel Thibault 97d29f203e Add walltime changes to LSF
- Check whether the specified attribute is available
- Add pipeline test (disabled due to missing attribute)
2012-11-29 15:23:37 -05:00
Johan Dahlberg daf6269b65 Setting the walltime
Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2012-11-29 15:23:36 -05:00
kshakir a6c1fcd151 Removed default use of @Output syntax.
If compile completes for QScripts, sending runtime errors during execute.
2012-11-29 13:40:36 -05:00
Menachem Fromer c8be7c3102 Keep SNPs and indels separately for batch merging; Add options to DepthOfCoverage to count fragments (to not double-count overlapping reads of same fragment); DepthOfCoverage should now support ReducedReads; Replace recursion with loop in DoC/package.scala (for lists longer than 5000 elements) 2012-11-21 15:56:53 -05:00
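Non-tail recursion over a list uses one JVM stack frame per element, so long enough inputs can throw `StackOverflowError`; a loop runs in constant stack depth. The package.scala code itself is not shown here, so this is a generic illustration using a hypothetical sum; the exact length at which recursion fails depends on the thread's stack size:

```scala
object LoopInsteadOfRecursion {
  // Non-tail-recursive: the addition happens after the recursive call returns,
  // so every element keeps a stack frame alive.
  def sumRecursive(xs: List[Long]): Long = xs match {
    case Nil => 0L
    case head :: tail => head + sumRecursive(tail)
  }

  // Iterative: constant stack depth regardless of list length.
  def sumLoop(xs: List[Long]): Long = {
    var total = 0L
    var rest = xs
    while (rest.nonEmpty) {
      total += rest.head
      rest = rest.tail
    }
    total
  }
}
```

Making the recursion tail-recursive (checked with `@tailrec`) is an alternative; an explicit loop avoids relying on the compiler's optimization.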
Menachem Fromer 9111966261 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-20 12:19:58 -05:00
kshakir 6d59dd3455 Scala classes were only returning direct subclasses (confirmed when inspected in debugger) so changed PluginManager to allow specifying the explicit subclass.
Removed some generics from PluginManager for now until able to figure out syntax for requesting explicit subclass.
QStatusMessenger uses a slightly more primitive Map[String, Seq[RemoteFile]] instead of Map[ArgumentSource, Seq[RemoteFile]].
Added a QCommandPlugin.initScript utility method for handling specialized script types.
2012-11-14 10:33:20 -05:00
Menachem Fromer cde4f037d3 Begin moving XHMM scripts to public 2012-10-25 16:18:25 -04:00
kshakir 8dfa24df7b Sending a version of per-job status messages.
In addition to outputs, inputs are passed to QStatusMessenger.done()
CloneFunction.cloneIndex has a new CloneFunction.cloneCount companion useful for display purposes.
2012-10-23 15:55:47 -04:00
Guillermo del Angel 5fac5bf12e Fixed issues with Queue packaging of Picard QC classes: separate jars are needed from Picard. User needs to specify the -picardBase argument to point to input path for jars.
> Also, reenable joint cleaning as now it works.
> DEV-125 #resolve
> DEV-90 #resolve
2012-10-23 14:08:31 -04:00
kshakir 0cce1ae8b2 When gathering VCFs, using CombineVariants from the current classpath, and not the GATK used to run the command. This was a concern for external modules that bundled the engine but not CombineVariants. 2012-10-23 12:44:06 -04:00
Mauricio Carneiro c210b7cde4 Merge GATK repo into CMI-GATK
Bringing in the following relevant changes:
	* Fixes the indel realigner N-Way out null pointer exception DEV-10
	* Optimizations to ReduceReads that cut the run time to a third.

Conflicts:
	protected/java/src/org/broadinstitute/sting/gatk/walkers/compression/reducereads/SlidingWindow.java

DEV-10 #resolve #time 2m
2012-10-23 10:59:11 -04:00
Guillermo del Angel 7860ff7981 a) Resolve [#DEV-56] - test data with indels in new directory private/testdata/CMITestData/. b) Skeleton (not yet working) of fastq-BAM unit test, c) misc bug fixes for QC functions to work (not done yet) 2012-10-22 19:59:15 -04:00
Khalid Shakir fd59e7d5f6 Better error message when generic types are erased from scala collections. 2012-10-22 16:27:31 -04:00
Khalid Shakir 2ef456d51a Added explicit @ClassType annotations to @Argument for Option[Int] or Option[Double] since scala seems to change the reflected type to Option[Object] on some systems.
Changed ReflectionUtils.getGenericTypes' order of looking for @ClassType since the primitive generic wasn't completely erased, only changed to Object which is incorrect.
More fixes to @Arguments labeled as java.io.File via incorrect @Input annotation.
Put in a default undocumented implementation of @Argument doc() to match the one added to @Input.
2012-10-19 13:20:29 -04:00
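Why an explicit type hint helps: JVM generic signatures cannot contain primitives, so for a Scala `Option[Int]` field the Scala 2 compiler typically records the type argument as `java.lang.Object`. A minimal sketch that inspects this through Java reflection; `ExampleArgs` and `ErasureSketch` are hypothetical, not Queue or GATK classes:

```scala
import java.lang.reflect.{ParameterizedType, Type}

// Hypothetical argument holder; the field is only inspected via Java reflection.
class ExampleArgs {
  var maxReads: Option[Int] = None
}

object ErasureSketch {
  // Returns the type argument recorded in the field's generic signature.
  def optionTypeArgument(fieldName: String): Type =
    classOf[ExampleArgs].getDeclaredField(fieldName).getGenericType
      .asInstanceOf[ParameterizedType].getActualTypeArguments()(0)
}
```

Because reflection reports `Object` rather than `Int`, argument parsing cannot recover the element type on its own, which is the gap an explicit `@ClassType` annotation fills.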
Guillermo del Angel 4f768e2f58 redo QC picard parts 2012-10-19 12:25:46 -04:00
kshakir 55ac4ba70b Added another utility that can convert to RemoteFiles.
QScripts will now generate remote versions of files if the caller has not already passed in remote versions (or the QScript replaces the passed in remote references... not good)
Instead of having yet another plugin, combined QStatusMessenger and RemoteFileConverter under general QCommandPlugin trait.
2012-10-17 20:00:03 -04:00
kshakir 0196dbeaca Added more logging to push/pull of RemoteFiles. 2012-10-17 09:52:17 -04:00
kshakir f93b279151 Moved the class field caching from QScript to a ClassFieldCache utility.
Using ClassFieldCache to pull values from QScript for passing to done() method of QStatusMessenger.
2012-10-16 18:49:31 -04:00