180 Commits (d5fce22d78db9638b2ebda858f1ca8a67f928e33)

Author SHA1 Message Date
Khalid Shakir 23e7f1bed9 When an interval list specifies overlapping intervals merge them before scattering. 2012-02-08 02:12:16 -05:00
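The merge step described in 23e7f1bed9 can be illustrated with a minimal sketch. It is not the Queue implementation: `Interval` and `mergeOverlapping` are hypothetical names, coordinates are assumed to be closed and 1-based, and contigs are ordered by name rather than by reference order.

```scala
// Hypothetical stand-in for a genomic interval; not the GATK GenomeLoc class.
final case class Interval(contig: String, start: Int, stop: Int)

object IntervalMerge {
  // Merge intervals that overlap on the same contig.
  // Assumes closed coordinates, so [1,10] and [10,20] overlap at position 10.
  def mergeOverlapping(intervals: Seq[Interval]): List[Interval] = {
    val sorted = intervals.sortBy(i => (i.contig, i.start, i.stop))
    sorted.foldLeft(List.empty[Interval]) {
      // The next interval starts inside the last merged one: extend it.
      case (last :: rest, next) if last.contig == next.contig && next.start <= last.stop =>
        last.copy(stop = math.max(last.stop, next.stop)) :: rest
      // Otherwise start a new merged interval.
      case (acc, next) => next :: acc
    }.reverse
  }
}
```

Merging before scattering means each scattered piece covers a distinct set of sites, so no site is processed by two jobs.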
David Roazen d5199db8ec Be explicit about setting the snpEff -onlyCoding option in the pipeline
When run without an explicit -onlyCoding option, as we've been doing up to
now, snpEff automatically sets -onlyCoding to "true" provided that there is
at least one transcript marked as "protein_coding", which will always be the
case for us in practice (and indeed, all pipeline runs so far with snpEff
2.0.5 have run with -onlyCoding auto-set to "true").

However, given the disastrous effect that setting "-onlyCoding false" has on
annotation quality, we wish to be explicit with this option rather than
relying on snpEff's auto-detection logic.
2012-01-17 20:04:27 -05:00
Khalid Shakir ef50e77ee2 When running Queue jobs locally, merge the stderr to the stdout log if the error file is NOT specified.
Updated VE strats in the HSP for plotting Ka/Ks by AC.
2012-01-10 16:10:25 -05:00
Mauricio Carneiro 3358c132a8 Updating the MD5s
Clipping adaptor boundaries changed the results of CountCovariates, which affected the PPP output:
a few more loci were visible to locus walkers.
2011-12-21 15:14:05 -05:00
Mark DePristo 0cc5c3d799 General improvements to Queue
-- Support for collecting resources info from DRMAA runners
-- Disabled the non-standard mem_free argument so that we can actually use our own SGE cluster gsa4
-- NCoresRequest is a testing queue script for this.
-- Added two command line arguments:
  -- multiCoreJerk: don't request multiple cores for jobs with nt > 1.  This was the old behavior but it's really not the best way to run parallel jobs.  Now with queue if you run nt = 4 the system requests 4 cores on your host.  If this flag is thrown, though, it will only request 1 and you'll just use 4, like a jerk
  -- job_parallel_env: parallel environment name used with SGE to request multicore jobs.  Equivalent to -pe job_parallel_env NT for NT > 1 jobs
2011-12-20 14:05:09 -05:00
Khalid Shakir 6059ca76e8 Removing cruft that snuck in last commit. 2011-12-16 23:00:16 -05:00
Khalid Shakir 7486696c07 When using bam list mode in HSP, deriving the VCF name from the bam list instead of requiring an additional parameter.
Creating a single temporary directory per ant test run instead of putting temp files across all runs in the same directory.
Updated various tests for above items and other small fixes.
2011-12-16 18:09:25 -05:00
Mark DePristo 550fb498be Support for NT testing (default up to 4) for CC and UG
-- Added convenience function addJobReportBinding to just add a new binding to the map (x -> y) as well
2011-12-14 18:45:00 -05:00
Mauricio Carneiro 663184ee9d Added test mode to PPP
* in test mode, no @PG tags are output to the final bam file
* updated pipeline test to use -test mode.
* MD5s updated accordingly
2011-12-12 18:29:06 -05:00
Mauricio Carneiro a3c3d72313 Added test mode to DPP
* in test mode, no @PG tags are output to the final bam file
* updated pipeline test to use -test mode.
* MD5s are now dependent on BWA version
2011-12-12 18:29:06 -05:00
Mauricio Carneiro 52c64b971f Updating MD5s -- really don't know why it didn't update before 2011-12-12 09:48:58 -05:00
Mauricio Carneiro ed91461c49 Data Processing Pipeline Test
* Added standard pipeline test for the DPP
* Added a full BWA pipeline test for the DPP
* Included the extra files for the reference needed by BWA (to be used by DPP and PPP tests)
2011-12-12 00:24:51 -05:00
Mauricio Carneiro cca8a18608 PPP pipeline test
* added a pipeline test to the Pacbio Processing Pipeline.
* updated exampleBAM with more complete RG information so we can use it in a wider variety of pipeline tests
* added exampleDBSNP.vcf file with only chromosome 1 in the range of the exampleFASTA.fasta reference for pipeline tests
2011-12-11 17:32:21 -05:00
Mauricio Carneiro 21ac3b59d7 Merged bug fix from Stable into Unstable 2011-12-09 16:51:46 -05:00
Mauricio Carneiro 13905c00b3 Updating PacbioProcessingPipeline to new Queue standards 2011-12-09 16:51:02 -05:00
David Roazen 1ba03a5e72 Use optional() instead of required() to construct javaMemoryLimit argument in JavaCommandLineFunction 2011-12-05 14:06:00 -05:00
David Roazen d014c7faf9 Queue now properly escapes all shell arguments in generated shell scripts
This has implications for both Qscript authors and CommandLineFunction authors.

Qscript authors:
You no longer need to (and in fact must not) manually escape String values to
avoid interpretation by the shell when setting up Walker parameters. Queue will
safely escape all of your Strings for you so that they'll be interpreted literally. E.g.,

Old way:
filterSNPs.filterExpression = List("\"QD<2.0\"", "\"MQ<40.0\"", "\"HaplotypeScore>13.0\"")

New way:
filterSNPs.filterExpression = List("QD<2.0", "MQ<40.0", "HaplotypeScore>13.0")

CommandLineFunction authors:
If you're writing a one-off CommandLineFunction in a Qscript and don't really
care about quoting issues, just keep doing things the direct, simple way:

def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out)

If you're writing a CommandLineFunction that will become part of Queue and
will be used by other QScripts, however, it's advisable to do things the
newer, safer way, i.e.:

When you construct your commandLine, you should do so ONLY using the API methods
required(), optional(), conditional(), and repeat(). These will manage quoting
and whitespace separation for you, so you shouldn't insert quotes/extraneous
whitespace in your Strings. By default you get both (quoting and whitespace
separation), but you can disable either of these via parameters. E.g.,

override def commandLine = super.commandLine +
                           required("eff") +
                           conditional(verbose, "-v") +
                           optional("-c", config) +
                           required("-i", "vcf") +
                           required("-o", "vcf") +
                           required(genomeVersion) +
                           required(inVcf) +
                           required(">", escape=false) +  // This will be shell-interpreted
                           required(outVcf)

I've ported the Picard/Samtools/SnpEff CommandLineFunction classes to the new
system, so you'll get free shell escaping when you use those in Qscripts just
like with walkers.
2011-12-01 18:13:44 -05:00
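The quoting behaviour described in d014c7faf9 can be approximated with a minimal sketch. This is an assumption about one common way to escape for a POSIX shell (single-quote wrapping), not a copy of what Queue does internally; `shellEscape` and `requiredArg` are hypothetical names, with `requiredArg` only loosely modeled on the `required()` call shown in the commit message.

```scala
object ShellEscapeSketch {
  // Wrap a value in single quotes so a POSIX shell treats it literally.
  // An embedded single quote becomes '\'' (close quote, escaped quote, reopen).
  def shellEscape(value: String): String =
    "'" + value.replace("'", "'\\''") + "'"

  // Hypothetical helper: emits " prefix value", escaping the value
  // unless the caller opts out with escape = false.
  def requiredArg(prefix: String, value: String, escape: Boolean = true): String =
    " " + prefix + " " + (if (escape) shellEscape(value) else value)
}
```

With a scheme like this, a filter expression such as `QD<2.0` reaches the tool unchanged, and the `<` is never seen by the shell as a redirection.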
David Roazen fdd90825a1 Queue now outputs a GATK-like header with version number, build timestamp, etc. 2011-11-23 14:28:35 -05:00
Khalid Shakir c50274e02e During flanking interval creation, merging overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
Moved flanking interval utilities to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
Mark DePristo 0111e58d4e Don't generate PDF unless you have -run specified 2011-11-09 14:45:40 -05:00
Mark DePristo 849c0757f2 Bug fix for LocusScatterFunction when no intervals are provided
-- Now correctly grabs reference contigs and cuts them all up, rather than throwing an NPE because intervalString == null.
2011-11-04 10:55:09 -04:00
Mark DePristo bd977c2d92 Bug fix to avoid infinite loop in GATKScatterFunction 2011-11-02 16:20:42 -04:00
Mark DePristo c1da8cd5e7 Final version of bp-resolved locus scatter/gather
-- Minor refactoring to allow LocusScatterFunction to have maxIntervals be the original scatter count, rather than capping this by the interval count as Contig and Interval do
2011-11-02 11:26:34 -04:00
Mark DePristo c2b97030a4 IntervalUtils for completely balanced locus-based scatter/gather
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
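The idea of a balanced, bp-resolved locus scatter from c2b97030a4 can be sketched as below. This is an illustrative assumption, not the `scatterLocusIntervals` code: `Loc` and `scatter` are hypothetical names, and the sketch simply hands each part an equal share of the remaining bases, cutting an interval wherever a part fills up.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical closed interval; size counts both endpoints.
final case class Loc(contig: String, start: Long, stop: Long) {
  def size: Long = stop - start + 1
}

object LocusScatterSketch {
  // Split locs into exactly `parts` lists whose base counts differ by at most one.
  // If there are fewer bases than parts, the trailing lists are empty.
  def scatter(locs: List[Loc], parts: Int): List[List[Loc]] = {
    require(parts > 0, "parts must be positive")
    val result = ListBuffer.empty[List[Loc]]
    var remaining = locs
    var basesLeft = locs.map(_.size).sum
    for (part <- 0 until parts) {
      val partsLeft = parts - part
      // Equal share of what is left, rounded up.
      var want = (basesLeft + partsLeft - 1) / partsLeft
      basesLeft -= want
      val current = ListBuffer.empty[Loc]
      while (want > 0 && remaining.nonEmpty) {
        val head = remaining.head
        if (head.size <= want) {
          // The whole interval fits in this part.
          current += head
          want -= head.size
          remaining = remaining.tail
        } else {
          // Cut the interval: the first `want` bases go here, the rest stay queued.
          current += head.copy(stop = head.start + want - 1)
          remaining = head.copy(start = head.start + want) :: remaining.tail
          want = 0
        }
      }
      result += current.toList
    }
    result.toList
  }
}
```

Unlike contig- or interval-based scattering, the number of parts here is not capped by the number of input intervals, which matches the intent stated for LocusScatterFunction in c1da8cd5e7.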
Mark DePristo 5fc613f972 Better default partition types for walkers
-- Added PartitionType.READ, and associated ReadScatterFunction.  ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Mauricio Carneiro dbd8c25787 No more R resources in the DPP
updating the DPP to conform with Analyze Covariates changes.
2011-10-28 16:57:01 -04:00
Khalid Shakir e25d40882a Swapping Thread.sleep(0) with Object.wait(0) caused Queue to lock up. Thanks to rpoplin for pointing it out. 2011-10-28 15:51:03 -04:00
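The lock-up in e25d40882a follows from the JDK semantics: `Thread.sleep(0)` returns immediately, whereas `Object.wait(0)` means "no timeout" and blocks until another thread calls notify. A minimal sketch of a guard against that difference is shown below; `WaitSketch`, `pause`, and `wake` are hypothetical names and not part of Queue.

```scala
object WaitSketch {
  private val lock = new Object

  // Pause for up to `millis` ms, waking early if wake() is called.
  // A zero timeout passed to wait() would block indefinitely, so
  // non-positive values return immediately instead.
  def pause(millis: Long): Unit =
    if (millis > 0) lock.synchronized { lock.wait(millis) }

  // Wake any thread currently blocked in pause().
  def wake(): Unit = lock.synchronized { lock.notifyAll() }
}
```

Using `wait` with a positive timeout keeps the faster shutdown response that motivated the switch from `Thread.sleep`, since a shutdown hook can call `wake()` rather than waiting for the sleep to expire.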
Khalid Shakir b80d407dc7 No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Eric Banks b39fcb1bea Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-26 15:44:25 -04:00
Eric Banks 3273c20c98 Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes. 2011-10-26 15:29:18 -04:00
Khalid Shakir fac9932938 Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
Mauricio Carneiro 86305a5dcf Adjusting the memory limits of the MDCP
Indel caller needs more than 3G for large datasets.
2011-10-21 17:41:52 -04:00
Mauricio Carneiro 9f867d77ca no sort order
subtle bug fixed.
2011-10-20 18:44:09 -04:00
Mauricio Carneiro c9d8b22092 Added BWASW support to the pipeline
Data Processing Pipeline can now use BWASW for realigning the reads. Useful for Ion Torrent data.
2011-10-20 18:36:28 -04:00
Mauricio Carneiro 093cd95c5d Merged bug fix from Stable into Unstable 2011-10-20 17:03:22 -04:00
Mauricio Carneiro d7367c152a Fixing 'revert' when not realigning
RevertSam was reverting the alignment information and that was screwing up the pipeline if you didn't want to run it with BWA. Fixed.
2011-10-20 17:01:54 -04:00
Mauricio Carneiro ed402588cc Adding the "gold standard NA12878" target 2011-10-20 16:19:13 -04:00
Mauricio Carneiro c27e2fb676 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-18 15:23:05 -04:00
Menachem Fromer e5fc828546 With Khalid's implicit approval, I have removed this line that overrides the memory limit of the VCF-gathering function, so that the inherited limit remains 2011-10-18 14:47:39 -04:00
Mauricio Carneiro 0939d16a8d String not empty bug
Apparently var X: String = _ is not the same as var X: String = "".  :(
2011-10-13 13:22:05 -04:00
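The difference noted in 0939d16a8d can be shown with a short sketch, assuming Scala 2 field semantics: `= _` gives a field its default value, which is `null` for reference types such as String. `Settings` and `isSet` are hypothetical names used only for illustration.

```scala
object DefaultInitSketch {
  class Settings {
    var uninitialized: String = _   // default-initialized: null for reference types
    var empty: String = ""          // an actual zero-length String
  }

  // Option(x) maps null to None, so both cases can be checked uniformly.
  def isSet(s: String): Boolean = Option(s).exists(_.nonEmpty)
}
```

A check such as `x.isEmpty` throws a NullPointerException on the first field but returns true on the second, which is the kind of mismatch the commit title points at.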
Mauricio Carneiro 66b5646f95 Adding hidden options to the DPP
Hidden parameters now control the default platform parameter passed to Count Covariates and the number of scatter-gather jobs to generate.
2011-10-11 13:56:00 -04:00
Mark DePristo 73f9d1f217 GATK read group requirement iron hand
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExmpleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo a91509e7dd Shouldn't be public 2011-10-05 15:22:57 -07:00
Khalid Shakir 84bd355690 Merged bug fix from Stable into Unstable 2011-09-27 14:34:39 -04:00
Khalid Shakir b090751f62 Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Khalid Shakir 77ba59e30a Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-27 00:51:45 -04:00
Khalid Shakir 648b959361 Minor change to log an info message when a signal such as Ctrl-C is caught. 2011-09-27 00:50:19 -04:00
Mauricio Carneiro d3cc25454c Updating the MDCP 2011-09-22 11:27:40 -04:00
Mauricio Carneiro 623c49765d NO BAQ ON EXOMES!
says the boss.
2011-09-22 11:13:40 -04:00
Ryan Poplin 5d0f284305 Fixing exome specific arguments to the VQSR in the methods development calling pipeline 2011-09-21 20:26:28 -04:00