8426 Commits (7204fcc2c353315f4c990a91761ba0a6afabdbf9)

Author SHA1 Message Date
Mark DePristo e17a1923fb Plots runtimes by analysis name and exechosts
Useful to understand the performance of analysis jobs by hosts,
and to debug problematic nodes
2011-12-07 09:24:47 -05:00
Mark DePristo 5d2212bc8e Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-07 09:03:17 -05:00
Mark DePristo 6bf18899df Fix for variant summary -- now treats all 50 bp deletions or insertions as CNVs 2011-12-07 09:02:49 -05:00
Matt Hanna 5869a87e48 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-06 18:12:12 -05:00
Matt Hanna c9b2cd8ba5 Fix for chartl's stale null representation issue. 2011-12-06 18:05:17 -05:00
Eric Banks 79d18dc078 Fixing indexing bug on the ACsets. Added unit tests for the Exact model code. 2011-12-06 16:17:18 -05:00
Khalid Shakir b4b7ae1bd9 Revved Picard to incorporate tfennell's AsyncSAMFileWriter.
Removed DbSnpFileGenerator and related files as they were removed from PPP r2063 by ktibbett.
2011-12-06 10:37:42 -05:00
Matt Hanna f5b977fc88 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-06 10:11:35 -05:00
Matt Hanna 4001c22a11 Better file count / buffering variation in test suite. Parameterized read shard buffering. Misc cleanup. 2011-12-06 10:10:38 -05:00
Khalid Shakir 677bea0abd Right aligning GATKReport numeric columns and updated MD5s in tests.
PreQC parses file with spaces in sample names by using tabs only.
PostQC allows passing the file names for the evals so that flanks can be evaled.
BaseTest's network temp dir now adds the user name to the path so files aren't created in the root.
HybridSelectionPipeline:
- Updated to latest versions of reference data.
- Refactored Picard parsing code replacing YAML.
2011-12-05 23:22:15 -05:00
Eric Banks 7a0f6feda4 Make sure that too many alternate alleles aren't being passed to the genotyper (10 for now) and exit with a UserError if there are. 2011-12-05 16:18:52 -05:00
Eric Banks 7fac4afab3 Fixed priors (now initialized upon engine startup in a multi-dimensional array) and cell coefficients (properly handles the generalized closed form representation for multiple alleles). 2011-12-05 15:57:25 -05:00
David Roazen 1ba03a5e72 Use optional() instead of required() to construct javaMemoryLimit argument in JavaCommandLineFunction 2011-12-05 14:06:00 -05:00
Eric Banks a7cb941417 The posteriors vector is now 2 dimensional so that it supports multiple alleles (although the UG is still hard-coded to use only array[0] for now); the exact model now collapses probabilities for all conformations over a given AC into the posteriors array (in the appropriate dimension). Fixed a bug where the priors and posteriors were being passed in swapped. 2011-12-04 13:02:53 -05:00
Eric Banks eab2b76c9b Added loads of comments for future reference 2011-12-03 23:54:42 -05:00
Eric Banks 29662be3d7 Fixed bug where k=2N case wasn't properly being computed. Added optimization for BB genotype case not in old model. At this point, integration tests pass except for 1 case where QUALs differ by 0.01 (this is okay because I occasionally need to compute extra cells in the matrix which affects the approximations) and 2 cases where multi-allelic indels are being genotyped (some work still needs to be done to support them). 2011-12-03 23:12:04 -05:00
Eric Banks 71f793b71b First partially working version of the multi-allelic version of the Exact AF calculation 2011-12-02 14:13:14 -05:00
David Roazen d014c7faf9 Queue now properly escapes all shell arguments in generated shell scripts
This has implications for both Qscript authors and CommandLineFunction authors.

Qscript authors:
You no longer need to (and in fact must not) manually escape String values to
avoid interpretation by the shell when setting up Walker parameters. Queue will
safely escape all of your Strings for you so that they'll be interpreted literally. E.g.,

Old way:
filterSNPs.filterExpression = List("\"QD<2.0\"", "\"MQ<40.0\"", "\"HaplotypeScore>13.0\"")

New way:
filterSNPs.filterExpression = List("QD<2.0", "MQ<40.0", "HaplotypeScore>13.0")

CommandLineFunction authors:
If you're writing a one-off CommandLineFunction in a Qscript and don't really
care about quoting issues, just keep doing things the direct, simple way:

def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out)

If you're writing a CommandLineFunction that will become part of Queue and
will be used by other QScripts, however, it's advisable to do things the
newer, safer way, i.e.:

When you construct your commandLine, you should do so ONLY using the API methods
required(), optional(), conditional(), and repeat(). These will manage quoting
and whitespace separation for you, so you shouldn't insert quotes/extraneous
whitespace in your Strings. By default you get both (quoting and whitespace
separation), but you can disable either of these via parameters. E.g.,

override def commandLine = super.commandLine +
                           required("eff") +
                           conditional(verbose, "-v") +
                           optional("-c", config) +
                           required("-i", "vcf") +
                           required("-o", "vcf") +
                           required(genomeVersion) +
                           required(inVcf) +
                           required(">", escape=false) +  // This will be shell-interpreted
                           required(outVcf)

I've ported the Picard/Samtools/SnpEff CommandLineFunction classes to the new
system, so you'll get free shell escaping when you use those in Qscripts just
like with walkers.
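
To illustrate the quoting behavior described above, here is a minimal sketch of how such escaping could work. It is not Queue's actual implementation: the object name `ShellEscapeSketch`, the helper `quote()`, the simplified `required()` stand-in, and the `tool.jar` command are all hypothetical, and the single-quote strategy is an assumption about one common way to make a shell treat a string literally.

```scala
// Hypothetical sketch only; Queue's real required()/optional() may differ.
object ShellEscapeSketch {
  // Wrap a value in single quotes so the shell reads it literally.
  // An embedded single quote becomes '\'' (close quote, escaped quote, reopen).
  def quote(value: String): String =
    "'" + value.replace("'", "'\\''") + "'"

  // Simplified stand-in for required(): a leading space for separation,
  // then the value, quoted unless escape = false.
  def required(value: String, escape: Boolean = true): String =
    " " + (if (escape) quote(value) else value)

  def main(args: Array[String]): Unit = {
    val cmd = "java -jar tool.jar" +          // tool.jar is a placeholder
      required("QD<2.0") +                    // '<' is protected from the shell
      required(">", escape = false) +         // left unquoted: shell redirect
      required("out file.vcf")                // space survives as one argument
    println(cmd) // prints: java -jar tool.jar 'QD<2.0' > 'out file.vcf'
  }
}
```

Under these assumptions, a caller passes plain values such as "QD<2.0" and never adds quotes by hand, which matches the "new way" shown for filterExpression above.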
2011-12-01 18:13:44 -05:00
Matt Hanna b5b5ffe71d Parameterize intervals. 2011-12-01 13:18:18 -05:00
Matt Hanna ef81224dcf Parameter to customize # CPU threads. 2011-11-30 23:47:46 -05:00
Matt Hanna c9eae32f6e Revving Tribble to actually close file handles when close() is called. 2011-11-30 22:42:21 -05:00
Mark DePristo 3060a4a15e Support for list of known CNVs in VariantEval
-- VariantSummary now includes novelty of CNVs by reciprocal overlap detection using the standard variant eval -knownCNVs argument
-- Genericizes loading for intervals into interval tree by chromosome
-- GenomeLoc methods for reciprocal overlap detection, with unit tests
2011-11-30 17:05:16 -05:00
Matt Hanna b65db6a854 First draft of a test script for I/O performance with the new asynchronous I/O processing.
Also includes convenience parameters for specifying the IO/CPU threading balance outside of a tag. Will be killed when
Queue gets better support for tagged arguments (hopefully soon).
2011-11-30 13:13:16 -05:00
Laurent Francioli 1d5d200790 Cleaned up unused import statements 2011-11-30 15:30:30 +01:00
Mark DePristo 28b286ad39 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-30 09:11:53 -05:00
Laurent Francioli 20bffe0430 Adapted for the new version of MendelianViolation 2011-11-30 14:46:38 +01:00
Laurent Francioli 1cb5e9e149 Removed outdated (and unused) -familyStr commandline argument 2011-11-30 14:45:04 +01:00
Laurent Francioli 9574be0394 Updated MendelianViolationEvaluator integration test 2011-11-30 14:44:15 +01:00
Laurent Francioli f49dc5c067 Added functionality to get all children that have both parents (useful when trios are needed) 2011-11-30 14:43:37 +01:00
Laurent Francioli a4606f9cfe Merge branch 'MendelianViolation'
Conflicts:
	public/java/src/org/broadinstitute/sting/utils/MendelianViolation.java
2011-11-30 11:13:15 +01:00
Laurent Francioli b279ae4ead Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-30 10:10:21 +01:00
Laurent Francioli 7d58db626e Added MendelianViolationEvaluator integration test 2011-11-30 10:09:20 +01:00
Ryan Poplin 91413cf0d9 Merged bug fix from Stable into Unstable 2011-11-29 14:01:23 -05:00
Ryan Poplin cb284eebde Further updating VQSR tutorial wiki docs to reflect the bundle 2011-11-29 14:00:57 -05:00
Ryan Poplin dcb889665d Merged bug fix from Stable into Unstable 2011-11-29 09:58:49 -05:00
Ryan Poplin 447e9bff9e Updating VQSR tutorial wiki docs to reflect the bundle 2011-11-29 09:57:45 -05:00
Ryan Poplin 110298322c Adding Transmission Disequilibrium Test annotation to VariantAnnotator and integration test to test it. 2011-11-29 09:29:18 -05:00
Laurent Francioli ab67011791 Corrected a bug introduced in the last update that caused getFamilies to return no families when the samples were not specified 2011-11-29 11:18:15 +01:00
Eric Banks d7d8b8e380 Tribble v42 changes the Codec.canDecode method to take in a String instead of a File; this is something that Jim was adamant about (because Tribble can handle streams other than files). I didn't want the next person who needed to rev Tribble to deal with this change additionally, so I took care of updating the GATK now. 2011-11-28 14:18:28 -05:00
Laurent Francioli a09c01fcec Removed walker argument FamilyStructure as this is now supported by the engine (ped file) 2011-11-28 17:18:11 +01:00
Laurent Francioli 795c99d693 Adapted MendelianViolation to the new ped family representation. Adapted all classes using MendelianViolation too.
Added a number of useful metrics on allele transmission and MVs to MendelianViolationEvaluator.
2011-11-28 17:13:14 +01:00
Laurent Francioli e877db8f42 Changed visibility of getSampleDB from protected to public as the sampleDB needs to be accessible from Annotators and Evaluators too. 2011-11-28 17:11:30 +01:00
Laurent Francioli 5c2595701c Added a function to get families only for a given list of samples. 2011-11-28 17:10:33 +01:00
Mark DePristo 3c36428a20 Bug fix for TiTv calculation -- shouldn't be rounding 2011-11-28 10:20:34 -05:00
Eric Banks 436b4dc855 Updated docs 2011-11-28 08:59:48 -05:00
Laurent Francioli b1dd632d5d Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
2011-11-25 16:16:44 +01:00
Mark DePristo e60272975a Fix for changed MD5 in streaming VCF test 2011-11-23 19:01:33 -05:00
Mark DePristo 12f09d88f9 Removing references to SimpleMetricsByAC 2011-11-23 16:08:18 -05:00
David Roazen fdd90825a1 Queue now outputs a GATK-like header with version number, build timestamp, etc. 2011-11-23 14:28:35 -05:00
Mark DePristo e319079c32 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-23 13:02:11 -05:00