Commit Graph

4407 Commits (04b4adafda8921b50518cc091ce75398e0f5dfd6)

Author SHA1 Message Date
depristo 04b4adafda File reports are now sorted in order
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4447 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 21:54:23 +00:00
fromer 652a3e8de5 Added integration tests for ReadBackedPhasing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4446 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:50:32 +00:00
fromer f8f1cc45a3 Now ReadBackedPhasing caps Base Quality by Mapping Quality
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4445 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:48:57 +00:00
scalvo bda427f078 Change specification of AnnotationInputTable, and fix 2 bugs.
Previous output spec contained 3 columns:
 haplotypeReference,haplotypeAlternate,haplotypeStrand
where haplotypeReference was always on the + strand, and haplotypeAlternate was on the strand specified by haplotypeStrand.

The new specification contains 3 columns:
 haplotypeReference,haplotypeAlternate,transcriptStrand
where haplotypeRef and haplotypeAlt are required to be on the + strand.  transcriptStrand now specifies the strand of the transcript, which is needed for interpreting the haplotypes.

Bugfix #1: fix incorrect assignment of variantCodon and variantAA
(Previously variantCodon was incorrectly set to referenceCodon)

Bugfix #2: fix incorrect codingCoordStr values for - strands (bug reported by Giulio Genovese), and incorrect usage of "m." for mitochondrial transcripts (bug reported by Steve Hershman)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4444 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:46:09 +00:00
scalvo b5c127e643 Removed HAPLOTYPE_STRAND_COLUMN; Previously, GenomicAnnotation allowed a user to specify the strand of the haplotypeAlternate, and would reverseComplement the haplotypeAlternate if HAPLOTYPE_STRAND_COLUMN was "-". The new specification does not allow this functionality, and instead requires both the reference and the alternate haplotypes to be on the + strand (as in VCF format).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4443 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 20:37:41 +00:00
kshakir ca5db821ce Added the ability to Queue to run scala functions inside the JVM. NOTE: Extend from InProcessFunction instead of CommandLineFunction to use this functionality.
Queue now submits new LSF jobs only after previous functions have completed successfully.
When the Queue process is shutdown (ex: via Control-C) sends a bkill command for any running jobs.
Ported commands like creating directories and scatter/gather interval list to scala functions.
Updates to LSF status tracking by porting the python to internally generated bash scripts.
Temporarily disabled job name submission to LSF.  Plus side is that the full command is now available in "bjobs -w".  TODO: Put back jobName passing to LSF based on an option?
Changed BaseTest to allow scala to access paths to references.
Changed the extension generator to default the analysis name to the walker "name".

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4442 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 18:29:56 +00:00
ebanks 3c5dc675ab For Guillermo: only decide that something is a clear reference call if it is at least 10 times as likely as the next best genotype
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4441 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 15:16:41 +00:00
depristo effcd26977 Shorter outputs, new summary mode
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4440 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:34:50 +00:00
depristo d841f260eb minor improvements to queue status
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4439 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:34:35 +00:00
depristo 0508dd0c31 Better reporting -- figured out how to drop unused levels in subset
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4438 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:31:51 +00:00
depristo 00491fcd2e Only see not writing GATK Run Report if you are running with debug enabled
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4437 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:09:21 +00:00
rpoplin 69485d6a7a Added command line argument for the max value of the allele count prior in VariantRecalibrator (--max_ac_prior). Default value increased to 0.99 from 0.95.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4436 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 14:00:53 +00:00
ebanks c56c2641a8 /broad/1KG doesn't exist
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4435 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 12:54:38 +00:00
ebanks 3d564f4a29 reverting an accidental change from the dindel merge
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4434 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-06 03:08:09 +00:00
chartl 28ac1d325e Commit for Ryan
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4433 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 19:04:10 +00:00
depristo e8af776b99 Fixes for example bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4432 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 18:43:02 +00:00
ebanks b5e148140b Officially fixed the UG priors; updated the default min MQ/BQs to pipeline values of q20 and min calling threshold to Q50
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4431 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 18:35:36 +00:00
fromer c6668bd49c Fixed bug in phasing, where mapping probability was incorrectly raised to the power of number of non-null bases [instead, it is just multiplied into phasing probability once]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4430 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 17:07:31 +00:00
ebanks 7f1e44b764 update the example: /broad/1KG doesn't exist anymore
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4429 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 13:50:24 +00:00
hanna 250c18e679 Error message fixes for the following issues:
nvjpM4yOwQAu3fNGxi4oXLuVpKn6aAlf,1GL0OuXK2xKQfvbu34tWYgbojSVSLo0l,
ehEGBJOfgc4V7qj8W0Homf5ICuVK5Sm3,cZsreLm1CbY3aYKZhV7DOSvQNwur41zp,
GlrlyGEyP9kJDIRCQNFQp7BGJBXSzdDJ,hyz1uiHXr39ANmdZu9K1epOSX8EL3mDw,
q0n4EucZESCI4LZhQik306zD4VAuH2cb.  

Messages:
camrhG5tHzlY9WUSEVpVZGkU1tyJqKb5,s0OX2g7nYRctJxyFoQCa6clac9IsjHyi,
THIAtjllvYNlnTmiMnJEIHd2Ju4gqQIO,jwVk3JYZJNHloW7HO4LeGxFexknqro0v,
BFNRGOGmGGJNNPZqgeF1ikTNFfskbyLc,...

Were fixed in 4392.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4428 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 03:37:13 +00:00
corin e340be34d8 upping mem limit since something was unhappy with the lower limit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4427 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-05 02:38:17 +00:00
kshakir bb44044ce0 Fixed re-builds of queue so that previously compiled classes are included. Fixes redundant case of "ant queue test" vs. "ant test".
Refactored temp directory utils.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4426 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 21:12:07 +00:00
kshakir 4dfed62e7d Generating the Queue GATK extensions using java, then compiling all the Queue scala code at once to allow circular dependencies between existing and generated scala code.
Will see how this behaves for those using IntelliJ as generated source code will disappear during an ant clean.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4425 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 19:38:29 +00:00
kiran 24cf6f9e36 Fix to handle situation where there are no filtered variants.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4424 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 18:34:01 +00:00
ebanks aa00801108 remove reference to -mrl
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4423 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 17:27:01 +00:00
chartl f978c25b9d Perhaps both, Eric. Perhaps both.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4422 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 13:56:04 +00:00
chartl 0eb777612a Swap "." over to VCFConstants.MISSING_DEPTH_v3
Why v3, you ask? Why not? Simply because v2 was a String so old and clunky, the sun would fizzle out and grow cold before any VCF could be successfully parsed.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4421 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 13:41:41 +00:00
chartl 74087c44ae Fixed a bug which caused a parsing exception when there was a variant with a dp field of ".", e.g. "GT:DP 0/1:." -- which can happen when using imputation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4420 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 12:37:36 +00:00
ebanks 6448753cf7 Removed the SequenomValidationConvertor and renamed it VariantValidationAssessor since it no longer handles ped/sequenom files (but instead works on vcfs/variantcontexts). Updated all of the wiki docs, including adding instructions on how to convert ped files to vcf, a la Shaun Purcell. We now officially no longer support ped files everyone. Other misc cleanup in the code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4419 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 02:11:38 +00:00
kiran a15757b8e8 Obsoleted by VariantReport.R
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4418 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 01:00:59 +00:00
kshakir cf01f6d58a Renamed conflicting 'package.dir' in build.xml to 'package.xml.dir'.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4417 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 00:46:47 +00:00
kiran 62f5383859 * Added an R package, "gsalib", providing a place to store common, useful, documented R methods. To use this module, you must follow three steps:
1) Build the module with the following command:
$ ant gsalib

2) Add the module path to your ~/.Rprofile file:
.libPaths("/path/to/Sting/trunk/R/")

3) At the top of each R script that will use the library, include the line:
library(gsalib)

You can now use the package like any other R package.  To get high-level documentation, supply the following command to R:
help(gsalib)

The methods contained herein are:

    getargs         : A method to easily provide arguments to interactive and non-interactive scripts.
                        Prints out a help message specifying how the script should be run if no arguments
                        or "-h" is provided.  Very helpful when you're writing an R-script piecemeal in
                        interactive mode, then want to make it a command-line program.
    plot.venn       : Plots a two-way or three-way proportional Venn diagram.
    read.eval       : Reads VariantEval output that's formatted in R style.
    read.gatkreport : Reads GATKReport output.
    gsa.message     : Emits a message with the prefix "[gsalib]" to stdout.
    gsa.warn        : Emits a warning message with the prefix "[gsalib] Warning:" to stdout.
    gsa.error       : Emits an error message with the prefix "[gsalib] Error: to stdout, calls traceback()
                        and halts execution.

Documentation on each of these methods can be obtained by typing "help(method_name)" at the R prompt.

* Retired GATKReport.R, as that functionality has now been moved to gsalib.
* Retired gsacommons, as that functionality has been split between gsalib and VariantReport.R.
* Modified VariantReport.R to make use of gsalib.  The script now uses the getargs() method to provide the user with some information as to the proper way to run the script.  Documentation on how to prepare output is given at http://www.broadinstitute.org/gsa/wiki/index.php/VariantEval .
* Added 'gsalib' target to build.xml file.  Running "ant gsalib" will compile this module and place the R-ready package in R/gsalib .



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4416 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-04 00:27:59 +00:00
ebanks d8db48204e Fix typo and tell people not to post user errors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4415 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-03 18:58:03 +00:00
ebanks 490e5e1b0f Better error when bad ref bases are provided
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4414 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-03 05:40:37 +00:00
kiran 40b2f62a83 Changed precision on Ti/Tv in venn diagrams
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4413 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 05:27:13 +00:00
kiran d0e44b7a8e Lower precision on Ti/Tv in variant summary matrix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4412 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 05:18:48 +00:00
kiran 6deb755164 Ti/Tv plots are restricted to a Ti/Tv range of 0.0-4.0. Added column to variant summary specifying the total variant counts (known+novel). Allele spectrum plots now show neutral expectation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4411 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 05:15:34 +00:00
aaron 64b7b3f83b fix for a recent change to the indexing code where we ignore the results of locking the file (this is bad), and as a result don't write the index; this should fix the build.
Off to Yosemite in 4 hours, enjoy the week gsa folks!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4410 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-02 04:35:11 +00:00
depristo 7551ba8249 Trival refactoring in preparation for on-the-fly indexing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4409 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 22:32:59 +00:00
hanna 399e6f1463 Make package dir configurable.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4408 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 22:25:17 +00:00
kiran 1d7e48c4b0 Venn diagrams are now oriented properly when a < b. Added a slide with callset summary table. All plots now show the present-in-a, filtered-in-b metrics. Added title page with project name, author, and timestamp.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4407 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 22:17:21 +00:00
rpoplin 2f7892601c Useful debugging argument added to VariantRecalibrator to only use sites whose qual field is above --qual
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4406 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 21:08:55 +00:00
hanna 575c38fc04 Accidental fail to commit missing file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4405 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 20:26:51 +00:00
kiran fe29c8b09c Placeholder commit: improvements to VariantReport (now shows stats for variants that are called in one set and filtered in another). Better command-line argument support.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4404 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 18:46:53 +00:00
delangel d4398f2686 silly bug fix: if I'm to do a short term hack to avoid -infinity likelihoods I might as well do it right.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4403 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 18:39:45 +00:00
hanna 8d25a5f9f2 A mechanism for supplying attribution text -- mainly useful for external
walkers.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4402 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 18:31:19 +00:00
delangel e920badcc4 Temporary fix for case where genotype likelihoods are exactly (1,0,0) or (0,1,0) etc. at a site with new indel genotyper: this would make us blow up when converting to log space and try to assign genotypes at a site. A more robust solution is in the works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4401 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 17:43:43 +00:00
rpoplin b83fdf8a17 Bug fix in AnalyzeAnnotations. Be sure the site is a biallelic, unfiltered SNP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4400 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 13:09:46 +00:00
chartl 7639692e5b Sigh. Fix the source of even more UserErrors in the phone home directory: make sure to gunzip the beagle files before passing them into the conversion walker...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4399 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 03:28:36 +00:00
delangel fa9c21c020 More fixes for exact AF calculation model in new unified genotyper:
a) Fixed bugs in new dynamic programming-based genotyper
b) Fixed up temp hack that handles extended pileups for now.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4398 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-01 02:32:50 +00:00