4702 Commits (ce051e4e9a2c5dd4e98c510a7e90426d258254ea)

Author SHA1 Message Date
ebanks ce051e4e9a Write to stdout when no -o is provided
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4743 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-28 06:19:26 +00:00
kiran 9cca14acc5 Changed VCF subsetting procedure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4742 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-28 00:46:29 +00:00
kiran f8a3bd7243 A helper script to merge two VCFs, run VariantEval, and the VariantReport.R script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4741 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-28 00:45:21 +00:00
kiran c40c9aa5ef Had these changes regarding memory cutoffs for a while. Committing them so I can be running a clean codebase, but people shouldn't use this. Queue is way better.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4740 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-27 23:31:31 +00:00
kiran ecd496cf51 Modifications to reflect changes to gsalib. Smarter about figuring out the names of the filtered parts of the callset.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4739 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-27 23:26:03 +00:00
kiran 247f33a553 Prefixed all the functions with gsa. in order to distinguish the methods from other possible methods of the same name in the namespace.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4738 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-27 23:24:42 +00:00
kshakir 6f8cd97673 Added a ten sample 1000G whole exome test along with SimpleMetricsBySample to the pipeline validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4737 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-26 23:17:23 +00:00
ebanks e3e6d176df Looking over the daily error log email made me realize that there were 2 implementations of vc.modifyLocation() - the correct one in VC that didn't require lazy loading the genotype data and the bad one in VCUtils that did. Removing the implementation in VCUtils and updating the code accordingly. Also, removing createPotentiallyInvalidGenomeLoc() since no one uses it anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4736 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-26 18:40:34 +00:00
ebanks 35b90d2295 Don't compute SB for ref calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4735 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-26 03:54:26 +00:00
ebanks 6934f83cc7 Two changes to CombineVariants.
1. Fix: VCs were padded before the merge, but they were never unpadded afterwards.  This leaves us with a VC that doesn't meet our spec.
2. Update: instead of running the merged VC through every standard annotation (which seems really wrong, since this isn't the annotator tool), just update the chromosome count annotations (AC,AF,AN) through VCUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4734 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-25 04:52:12 +00:00
fromer d775192631 Check if MNP annotation of amino acid is dependent on the MNP, or could it be obtained through some single-base variant?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4733 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 22:38:33 +00:00
rpoplin 0dd40c3684 Updating doc text
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4732 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 21:34:14 +00:00
rpoplin ed08899abc Overwhelming evidence that maxQ = 50 is now a better default than maxQ = 40 in the base quality score recalibrator, especially when combined with dbsnp build 132. Also, added option in ProduceBeagleInputWalker for Beagle-ing chromosome X calls with male samples which sets the genotype likelihood for the AB allele to zero for those samples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4731 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 21:32:26 +00:00
fromer ca70ed611c Totally revamped the MNP annotation and put it in its own walker: AnnotateMNPsWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4730 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 18:05:10 +00:00
corin 6b70cde0b9 Adding a forgotten quote mark
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4729 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 16:38:27 +00:00
depristo 8768e1a240 Useful profiling tool that reads in a single rod and evaluates the time it takes to read the file by byte, by line, into pieces, just the sites of the vcf, and finally the full vcf. Emits a useful table for plotting with the associated R script that can be run like Rscript R/analyzeRodProfile.R table.txt table.pdf titleString
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4728 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 14:59:16 +00:00
ebanks 7a8b85dd15 Catch the JEXL exception when trying to match a variable that's not in the context - and don't filter in these cases. Now everyone can happily go back to using the stupid (and hopefully temporary) AlleleBalance filter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4727 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 05:00:41 +00:00
ebanks caf2c21f61 Must close the writer to flush the cache
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4726 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 04:33:08 +00:00
ebanks 816c33c821 indel-related fixes to the strict validator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4725 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 04:08:34 +00:00
delangel 9cdc341be5 Trivial update for data processing paper: change syntax of output argument for Beagle by depth walker to update to new GATK format.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4724 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 01:45:44 +00:00
corin e15d18129c Adding by sample metrics. Not sure why we didn't have this in here in the first place
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4723 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 21:36:03 +00:00
ebanks ea6e2218c1 1. dbsnp has some massive indels which my left-aligner was barfing on because there isn't enough reference context; fixed. 2. Lower default calling threshold to Q30 for UGv2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4722 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 19:28:33 +00:00
corin fe28f8da9c Removing Uniquify from main pipeline indel merge, since the pipeline isn't merging from samples with the same name anyway.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4721 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 17:25:22 +00:00
aaron 53672361cc capture more details when something IO-related goes wrong in writing a Tribble index
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4720 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 17:06:28 +00:00
hanna 082073ca3c Stop RBP.getPileupBySample() from throwing a NullPointerException if the
sample doesn't exist -- now returns null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4719 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 05:17:06 +00:00
kiran d2fc30d188 Added a debugging statement to plot.venn
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4718 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 01:19:31 +00:00
kshakir 787e5d85e9 Added the ability to test pipelines in dry or live mode via 'ant pipelinetest' and 'ant pipelinetest -Dpipeline.run=run'.
Added an initial test for genotyping chr20 on ten 1000G bams.
Since tribble needs logging support too, for now setting the logging level and appending the console logger to the root logger, not just to "org.broadinstitute.sting".
Updated IntervalUtilsUnitTest to output to a temp directory and not the SVN controlled testdata directory.
Added refseq tables and dbsnps to validation data in BaseTest.
Now waiting up to two minutes for gather parts to propagate over NFS before attempting to merge the files.
Setting scatter/gather directories relative to the -run directory instead of the current directory that queue is running.
Fixed a bug where escaping test expressions didn't handle delimiters at the beginning or end of the String.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4717 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 22:59:42 +00:00
depristo 187b464ded calling pipeline for v13 of the paper calls -- the final version
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4716 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 21:57:12 +00:00
kiran 28805d17ca Commenting out allele-balance for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4715 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 16:48:08 +00:00
hanna 8ca5edf89f Fix issue where non-required file inputs can throw a NullPointerException
rather than a UserException when the input argument is specified without
an argument value. 
The magnitude of code required to fix this points to a need to give the
command-line argument system a good spring cleaning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4714 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 01:49:17 +00:00
ebanks b9a59ea54f Adding Het/Hom ratio to the temp per sample metrics. Because I'm in a generous mood tonight, I'm going ahead and fixing the paths for the classes I'm touching...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4713 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-21 04:24:42 +00:00
ebanks cff7c6ddce These are user exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4712 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-21 02:08:11 +00:00
bthomas 374c0deba2 Updating the core LocusWalker tools to include the Sample infrastructure that I added last month. This commit touches a lot of files, but only significantly changes a few: LocusIteratorByState and ReadBackedPileup and associated classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4711 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-19 19:59:05 +00:00
kshakir c723db1f4b Added a -summary jexl argument to VariantEval similar to -validate.
Updated the package of ValidationGenotyper to match the file location.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4710 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-19 04:42:46 +00:00
corin 8dca5bd861 Putting the annotation back in, both to the filters and to UG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4709 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 21:02:15 +00:00
corin da1fe5bb37 Removing the AB filter given that we don't have that in the VCF anymore
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4708 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 20:22:05 +00:00
kshakir 79725f2d9c Excluding the QFunction log files from the set of files to delete on completion.
When a QGraph is empty, displaying a warning instead of crashing with a JGraph internal assertion error.
Cleaned up code using the Log4J root logger and explicitly talking to a logger for Sting.
When integration tests are run, detecting that the logger has already been set up so that messages aren't logged twice.
Updated from Ivy 2.2.0-rc1 to 2.2.0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4707 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 20:22:01 +00:00
depristo 721e8cb679 VariantsToTable now supports wildcard captures. -F PREFIX* now captures all fields that begin with PREFIX, output as a comma-separated list of unique values. Added integration test for VariantsToTable since I find it so useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4706 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 18:54:59 +00:00
hanna 302cc13735 Trying out Queue for the first time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4705 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 18:29:12 +00:00
depristo 5dabf73039 Useful script for me
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4704 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 15:21:06 +00:00
depristo 8cba86a69d Trivial code organization for the haplotype score
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4703 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 12:32:55 +00:00
hanna 9f356b6cd0 Package all walkers in org/broadinstitute/sting/gatk/walkers directory in release.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4702 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 02:33:45 +00:00
hanna 9942f436b4 Support dist target for externally written walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4701 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 20:41:04 +00:00
hanna 90711d445c Change the interface for RMDTrackBuilder, therefore always mandating the specification
of a sequence dictionary and related info.  This will hopefully eliminate the cases in
which the refseq track depends on a sequence dictionary / contig parser that hasn't been
specified.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4700 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 19:00:17 +00:00
fromer 367cc9135f Use VariantContext and Genotype accessor methods for attributes that will return null for unparseable data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4699 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 18:19:56 +00:00
fromer 2f3578182a Added VERY preliminary version for merging refseq annotations as SNPs are merged
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4698 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:49:12 +00:00
fromer e2f7f33ce7 Added getIntegerAttribute()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4697 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:33:07 +00:00
kiran d492eb94ad Actually subsets the resulting table now, like it was supposed to all along.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4696 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:18:23 +00:00
depristo d86ab2becb JEXL expressions now generate exceptions, not warnings. Tools should catch the runtime exception to handle correctly. Removed unnecessary complexity from the JEXL contexts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4695 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:08:16 +00:00
delangel 539651de30 Initial version of Indel Statistics module for Variant Eval - not for general use yet, needs more verification and more work. Older IndelHistogram module will be obsolete with this new walker. Right now, for each sample (and for all samples), the following are computed:
- Number of insertions
- Number of deletions
- Length distribution for indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4694 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 15:52:01 +00:00