Commit Graph

3927 Commits (d3bebe0f2c4ad25720741bc9c191863e2fe62003)

Author SHA1 Message Date
depristo d3bebe0f2c Reasonable comment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3963 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 22:03:55 +00:00
ebanks ac4699a650 Re-enabling this test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3962 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:20:37 +00:00
depristo bb5dfd7e5e Slightly nicer plotting; not yet complete
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3961 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:01:31 +00:00
depristo f275041b1c -minimalVCF for CombineVariants. Workaround for broken locking code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3960 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 16:10:59 +00:00
depristo 669d9096e3 now supports the -o output option, useful for pipelines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3959 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 14:57:04 +00:00
aaron 9076c0b28b removing unused code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3958 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 14:24:39 +00:00
depristo 70f492a6e8 Prints out trivial debugging info
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3957 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 13:24:21 +00:00
ebanks 341e752c6c 1) AlleleBalance is no longer a standard annotation, but the Allelic Depth (AD) is for each sample.
2) Small fixes in the VCFWriter:
a) Trailing missing values weren't being removed if their count was > 1 (e.g. ".,.")
b) We were handling key values that were Lists, but not Arrays.  We now handle both.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3956 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 12:05:14 +00:00
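The two VCFWriter fixes in the commit above can be illustrated with a small, self-contained sketch. It assumes the per-sample values are held as strings in FORMAT-key order, and the class and method names are hypothetical, not the GATK's VCFWriter API. A value counts as missing when every comma-separated element is ".", so a multi-valued entry such as ".,." is trimmed just like a lone ".".

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the two VCFWriter fixes; not the GATK API.
final class VcfValueFormatting {
    static final String MISSING = ".";

    // Fix 2a: a value is missing if every comma-separated part is ".", e.g. "." or ".,.".
    static boolean isMissing(String value) {
        for (String part : value.split(",", -1)) {
            if (!part.equals(MISSING)) return false;
        }
        return true;
    }

    // Drops trailing missing values, always keeping the first field.
    // ["0/1", ".,.", "."] becomes ["0/1"]; a missing value followed by a real one is kept.
    static List<String> trimTrailingMissing(List<String> values) {
        int end = values.size();
        while (end > 1 && isMissing(values.get(end - 1))) end--;
        return new ArrayList<>(values.subList(0, end));
    }

    // Fix 2b: render a value that may be a List or an array (object or primitive) as "a,b,c".
    static String formatValue(Object value) {
        if (value == null) return MISSING;
        List<String> parts = new ArrayList<>();
        if (value instanceof List<?>) {
            for (Object o : (List<?>) value) parts.add(String.valueOf(o));
        } else if (value.getClass().isArray()) {
            int n = Array.getLength(value); // java.lang.reflect.Array handles int[], double[], Object[], ...
            for (int i = 0; i < n; i++) parts.add(String.valueOf(Array.get(value, i)));
        } else {
            return String.valueOf(value);
        }
        return parts.isEmpty() ? MISSING : String.join(",", parts);
    }
}
```

Using java.lang.reflect.Array avoids writing one branch per primitive array type.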
aaron c68625f055 Fixes from Mark for the MutableContexts; this fixes the clearGenotypes() and the clearFilters() methods, and adds a method to clear the attributes. Also added is a method for creating a variant context where the attribute list is pruned to a specific subset, which can be null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3955 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 22:39:51 +00:00
aaron 72ae81c6de VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include:
- Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from inside the tribble directory.
- Hapmap ROD now in Tribble; all mentions have been switched over.
- VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc.
- VariantContext.getSNPSubstitutionType is now in VariantContextUtils.
- This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN.

I'll send out an email to GSAMembers with some more details.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 18:47:53 +00:00
fromer b21f90aee0 Added preliminary framework for performing short-range phasing (ReadBackedPhasingWalker.java)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3953 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:56:34 +00:00
rpoplin a8d37da10b Checking in everyone's changes to the variant recalibrator. We now calculate the variant quality score as a LOD score between the true and false hypotheses. Allele Count prior is changed to be (1 - 0.5^ac). Known prior breaks out HapMap sites.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3952 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:12:19 +00:00
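The two quantities named in the commit above can be written down directly. This is a sketch under assumptions: the method names are hypothetical, and the message does not say how the recalibrator combines the prior with the cluster likelihoods, so the two pieces are shown separately rather than as the actual VariantRecalibrator computation.

```java
// Hypothetical sketch of the two formulas in the commit message; not VariantRecalibrator code.
final class RecalibratorMath {
    // Allele-count prior from the commit: 1 - 0.5^ac (0 at ac = 0, 0.5 at ac = 1, approaching 1).
    static double alleleCountPrior(int ac) {
        if (ac < 0) throw new IllegalArgumentException("ac must be non-negative");
        return 1.0 - Math.pow(0.5, ac);
    }

    // LOD score: log10 of the likelihood ratio between the "true variant" and "false variant" hypotheses.
    static double lod(double likelihoodTrue, double likelihoodFalse) {
        if (likelihoodTrue <= 0 || likelihoodFalse <= 0) {
            throw new IllegalArgumentException("likelihoods must be positive");
        }
        return Math.log10(likelihoodTrue) - Math.log10(likelihoodFalse);
    }
}
```

For example, the prior is 0.5 for a singleton (ac = 1) and 0.75 for ac = 2, so sites with higher allele counts start with a prior closer to 1.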
ebanks 07addf1187 Fix for Kiran: since the Variant Annotator will re-annotate on top of existing annotations it makes sense to remove old headers if they conflict with the definitions being added by VA.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3951 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 06:44:39 +00:00
ebanks 1539791a04 Fix for Kiran: when using VCFs for the comp tracks in the Annotator(s), don't put the headers from them into the output VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3950 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:45:47 +00:00
ebanks 227c4b10f0 Bug fix for Chris: convert comp tracks to VC so that we can respect the filter field. Added an integration test to cover this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3949 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:13:16 +00:00
ebanks 84ca2f27bb Bug fix for Chris: added method createPotentiallyInvalidGenomeLoc() to the GenomeLocParser that doesn't check that the contig exists in the sequence dictionary. This is crucial for lifting over from one reference to another, as contig names sometimes change in the liftover (e.g. chrM to MT).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3948 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 03:19:02 +00:00
ebanks f247cbf68e I want to be the first to use the new super-cool Hidden annotation! No more telling people not to use the cleaner debugging options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3947 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:44:37 +00:00
hanna 78bfe6ac48 Added @Hidden annotation, a way to deliberately exclude experimental fields and
walkers from the help system.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3946 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:26:46 +00:00
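A minimal sketch of how an annotation like @Hidden can keep fields and walkers out of a help listing. The annotation below is a stand-in defined locally, not the GATK's own class, and ExampleWalker and HelpLister are hypothetical. The key detail is RUNTIME retention: without it, reflection cannot see the marker when the help system runs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Stand-in for @Hidden: visible at runtime, applicable to fields (arguments) and types (walkers).
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.TYPE})
@interface Hidden {}

// Hypothetical walker with one documented argument and one experimental, hidden one.
class ExampleWalker {
    public int minQuality = 20;
    @Hidden public boolean debugCleaner = false;
}

final class HelpLister {
    // Names of fields a help system would show; a @Hidden walker shows nothing at all.
    static List<String> visibleArguments(Class<?> walker) {
        List<String> names = new ArrayList<>();
        if (walker.isAnnotationPresent(Hidden.class)) return names;
        for (Field f : walker.getDeclaredFields()) {
            if (!f.isAnnotationPresent(Hidden.class)) names.add(f.getName());
        }
        return names;
    }
}
```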
chartl 82d6c5073b A simple read strand filter for potluri on Get Satisfaction
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3945 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 23:23:50 +00:00
asivache d53d5ffbf6 A utility class that computes a running average and standard deviation for the stream of numbers it is fed. It updates the mean/stddev on the fly and does not cache the observations, so it uses constant memory and should also be stable against overflow/loss of precision. A simple unit test is also provided (it does *not* stress-test the engine with millions of numbers, though).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3944 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 21:39:02 +00:00
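The commit above does not name its update rule. Welford's online algorithm is one standard way to get exactly the properties described (constant memory, no cached observations, good numerical stability), so the sketch below uses it; the class name and the choice of the sample (n - 1) variance are assumptions.

```java
// One-pass mean and standard deviation via Welford's update; O(1) state, observations are not stored.
final class RunningStats {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // second factor uses the already-updated mean
    }

    long count() { return n; }

    double mean() { return mean; }

    // Sample variance (n - 1 denominator); defined as 0 for fewer than two observations.
    double variance() { return n > 1 ? m2 / (n - 1) : 0.0; }

    double stddev() { return Math.sqrt(variance()); }
}
```

Updating the mean incrementally avoids the large intermediate sums (and the cancellation in sum(x^2) - n*mean^2) of the naive textbook formula.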
ebanks 8d8acc9fae Moving G's MyHapScore to replace the old HapScore
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3943 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 21:00:54 +00:00
ebanks 7858ffec32 Spit out the error in the warning message so that Sendu can tell me what his problem is
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3942 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 20:40:28 +00:00
chartl 5815348ebc Switch to newer version of comp tracks (and make the trigger track a comp as well). Indel cleaning should override the interval list and only use the contig interval list; and also force jobs to go to long.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3941 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 20:05:27 +00:00
delangel 86211b74e8 Bug fix: when padding alleles in creating a Variant context from an indel, leave no-call alleles as no-call alleles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3940 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:51:10 +00:00
chartl 38e65f6e1b Added: A VariantEval module that gives simple metrics by sample, and an abstract class that makes per-sample modules easy to write (but a little bit clunky since a class needs to be defined for each data point -- see SimpleMetricsBySample as an example). AnalysisModuleScanner needed a slight update to pull in data points from parent classes for this to work (thanks Khalid for showing me how to do this). After a code review with Aaron (thanks) and ensuring integration tests pass, I am committing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3939 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:37:39 +00:00
hanna f13d52e427 Attempt to determine whether underlying filesystem supports file locking and
disable on-the-fly dict and fai generation.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3938 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:28:27 +00:00
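One way to probe whether a filesystem supports locking is simply to try it. The sketch below uses java.nio's FileChannel.tryLock on a scratch file; the class and method names are hypothetical, and the commit does not say this is how the GATK performs its check. A caller would skip on-the-fly .dict and .fai generation whenever the probe returns false.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical probe: returns true only if an exclusive lock can be taken on a scratch file in dir.
final class LockProbe {
    static boolean supportsFileLocking(Path dir) {
        Path probe = null;
        try {
            probe = Files.createTempFile(dir, "lockprobe", ".tmp");
            try (FileChannel ch = FileChannel.open(probe, StandardOpenOption.WRITE);
                 FileLock lock = ch.tryLock()) { // null if another process holds the lock
                return lock != null;
            }
        } catch (IOException | UnsupportedOperationException e) {
            // Some network filesystems reject lock requests outright; treat that as "unsupported".
            return false;
        } finally {
            if (probe != null) {
                try { Files.deleteIfExists(probe); } catch (IOException ignored) { }
            }
        }
    }
}
```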
kiran 1a36cb9296 Can now set the maximum number of variants to see in a cluster plot (useful when you don't need to see a billion points to get an idea of what's going on). Limit applies to known and novel variants separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3937 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:45:24 +00:00
kiran bd27287fe7 An R module that takes in a Variant Recalibration cluster file (file with '@!CLUSTER' lines in it), a tabularized VCF, and optionally a set of loci that should be examined more carefully, and emits a tremendous number of plots. For every annotation used in clustering, the distributions and pair-wise comparison (with ellipses denoting the 2-sigma cluster boundaries) are shown. Each cluster is shaded with a color proportional to its mixture coefficient.
To use this module, you'll first have to take your VCF and create an R-readable table out of it with the following command:

python /path/to/Sting/trunk/python/vcf2table.py -f CHROM,POS,ID,AC,AF,AN,DB,DP,HRun,MQ,MQ0,MyHaplotypeScore,QD,SB my.vcf > my.vcf.table

Then, simply invoke this module with the command:

Rscript /path/to/Sting/trunk/R/VariantRecalibratorReport/VariantRecalibratorReport.R /path/to/output/prefix /path/to/my/my.clusters /path/to/my.vcf.table [/path/to/my.suspicious.loci]

This will create a number of plots all with the prefix "/path/to/output/prefix".  For instance, if you used QD, SB, HRun, and MyHaplotypeScore annotations during clustering, you should see output like this:

    /path/to/output/prefix.anndist.HRun.pdf
    /path/to/output/prefix.anndist.MyHaplotypeScore.pdf
    /path/to/output/prefix.anndist.QD.pdf
    /path/to/output/prefix.anndist.SB.pdf
    /path/to/output/prefix.cluster.HRun_vs_MyHaplotypeScore.pdf
    /path/to/output/prefix.cluster.HRun_vs_QD.pdf
    /path/to/output/prefix.cluster.HRun_vs_SB.pdf
    /path/to/output/prefix.cluster.MyHaplotypeScore_vs_HRun.pdf
    /path/to/output/prefix.cluster.MyHaplotypeScore_vs_QD.pdf
    /path/to/output/prefix.cluster.MyHaplotypeScore_vs_SB.pdf
    /path/to/output/prefix.cluster.QD_vs_HRun.pdf
    /path/to/output/prefix.cluster.QD_vs_MyHaplotypeScore.pdf
    /path/to/output/prefix.cluster.QD_vs_SB.pdf
    /path/to/output/prefix.cluster.SB_vs_HRun.pdf
    /path/to/output/prefix.cluster.SB_vs_MyHaplotypeScore.pdf
    /path/to/output/prefix.cluster.SB_vs_QD.pdf

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3936 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:35:14 +00:00
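The output names in the example listing follow a regular pattern: one anndist plot per annotation, plus one cluster plot for every ordered pair of distinct annotations, so k annotations give k + k(k - 1) files (16 for the four used there). The sketch below reproduces that list, for instance to check that a run produced every expected plot. It infers the pattern from the example output, not from the R script itself, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: expected report PDFs for an output prefix and the clustering annotations.
final class ReportFiles {
    static List<String> expectedPlots(String prefix, List<String> annotations) {
        List<String> sorted = new ArrayList<>(annotations);
        Collections.sort(sorted); // the example listing is in alphabetical order
        List<String> out = new ArrayList<>();
        for (String a : sorted) {
            out.add(prefix + ".anndist." + a + ".pdf"); // one distribution plot per annotation
        }
        for (String a : sorted) {
            for (String b : sorted) {
                if (!a.equals(b)) {
                    out.add(prefix + ".cluster." + a + "_vs_" + b + ".pdf"); // each ordered pair
                }
            }
        }
        return out;
    }
}
```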
ebanks 340bd0e2c1 Removed hard-coded pointers to references
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3934 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 17:59:37 +00:00
asivache a47824d680 A couple of type-specific implementations of a single extend() method: takes an array (byte[] or short[] currently) and "extends" it to the left or to the right by the specified number of elements. Returns newly allocated array, with the content of original array copied in (if we extend by n elements to the left, then the returned array will have n default-filled elements *followed* by the content of the old array).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3932 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:30:48 +00:00
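The behavior described in the commit above can be sketched as follows. The class name, argument order, and boolean direction flag are assumptions; the real method's signature may differ. The overloads are needed because Java generics cannot abstract over primitive array types.

```java
// Hypothetical sketch of extend(); names and argument order are assumptions.
final class ArrayExtend {
    // Returns a new array n elements longer; when left is true the n zero-filled slots come first.
    static byte[] extend(byte[] orig, int n, boolean left) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        byte[] out = new byte[orig.length + n];                    // Java zero-fills new arrays
        System.arraycopy(orig, 0, out, left ? n : 0, orig.length);
        return out;
    }

    // Same operation for short[].
    static short[] extend(short[] orig, int n, boolean left) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        short[] out = new short[orig.length + n];
        System.arraycopy(orig, 0, out, left ? n : 0, orig.length);
        return out;
    }
}
```

For example, extending {1, 2} by two elements to the left yields {0, 0, 1, 2}; to the right, {1, 2, 0, 0}.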
asivache 012a7cf0a5 mismatchCount now has a version that counts mismatches only along a part of the read (takes additional args start_on_read and length_on_read to specify the read's subsequence to be interrogated);
isMateUnmapped() convenience shortcut method added.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3931 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:27:35 +00:00
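A simplified sketch of counting mismatches over a window of the read. It assumes an ungapped alignment in which read base i lines up with reference base i (no CIGAR handling), and the class and parameter names are hypothetical, loosely following the start_on_read / length_on_read arguments named in the commit.

```java
// Hypothetical sketch: assumes an ungapped alignment where read[i] aligns to ref[i].
final class MismatchCounter {
    // Counts case-insensitive mismatches over read positions [startOnRead, startOnRead + lengthOnRead).
    static int mismatchCount(byte[] read, byte[] ref, int startOnRead, int lengthOnRead) {
        int end = startOnRead + lengthOnRead;
        if (startOnRead < 0 || lengthOnRead < 0 || end > read.length || end > ref.length) {
            throw new IllegalArgumentException("window falls outside the read or reference");
        }
        int mismatches = 0;
        for (int i = startOnRead; i < end; i++) {
            if (Character.toUpperCase((char) read[i]) != Character.toUpperCase((char) ref[i])) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Whole-read version, expressed as the windowed one over the full read length.
    static int mismatchCount(byte[] read, byte[] ref) {
        return mismatchCount(read, ref, 0, read.length);
    }
}
```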
delangel e6e8a20a1e 1) Fix MyHaplotypeScore to ignore 454 reads, since all those pathological non-existing indels make some sites' score blow up. If a site is only covered by 454 reads, we (hopefully) detect this gracefully and just emit a score of 0.0 for the site.
2) New annotation SByDepth = log10(-StrandBias/Depth) (non-standard annotation, key name = "SBD"). If StrandBias/Depth happens to be positive (very rare but can happen), annotation gets value=-1000. 
3) Abstracted out new class AnnotationByDepth so that QD and SBD can share code.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3930 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:23:08 +00:00
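Points 2 and 3 of the commit above fit together naturally: both QD and SBD divide a raw score by depth and then differ only in what they do with the ratio. The sketch below shows that shape; QualByDepth and StrandBiasByDepth here are simplified stand-ins, not the GATK classes. Mapping a ratio of exactly 0 to the -1000 sentinel (rather than log10(0) = -Infinity) is an assumption, since the commit mentions only the positive case.

```java
// Hypothetical shared base: normalize a raw score by depth, then apply an annotation-specific transform.
abstract class AnnotationByDepth {
    final double annotate(double rawScore, int depth) {
        if (depth <= 0) throw new IllegalArgumentException("depth must be positive");
        return transform(rawScore / depth);
    }

    protected abstract double transform(double perDepth);
}

// QD: the quality score divided by depth, untransformed.
final class QualByDepth extends AnnotationByDepth {
    protected double transform(double perDepth) {
        return perDepth;
    }
}

// SBD = log10(-StrandBias / Depth); a positive (or, by assumption, zero) ratio gets the sentinel.
final class StrandBiasByDepth extends AnnotationByDepth {
    static final double SENTINEL = -1000.0;

    protected double transform(double perDepth) {
        return perDepth < 0 ? Math.log10(-perDepth) : SENTINEL;
    }
}
```

For example, a strand bias of -100 at depth 10 gives log10(10) = 1.0, while a positive strand bias yields -1000.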
ebanks bf60ed0b25 Needed it here too: warn user instead of dying if the R script cannot be executed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3929 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:11:27 +00:00
ebanks 40ffe34686 Warn user instead of dying if the R script cannot be executed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3928 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:08:15 +00:00
ebanks 17d5e89734 Now --list annotates which modules are Standard
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3927 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 21:00:37 +00:00
ebanks 72875cf717 Removing annoying printouts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3926 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 19:55:00 +00:00
ebanks 2307bed742 VariantEval now uses the "standard" modules only by default. You can add other modules with the -E argument and not use all of the standard ones with -noStandard (they can be added back individually with -E).
Generalized some of the packaging code from VariantAnnotator.  Matt might want to take a look to make this nicer...?

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3925 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 16:51:10 +00:00
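The module-selection rule described in the commit above reduces to a small set operation. A sketch, assuming modules are identified by name; the method is hypothetical and not VariantEval's code.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the selection rule; not VariantEval's implementation.
final class ModuleSelection {
    // Standard modules run unless noStandard is set; modules requested with -E are always added.
    // LinkedHashSet keeps a stable order and ignores a module that is requested twice.
    static Set<String> select(Collection<String> standard, Collection<String> requested, boolean noStandard) {
        Set<String> chosen = new LinkedHashSet<>();
        if (!noStandard) {
            chosen.addAll(standard);
        }
        chosen.addAll(requested);
        return chosen;
    }
}
```

With -noStandard, a standard module comes back only if it is named again with -E, which matches the behavior the commit describes.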
ebanks a7ff9caf54 Added sanity check against bad people and/or crazy big indels at edges of ref context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3918 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 05:37:17 +00:00
hanna 5f1b67c1de Copping out and forcing the entire GATK (and associated JVM) to use US English
locale.  Method to force JVM into proper locale exists in CommandLineProgram
and is disabled by default, but implementers of CommandLineProgram can opt in
to the forced US locale by calling a static method.

Question for the VCF developers: I removed the code to explicitly output doubles
in US locale.  Do you / how do you want to handle this in applications that use
Tribble outside the GATK?

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3917 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 03:48:26 +00:00
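The underlying problem is that Java's number formatting honors the JVM's default locale, so under a German locale 0.5 is written as "0,50", which corrupts numeric fields in VCF and other delimited output. The sketch below shows the effect and a global opt-in fix with Locale.setDefault. The class and method names are hypothetical, not CommandLineProgram's actual static method.

```java
import java.util.Locale;

// Hypothetical names; illustrates the global opt-in fix, not CommandLineProgram's real method.
final class LocaleDemo {
    // Sets the JVM-wide default locale (every category, including number formatting) to US English.
    static void forceUSLocale() {
        Locale.setDefault(Locale.US);
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.GERMANY);
        System.out.println(String.format("%.2f", 0.5)); // prints 0,50 under a German default locale
        forceUSLocale();
        System.out.println(String.format("%.2f", 0.5)); // prints 0.50
    }
}
```

Passing the locale per call, as in String.format(Locale.US, "%.2f", x), is the alternative that leaves the JVM default alone, which matters for code such as Tribble that may run outside the GATK.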
hanna b5b2c19124 Updated resources package descriptor with dbsnp 129 for b37.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3916 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 02:42:48 +00:00
chartl 2bc69572cb Make transcript2info capable of handling b37/hg19 contigs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3915 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 17:32:08 +00:00
depristo c203e0fb02 Added JEXL support for hetCount, homRefCount, and homVarCount in VCs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3914 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 12:24:11 +00:00
depristo 7fab5c0a8f support for -singleton_fp_rate arguments to variant recalibrator instead of the pop.gen. AF prior. Worth experimenting with Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3913 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-31 21:17:47 +00:00
chartl 6dcb63888d Be smart about the headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3912 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 16:11:16 +00:00
chartl eeb767a012 Remove legacy "out"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3911 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 15:01:02 +00:00
chartl f20cdbe60a Modified to work with MT containing VCFs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3910 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:59:59 +00:00
ebanks 6d91cd587e Be explicitly clear about which options are for debugging purposes only and shouldn't be used if your username is not ebanks@broad. If only we had a @hidden annotation option for args...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3909 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:18:31 +00:00
depristo ac8048f17b Support for automated selects for tranches in variant eval -- use -tf to make tranche-specific ve outputs. ApplyVariantCuts with tranche reading functions for general use, along with todo for ryan. CombineVariants now has --filteredAreUncalled and will treat filtered snps in input VCFs as uncalled, and so won't emit -filteredInOther set features
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3908 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:16:43 +00:00
chartl 9231d13252 Minor modification: adding an argument to make slightly more general.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3907 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 05:20:20 +00:00
chartl db54d63fc7 Hahaha yes, ownage. This now works.
BTW, Eric, thanks for forwarding the DepthOfCoverage thread to gsamembers. I'd forgotten about reduce by interval. Mighty helpful in this case!

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3906 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 04:23:02 +00:00