kshakir
4f51a02dea
Changed the default logging level to INFO instead of WARN.
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
The RMD used in @Required/@Allows now has a new default type of "any".
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the scattering and gathering (setting their jar files, other arguments, etc.).
Removed the dependency on BroadCore by porting the LSF job submitter to Scala.
Ivy now pulls down module dependencies from maven.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
aaron
30178c05c5
Providing a way to specify how you'd like -BTI combined with your -L options: set BTIMR to either UNION (the default) or INTERSECTION.
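A minimal Python sketch of what the two BTIMR settings mean for interval sets. It assumes intervals are (contig, start, end) tuples with 1-based inclusive coordinates and sorts contigs by name rather than by reference order; `merge_intervals` and `combine_intervals` are hypothetical helpers, not the GATK implementation.

```python
def merge_intervals(intervals):
    """Sort intervals and collapse those that overlap or abut on the same contig."""
    merged = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (contig, prev_start, max(prev_end, end))
        else:
            merged.append((contig, start, end))
    return merged


def combine_intervals(bti, l_intervals, rule="UNION"):
    """Combine -BTI intervals with -L intervals under a UNION or INTERSECTION rule."""
    if rule == "UNION":
        return merge_intervals(bti + l_intervals)
    if rule == "INTERSECTION":
        result = []
        for c1, s1, e1 in merge_intervals(bti):
            for c2, s2, e2 in merge_intervals(l_intervals):
                start, end = max(s1, s2), min(e1, e2)
                if c1 == c2 and start <= end:
                    result.append((c1, start, end))
        return merge_intervals(result)
    raise ValueError(f"unknown merge rule: {rule}")
```

Under UNION a site covered by either source is traversed; under INTERSECTION only sites covered by both are.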
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3983 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 14:00:52 +00:00
aaron
f3883585d0
Removing the build lines I inadvertently committed. Note: these are the lines you need to add if you want to debug tests; just don't check them in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3982 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 13:48:16 +00:00
hanna
6b4a1e3b9f
Re-enabling code that had been commented out, now that many participants in this thread have confirmed that it works:
http://getsatisfaction.com/gsa/topics/error_thrown_when_reading_reference_file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3981 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 00:12:09 +00:00
kiran
48e311a5ea
Added copyright notice.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3980 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:11:51 +00:00
kiran
9aa70d9c7c
Replaced by SelectVariants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3979 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:07:42 +00:00
kiran
758ab428f5
Better logging info for the samples being selected and the sample expressions being ignored.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3978 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:03:37 +00:00
kiran
e242a8f143
Put single quotes around the regex. This isn't strictly necessary through the integration test machinery, but *is* necessary at the console, and it's convenient to be able to cut and paste this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3977 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:56:57 +00:00
kiran
13f29660bb
Integration test for SelectVariants. Tests a complex case with an explicit sample selection, sample selection by regex, exclusion of non-variant and filtered loci, and JEXL selection on low allele-frequency variants.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3976 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:49:47 +00:00
ebanks
637a1e5055
Updating to use the new VA interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3975 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:31:01 +00:00
ebanks
bd6d5a8d51
Adding command-line header to VA and VF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3974 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:21:15 +00:00
kiran
64446f0ddf
Avoid NaNs in the final output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3973 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:16:52 +00:00
ebanks
3f6e44dc71
Updated the recalibrator and cleaner to output full command lines in the BAM header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3972 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:39:18 +00:00
kiran
0da0dfa1da
Cosmetic change: all command-line arguments now use lower-case short names.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3971 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:12:01 +00:00
kiran
eb1bb94d1c
Moved the evaluation of the JEXL expressions to a point *after* the samples are subset and the INFO-field annotations are updated. I think this makes more sense than evaluating them beforehand: it is jarring for the JEXL expressions to operate on the annotations before they're updated while the file contains the annotations after they're updated. Now, selecting on something like allele frequency applies to the annotations that actually end up in the file, while selecting on other annotations (which are carried over without modification) behaves exactly the same as before.
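To illustrate why the ordering matters, here is a small Python sketch. It assumes diploid genotypes stored as alt-allele counts (0, 1, 2, or None for a no-call) and uses a plain Python predicate as a stand-in for a JEXL expression; `select_record` and `allele_frequency` are hypothetical names.

```python
def allele_frequency(genotypes):
    """AF over called diploid genotypes given as alt-allele counts; 0.0 if nothing is called."""
    called = [g for g in genotypes.values() if g is not None]
    an = 2 * len(called)
    return sum(called) / an if an else 0.0


def select_record(record, keep_samples, predicate):
    """Subset samples, update AF, and only then evaluate the selection predicate."""
    subset = {s: g for s, g in record["genotypes"].items() if s in keep_samples}
    updated = dict(record, genotypes=subset, AF=allele_frequency(subset))
    return updated if predicate(updated) else None


record = {"genotypes": {"A": 2, "B": 0, "C": 0, "D": 0}, "AF": 0.25}
low_af = lambda r: r["AF"] < 0.5  # stand-in for a JEXL expression such as AF < 0.5
```

Evaluated before subsetting, `low_af` would pass this record (AF 0.25). Evaluated after keeping only sample A, AF becomes 1.0 and the record is dropped, which matches the AF that would be written out.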
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3970 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:09:02 +00:00
ebanks
594b7912f1
Added a generic method that returns the complete command line used when calling a walker, for use in the BAM/VCF headers. As requested, every possible engine/walker argument is included. I've added it to the Unified Genotyper output so people can try it out and let me know what they think. Something that needs to be discussed at group meeting: what happens when we merge VCFs? Do we keep all of the command lines?
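A rough Python analogue of recording the complete command line: after parsing, `argparse` exposes every argument, including defaults the user never typed, so serializing the whole namespace captures all settings. The option names and the `full_command_line` helper are hypothetical; the GATK method is Java and its header format is not described here.

```python
import argparse


def full_command_line(parser, argv):
    """Serialize every parsed argument, defaults included, as sorted key=value pairs."""
    args = parser.parse_args(argv)
    return " ".join(f"{key}={value}" for key, value in sorted(vars(args).items()))


# Hypothetical options for illustration only.
parser = argparse.ArgumentParser(prog="walker")
parser.add_argument("-I", "--input_file")
parser.add_argument("-o", "--out", default="stdout")
parser.add_argument("--example_threshold", type=int, default=20)

header_value = full_command_line(parser, ["-I", "sample.bam"])
```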
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3969 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 03:53:07 +00:00
kiran
6e389059cf
An improved version of VariantSubset and VariantSelect, meant to replace those walkers. It takes a VCF and creates a subset VCF by sample(s), JEXL expressions, or both.
When subsetting by sample, the -SN argument is treated as a literal sample name and, if no match is found, as a regular expression. This allows for a large number of samples to be selected at once (useful when, for instance, cases are given one sample name prefix and controls are given another).
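The -SN matching rule can be sketched in Python as follows. The commit does not say whether the regular expression must match the whole sample name or only part of it; this sketch assumes a full match, and `match_samples` is a hypothetical name.

```python
import re


def match_samples(requested, available):
    """Resolve each requested name literally first, then as a regex if no sample has that exact name."""
    selected = []
    for name in requested:
        if name in available:
            hits = [name]
        else:
            pattern = re.compile(name)
            hits = [s for s in available if pattern.fullmatch(s)]
        for sample in hits:
            if sample not in selected:  # keep first-seen order, no duplicates
                selected.append(sample)
    return selected
```

For example, with cases named case_01, case_02 and a control ctrl_01, the requests `ctrl_01` and `case_.*` select all three samples.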
After the subsetting procedure, the INFO-field annotations AC, AN, AF, and DP are all recalculated to properly reflect the new contents of the VCF.
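A sketch of the post-subsetting recalculation in Python. It assumes GT strings such as "0/1" or "1|1" (with "." for a missing allele) and an optional per-sample depth. Summing per-sample DP to get the site-level DP is an assumption, since the commit only says DP is recalculated; `recompute_info` is a hypothetical name.

```python
import re


def recompute_info(samples, n_alt):
    """samples maps sample -> (GT string, DP or None) for the kept samples.
    Returns AC (one count per alt allele), AN, AF, and DP."""
    an = 0
    ac = [0] * n_alt
    dp = 0
    for gt, depth in samples.values():
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # missing alleles do not count toward AN
            an += 1
            index = int(allele)
            if index > 0:
                ac[index - 1] += 1
        if depth is not None:
            dp += depth
    af = [count / an if an else 0.0 for count in ac]
    return {"AC": ac, "AN": an, "AF": af, "DP": dp}
```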
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3968 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 02:57:06 +00:00
depristo
41fee2d75e
Publication tranches report is now the default output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3967 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 13:58:59 +00:00
depristo
f4ffef4479
Default max variants is now 5000
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3966 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 13:58:32 +00:00
depristo
80e31df40d
Useful script to see the status of GSA computing resources. It is crontab'd, and its report will arrive by email at 8 am.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3965 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 12:36:28 +00:00
depristo
b63d64bbbc
Beautiful labels and a better choice of dimension ranges. Supports fast loading of just the first N records for testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3964 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 23:17:32 +00:00
depristo
d3bebe0f2c
Reasonable comment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3963 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 22:03:55 +00:00
ebanks
ac4699a650
Re-enabling this test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3962 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:20:37 +00:00
depristo
bb5dfd7e5e
Slightly nicer plotting; not yet complete
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3961 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:01:31 +00:00
depristo
f275041b1c
-minimalVCF for CombineVariants. Workaround for broken locking code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3960 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 16:10:59 +00:00
depristo
669d9096e3
Now supports the -o output option, useful for pipelines.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3959 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 14:57:04 +00:00
aaron
9076c0b28b
removing unused code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3958 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 14:24:39 +00:00
depristo
70f492a6e8
Prints out trivial debugging info
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3957 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 13:24:21 +00:00
ebanks
341e752c6c
1) AlleleBalance is no longer a standard annotation, but the per-sample Allelic Depth (AD) now is.
2) Small fixes in the VCFWriter:
a) Trailing missing values weren't being removed if their count was > 1 (e.g. ".,.")
b) We were handling key values that were Lists, but not Arrays. We now handle both.
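The fixed trailing-field rule can be sketched in Python: a field counts as missing when every comma-separated component is ".", so ".,." is dropped just like ".". Python lists and tuples stand in here for the Java Lists and arrays mentioned above, and GT is always kept. The function names are hypothetical.

```python
def format_value(value):
    """Render one genotype field; lists and tuples are joined with commas."""
    if isinstance(value, (list, tuple)):
        return ",".join(str(v) for v in value)
    return str(value)


def is_missing(rendered):
    """True for '.', '.,.', '.,.,.', and so on."""
    return all(part == "." for part in rendered.split(","))


def genotype_string(values):
    """Join fields with ':' after dropping trailing missing fields (never the first, GT)."""
    fields = [format_value(v) for v in values]
    while len(fields) > 1 and is_missing(fields[-1]):
        fields.pop()
    return ":".join(fields)
```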
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3956 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 12:05:14 +00:00
aaron
c68625f055
Fixes from Mark for the MutableContexts: this fixes the clearGenotypes() and clearFilters() methods and adds a method to clear the attributes. Also added a method for creating a variant context whose attribute list is pruned to a specific subset (which can be null).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3955 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 22:39:51 +00:00
aaron
72ae81c6de
VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include:
- Tribble is included directly in the GATK repo; those with commit access to Tribble can now commit directly from the GATK directory in IntelliJ, and command-line users can commit from inside the tribble directory.
- The HapMap ROD is now in Tribble; all references to it have been switched over.
- VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc.
- VariantContext.getSNPSubstitutionType is now in VariantContextUtils.
- This does not include the checked-in IntelliJ project files; we are still running into issues with changes to the .iml files being flagged as modifications by SVN.
I'll send out an email to GSAMembers with some more details.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 18:47:53 +00:00
fromer
b21f90aee0
Added preliminary framework for performing short-range phasing (ReadBackedPhasingWalker.java)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3953 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:56:34 +00:00
rpoplin
a8d37da10b
Checking in everyone's changes to the variant recalibrator. We now calculate the variant quality score as a LOD score between the true and false hypotheses. The allele count prior is now (1 - 0.5^ac). The known prior breaks out HapMap sites.
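Worked numbers for the new allele count prior, plus the LOD between the two hypotheses, in Python. Base-10 logs are an assumption (LOD conventionally means log10 odds), and how the recalibrator combines the prior with the likelihoods is not described here, so the two functions are kept separate.

```python
import math


def allele_count_prior(ac):
    """Allele count prior from the commit: 1 - 0.5^ac."""
    return 1.0 - 0.5 ** ac


def lod(likelihood_true, likelihood_false):
    """Log10 odds of the true hypothesis versus the false one (base 10 assumed)."""
    return math.log10(likelihood_true / likelihood_false)
```

Singletons get a prior of 0.5, doubletons 0.75, and the prior approaches 1 as the allele count grows.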
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3952 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:12:19 +00:00
ebanks
07addf1187
Fix for Kiran: since the Variant Annotator re-annotates on top of existing annotations, it makes sense to remove old header lines that conflict with the definitions being added by VA.
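A hypothetical Python sketch of the header rule: header lines are modeled as (kind, ID, definition) tuples, and any existing line with the same kind and ID as a newly added definition is dropped in favor of the new one. The tuple representation and `merge_headers` are assumptions for illustration.

```python
def merge_headers(existing, added):
    """Keep existing header lines unless the annotator adds a line with the same (kind, ID)."""
    new_keys = {(kind, ident) for kind, ident, _ in added}
    kept = [line for line in existing if (line[0], line[1]) not in new_keys]
    return kept + list(added)
```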
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3951 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 06:44:39 +00:00
ebanks
1539791a04
Fix for Kiran: when using VCFs for the comp tracks in the Annotator(s), don't put the headers from them into the output VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3950 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:45:47 +00:00
ebanks
227c4b10f0
Bug fix for Chris: convert comp tracks to VC so that we can respect the filter field. Added an integration test to cover this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3949 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:13:16 +00:00
ebanks
84ca2f27bb
Bug fix for Chris: added method createPotentiallyInvalidGenomeLoc() to the GenomeLocParser that doesn't check that the contig exists in the sequence dictionary. This is crucial for lifting over from one reference to another, as contig names sometimes change in the liftover (e.g. chrM to MT).
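The distinction between the validating and non-validating constructors, sketched as a hypothetical Python stand-in. The real GenomeLocParser is Java; these snake_case methods and the tuple return values are illustrative only.

```python
class GenomeLocParser:
    """Hypothetical Python stand-in, holding the contig names of a sequence dictionary."""

    def __init__(self, contigs):
        self.contigs = set(contigs)

    def create_genome_loc(self, contig, start, stop):
        # Validating path: reject contigs missing from the sequence dictionary.
        if contig not in self.contigs:
            raise ValueError(f"contig {contig!r} is not in the sequence dictionary")
        return (contig, start, stop)

    def create_potentially_invalid_genome_loc(self, contig, start, stop):
        # No dictionary check: during liftover the contig may exist only in the target reference.
        return (contig, start, stop)
```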
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3948 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 03:19:02 +00:00
ebanks
f247cbf68e
I want to be the first to use the new super-cool Hidden annotation! No more telling people not to use the cleaner debugging options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3947 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:44:37 +00:00
hanna
78bfe6ac48
Added @Hidden annotation, a way to deliberately exclude experimental fields and walkers from the help system.
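Python's `argparse` offers a comparable mechanism, shown here only as an analogy for what @Hidden does: passing `help=argparse.SUPPRESS` keeps an option usable while omitting it from the generated help. The option names are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog="walker")
parser.add_argument("--output", help="where to write results")
# Still accepted on the command line, but left out of usage and --help output.
parser.add_argument("--experimental_debug_option", help=argparse.SUPPRESS)

help_text = parser.format_help()
args = parser.parse_args(["--experimental_debug_option", "on"])
```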
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3946 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:26:46 +00:00
chartl
82d6c5073b
A simple read strand filter for potluri on Get Satisfaction.
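One way such a filter can be written, sketched in Python. Bit 0x10 of the SAM FLAG field marks a read whose sequence is reverse-complemented, i.e. a read on the reverse strand. Which strand the GATK filter keeps, and how it is configured, is not described in the commit; `keep_read` and its `keep_forward` parameter are hypothetical.

```python
REVERSE_STRAND = 0x10  # SAM FLAG bit: SEQ is reverse complemented


def keep_read(flag, keep_forward=True):
    """Keep reads from one strand only, decided by FLAG bit 0x10."""
    on_reverse = bool(flag & REVERSE_STRAND)
    return not on_reverse if keep_forward else on_reverse
```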
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3945 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 23:23:50 +00:00
asivache
d53d5ffbf6
A utility class that computes a running average and standard deviation for the stream of numbers it is fed. It updates the mean/stddev on the fly and does not cache the observations, so it uses constant memory and should also be stable against overflow/loss of precision. A simple unit test is also provided (though it does *not* stress-test the engine with millions of numbers).
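Welford's online algorithm is a standard way to get this behavior. The commit does not say which update rule the class uses, so the Python sketch below is an illustration rather than the actual implementation; reporting the sample standard deviation (n - 1 denominator) is also an assumption.

```python
import math


class RunningStats:
    """Welford's online mean/variance: constant memory, no stored observations."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def stddev(self):
        """Sample standard deviation; 0.0 for fewer than two observations."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0
```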
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3944 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 21:39:02 +00:00
ebanks
8d8acc9fae
Moving G's MyHapScore to replace the old HapScore
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3943 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 21:00:54 +00:00
ebanks
7858ffec32
Spit out the error in the warning message so that Sendu can tell me what his problem is
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3942 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 20:40:28 +00:00
chartl
5815348ebc
Switched to the newer version of comp tracks (and made the trigger track a comp as well). Indel cleaning should override the interval list and use only the contig interval list, and should also force jobs to go to the long queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3941 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 20:05:27 +00:00
delangel
86211b74e8
Bug fix: when padding alleles in creating a Variant context from an indel, leave no-call alleles as no-call alleles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3940 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:51:10 +00:00
chartl
38e65f6e1b
Added: a VariantEval module that gives simple metrics by sample, and an abstract class that makes per-sample modules easy to write (though a little clunky, since a class needs to be defined for each data point; see SimpleMetricsBySample as an example). AnalysisModuleScanner needed a slight update to pull in data points from parent classes for this to work (thanks Khalid for showing me how to do this). After a code review with Aaron (thanks) and ensuring integration tests pass, I am committing.
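A hypothetical Python sketch of the per-sample pattern: the base class keeps the per-sample tally, and each data point is a small subclass that only decides whether a genotype counts. It also shows the clunkiness noted above, one class per data point. The class names and the unphased GT-string representation are assumptions, not the VariantEval API.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class PerSampleDataPoint(ABC):
    """Base class: tallies, per sample, the genotypes a subclass says should count."""

    def __init__(self):
        self.per_sample = defaultdict(int)

    def update(self, sample, genotype):
        if self.counts(genotype):
            self.per_sample[sample] += 1

    @abstractmethod
    def counts(self, genotype):
        """Return True if this genotype contributes to the data point."""


class HetCount(PerSampleDataPoint):
    def counts(self, genotype):
        return genotype == "0/1"


class HomVarCount(PerSampleDataPoint):
    def counts(self, genotype):
        return genotype == "1/1"
```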
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3939 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:37:39 +00:00
hanna
f13d52e427
Attempt to determine whether the underlying filesystem supports file locking and, if it does not, disable on-the-fly dict and fai generation.
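A POSIX-only Python sketch of such a probe, taking an exclusive non-blocking `fcntl.lockf` lock on a scratch file in the target directory. How the GATK performs its check is not stated in the commit, and some network filesystems accept locks without honoring them, so a successful probe is a hint rather than a guarantee. `supports_file_locking` is a hypothetical name.

```python
import fcntl
import os
import tempfile


def supports_file_locking(directory):
    """Try to lock and unlock a scratch file in `directory`; False if locking fails."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.remove(path)
```

A caller would then skip on-the-fly .dict and .fai generation when the probe returns False for the reference's directory.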
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3938 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:28:27 +00:00
kiran
1a36cb9296
Can now set the maximum number of variants to see in a cluster plot (useful when you don't need to see a billion points to get an idea of what's going on). The limit applies to known and novel variants separately.
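A Python sketch of a limit applied to known and novel variants separately. The commit does not say whether the real limit samples randomly or truncates; this version takes a seeded random sample per group, and `cap_per_group` is a hypothetical name.

```python
import random


def cap_per_group(variants, max_variants, seed=0):
    """variants: list of (is_known, record). Caps known and novel variants independently."""
    rng = random.Random(seed)
    known = [v for v in variants if v[0]]
    novel = [v for v in variants if not v[0]]
    capped = []
    for group in (known, novel):
        if len(group) <= max_variants:
            capped.extend(group)
        else:
            capped.extend(rng.sample(group, max_variants))
    return capped
```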
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3937 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:45:24 +00:00
kiran
bd27287fe7
An R module that takes a Variant Recalibration cluster file (a file with '@!CLUSTER' lines in it), a tabularized VCF, and optionally a set of loci that should be examined more carefully, and emits a tremendous number of plots. For every annotation used in clustering, the distributions and pairwise comparisons (with ellipses denoting the 2-sigma cluster boundaries) are shown. Each cluster is shaded with a color proportional to its mixture coefficient.
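For reference, the 2-sigma boundary of a 2-D Gaussian cluster can be computed from its covariance matrix. This Python sketch uses the closed-form eigenvalues of a symmetric 2x2 matrix; it assumes "2-sigma" means a Mahalanobis distance of 2, and the R module may draw its ellipses differently. `sigma_ellipse` is a hypothetical name.

```python
import math


def sigma_ellipse(var_x, var_y, cov_xy, k=2.0):
    """Semi-axis lengths and major-axis angle (radians) of the k-sigma ellipse
    for covariance [[var_x, cov_xy], [cov_xy, var_y]]."""
    mean_var = (var_x + var_y) / 2.0
    spread = math.sqrt(((var_x - var_y) / 2.0) ** 2 + cov_xy ** 2)
    lam_major, lam_minor = mean_var + spread, mean_var - spread  # eigenvalues
    angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return k * math.sqrt(lam_major), k * math.sqrt(max(lam_minor, 0.0)), angle
```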
To use this module, you'll first have to take your VCF and create an R-readable table out of it with the following command:
python /path/to/Sting/trunk/python/vcf2table.py -f CHROM,POS,ID,AC,AF,AN,DB,DP,HRun,MQ,MQ0,MyHaplotypeScore,QD,SB my.vcf > my.vcf.table
Then, simply invoke this module with the command:
Rscript /path/to/Sting/trunk/R/VariantRecalibratorReport/VariantRecalibratorReport.R /path/to/output/prefix /path/to/my/my.clusters /path/to/my.vcf.table [/path/to/my.suspicious.loci]
This will create a number of plots, all with the prefix "/path/to/output/prefix". For instance, if you used the QD, SB, HRun, and MyHaplotypeScore annotations during clustering, you should see output like this:
/path/to/output/prefix.anndist.HRun.pdf
/path/to/output/prefix.anndist.MyHaplotypeScore.pdf
/path/to/output/prefix.anndist.QD.pdf
/path/to/output/prefix.anndist.SB.pdf
/path/to/output/prefix.cluster.HRun_vs_MyHaplotypeScore.pdf
/path/to/output/prefix.cluster.HRun_vs_QD.pdf
/path/to/output/prefix.cluster.HRun_vs_SB.pdf
/path/to/output/prefix.cluster.MyHaplotypeScore_vs_HRun.pdf
/path/to/output/prefix.cluster.MyHaplotypeScore_vs_QD.pdf
/path/to/output/prefix.cluster.MyHaplotypeScore_vs_SB.pdf
/path/to/output/prefix.cluster.QD_vs_HRun.pdf
/path/to/output/prefix.cluster.QD_vs_MyHaplotypeScore.pdf
/path/to/output/prefix.cluster.QD_vs_SB.pdf
/path/to/output/prefix.cluster.SB_vs_HRun.pdf
/path/to/output/prefix.cluster.SB_vs_MyHaplotypeScore.pdf
/path/to/output/prefix.cluster.SB_vs_QD.pdf
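The file list above follows a simple pattern: one anndist plot per annotation and one cluster plot per ordered pair of distinct annotations, so n annotations give n + n(n-1) files (16 for the four used here). A hypothetical Python helper that reproduces the list, assuming alphabetical ordering of annotation names:

```python
from itertools import permutations


def expected_plots(prefix, annotations):
    """Predict the report's output file names for a set of clustering annotations."""
    names = sorted(annotations)
    dists = [f"{prefix}.anndist.{a}.pdf" for a in names]
    pairs = [f"{prefix}.cluster.{a}_vs_{b}.pdf" for a, b in permutations(names, 2)]
    return dists + pairs
```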
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3936 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 18:35:14 +00:00
ebanks
340bd0e2c1
Removed hard-coded pointers to references
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3934 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 17:59:37 +00:00