kshakir
f39dce1082
Exposed CommandLineFunction defaults to the Queue.jar command line (see -help).
...
Added ability to skip up-to-date jobs where the outputs are older than the inputs.
Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names.
Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile
Moved Hidden from the GATK to StingUtils.
Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7
Added Queue to javadoc and testing build targets.
Added first Queue unit test.
Another pass at avoiding cycles in the DAG thanks to all function I/O being files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 21:58:26 +00:00
chartl
8c08f47923
1) Make sure that the table size is set correctly in finalize()
...
2) Make sure variants are biallelic before asking for isTransversion()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4016 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:32:22 +00:00
hanna
41d57b7139
Massive cleanup of read filtering.
...
- Eliminate reduncancy of filter application.
- Track filter metrics per-shard to facitate per merging.
- Flatten counting iterator hierarchy for easier debugging.
- Rename Reads class to ReadProperties and track it outside of the Sting iterators.
Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics
classes are managed by the SAMDataSource when they should be managed by something more general. For now, we're hacking
the reads data source to manage the metrics; in the future, something more general should manage the metrics classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 20:17:11 +00:00
ebanks
7385cce494
Useful tool for calculating the perentage of misaligned reads at homozygous non-ref indel sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4013 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:57:44 +00:00
ebanks
cc9e6b4ad9
Moved into Tribble to be with VC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4012 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:14:32 +00:00
aaron
14e492fa80
fix for a problem in readNextRecord() of BFS, where we'd go looking for the next record far into in the next contig because (f.getEnd() >= start) was never true once we cycled to a new conitg. Added a check for contig identity. Also, removed duplicate HW calculation classes in the GATK and Tribble.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4011 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-11 17:01:38 +00:00
flannick
cd4cd6db81
Added option to print out discordant sites in GenotypeConcordance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4006 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:55:19 +00:00
flannick
18fc5c8c3e
Initial implementation of annotator to compute allele balance for each sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4005 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:40:17 +00:00
flannick
1dc373b9d0
Initial implementation of evaluator to compute popgen theta statistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4004 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 19:36:34 +00:00
aaron
0a8ebcb4f9
moving tests over from the GATK to Tribble, and added a speed-up to the readNextRecord() that Mark suggested. Also removed the contained flag from the queries to Tribble in the GATK.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4003 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 17:54:59 +00:00
ebanks
3ff6e3404e
Alleles are now returned in a consistent order, so we can deal with tri-allelic sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4002 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 15:21:10 +00:00
ebanks
ca5b274f16
Unit, integration, and performance tests are all busted, so this is a good time to make a big commit...
...
Major cleanup of the genotype writer code from the calling end. UG no longer supports making calls in anything but VCF, and that allows us to use the VCFWriter more generically now. Putting the ball in Matt's court to finish collapsing everything.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3996 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 04:18:29 +00:00
ebanks
419a36f74c
Starting the clean up of the sting.utils.genotype code which is all either moving to Tribble, moving to sting.utils.vcf, or being removed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3994 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:16:05 +00:00
depristo
2a4a4b0aab
VariantRecalibrator now calls plot_Tranches directly so it works on the farm
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3993 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 23:17:16 +00:00
depristo
c2c0c1f57c
Removing used --enable_overlap_filters argument; Eric assures me this won't break the currently broken tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3992 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 22:27:13 +00:00
aaron
0f29f2ae3f
fixes for the Tree index, and some small clean-up in the GATK.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:50 +00:00
rpoplin
3eee3183fd
Checking in the tiger team changes. LOD calculation modified. -qScale is back in case people need it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3990 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:03 +00:00
ebanks
0eeb659aa3
Useful utility function to print out the Allele as a String since toString prints out * for refs. It was annoying to keep seeing new String(Allele.getBases()).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3989 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:35:56 +00:00
chartl
d0ecb8875a
Added - a class to count functional annotations by sample (currently for the MAF annotation strings, soon to be migrated to genomic annotator once it is up and running)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3988 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:09:13 +00:00
aaron
5b0b9e79ba
protect against nulls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3987 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 19:21:39 +00:00
depristo
8944800f60
Minor refactoring for Ryan
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3986 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 18:05:23 +00:00
kshakir
4f51a02dea
Changed logging level to default at INFO instead of WARN.
...
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
aaron
30178c05c5
providing a way to specify how you'd like -BTI combined with your -L options; set BTIMR to either UNION (default) or INTERSECTION.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3983 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 14:00:52 +00:00
hanna
6b4a1e3b9f
Reenabling code that was commented out after it was confirmed to work by many participating in this thread:
...
http://getsatisfaction.com/gsa/topics/error_thrown_when_reading_reference_file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3981 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 00:12:09 +00:00
kiran
48e311a5ea
Added copyright notice.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3980 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:11:51 +00:00
kiran
9aa70d9c7c
Replaced by SelectVariants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3979 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:07:42 +00:00
kiran
758ab428f5
Better logging info for the samples being selected and the sample expressions being ignored.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3978 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:03:37 +00:00
ebanks
637a1e5055
Updating to use the new VA interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3975 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:31:01 +00:00
ebanks
bd6d5a8d51
Adding command-line header to VA and VF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3974 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:21:15 +00:00
kiran
64446f0ddf
Avoid NaNs in the final output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3973 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:16:52 +00:00
ebanks
3f6e44dc71
Updated recalibrator and cleaner to output full command-lines in the bam header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3972 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:39:18 +00:00
kiran
0da0dfa1da
Cosmetic change - lower-case for all command-line arguments' short names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3971 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:12:01 +00:00
kiran
eb1bb94d1c
Moved the evaluation of the JEXL expressions to a point *after* the samples are subset and the INFO-field annotations are updated. I think this makes more sense than having the evaluations happen beforehand, since it seems jarring to have the JEXL expressions operate on the annotations before they're updated, and have the file contain the annotations after they're updated. Now, selecting on something like allele frequency will actually apply to the annotations that actually end up in the file, while selection on other annotations (which are carried over without modification) will act exactly the same regardless.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3970 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:09:02 +00:00
ebanks
594b7912f1
Added a generic method for returning the complete command-line used when calling a walker, to be used in the bam/vcf headers. As requested, every possible engine/walker argument is included. I've added it to the Unified Genotyper output, so people can try it out and let me know what they think. Something that needs to be discussed in group meeting: what happens when we merge VCFs? Do we keep all of the command-lines?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3969 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 03:53:07 +00:00
kiran
6e389059cf
An improved version of VariantSubset and VariantSelect, meant to replace those walkers. Takes in a VCF and creates a subsetted VCF by sample(s), JEXL expressions, or both.
...
When subsetting by sample, the -SN argument is treated as a literal sample name and, if no match is found, as a regular expression. This allows for a large number of samples to be selected at once (useful when, for instance, cases are given one sample name prefix and controls are given another).
After the subsetting procedure, the INFO-field annotations AC, AN, AF, and DP are all recalculated to properly reflect the new contents of the VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3968 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 02:57:06 +00:00
ebanks
ac4699a650
Re-enabling this test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3962 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:20:37 +00:00
depristo
f275041b1c
-minimalVCF for CombineVariants. Work around for broken locking code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3960 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 16:10:59 +00:00
ebanks
341e752c6c
1) AlleleBalance is no longer a standard annotation, but the Allelic Depth (AD) is for each sample.
...
2) Small fixes in the VCFWriter:
a) Trailing missing values weren't being removed if their count was > 1 (e.g. ".,.")
b) We were handling key values that were Lists, but not Arrays. We now handle both.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3956 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 12:05:14 +00:00
aaron
c68625f055
Fixes from Mark for the MutableContexts; this fixes the clearGenotypes() and the clearFilters() methods, and adds a method to clear the attributes. Also added is a method for creating a variant context where the attribute list is pruned to a specific subset, which can be null.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3955 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 22:39:51 +00:00
aaron
72ae81c6de
VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include:
...
- Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from
inside the tribble directory.
- Hapmap ROD now in Tribble; all mentions have been switched over.
- VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc.
- VariantContext.getSNPSubstitutionType is now in VariantContextUtils.
- This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN
I'll send out an email to GSAMembers with some more details.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 18:47:53 +00:00
fromer
b21f90aee0
Added preliminary framework for performing short-range phasing (ReadBackedPhasingWalker.java)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3953 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:56:34 +00:00
rpoplin
a8d37da10b
Checking in everyone's changes to the variant recalibrator. We now calculate the variant quality score as a LOD score between the true and false hypothesis. Allele Count prior is changed to be (1 - 0.5^ac). Known prior breaks out HapMap sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3952 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:12:19 +00:00
ebanks
07addf1187
Fix for Kiran: since the Variant Annotator will re-annotate on top of existing annotations it makes sense to remove old headers if they conflict with the definitions being added by VA.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3951 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 06:44:39 +00:00
ebanks
1539791a04
Fix for Kiran: when using VCFs for the comp tracks in the Annotator(s), don't put the headers from them into the output VCF.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3950 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:45:47 +00:00
ebanks
227c4b10f0
Bug fix for Chris: convert comp tracks to VC so that we can respect the filter field. Added an integration test to cover this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3949 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:13:16 +00:00
ebanks
84ca2f27bb
Bug fix for Chris: added method createPotentiallyInvalidGenomeLoc() to the GenomeLocParser that doesn't check that the contig exists in the sequence dictionary. This is crucial for lifting over from one reference to another, as sometimes contigs names change in the liftover (e.g. chrM to MT).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3948 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 03:19:02 +00:00
ebanks
f247cbf68e
I want to be the first to use the new super-cool Hidden annotation! No more telling people not to use the cleaner debugging options.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3947 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:44:37 +00:00
hanna
78bfe6ac48
Added @Hidden annotation, a way to deliberately exclude experimental fields and
...
walkers from the help system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3946 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:26:46 +00:00
chartl
82d6c5073b
A simple read strand filter for potluri on get satisfaction
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3945 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 23:23:50 +00:00
asivache
d53d5ffbf6
A utility class that computes running average and standard deviation for a stream of numbers it is being fed with. Updates mean/stddev on the fly and does not cache the observations, so it uses no memory and also should be stable against overflow/loss of precision. Simple unit test is also provided (does *not* stress-test the engine with millions of numbers though).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3944 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 21:39:02 +00:00