Commit Graph

3966 Commits (3ff6e3404e19d7bf7136ef78c7d05ecc41f4d026)

Author SHA1 Message Date
ebanks 3ff6e3404e Alleles are now returned in a consistent order, so we can deal with tri-allelic sites
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4002 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 15:21:10 +00:00
depristo 67063deb16 Removed coloring by mixture weight. Each cluster gets a distinct color, and the legend indicates which cluster has which id and its weight
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4001 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 14:28:24 +00:00
depristo 672bee295c now plots tranches separately from optimizer
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4000 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:02:52 +00:00
depristo cd2d051209 full path to Rscript
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3999 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:02:38 +00:00
depristo 9b432d0801 1kg script now works
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3998 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 12:01:18 +00:00
aaron d514c424fd adding tests for BTI in the ROD validation tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3997 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 06:05:40 +00:00
ebanks ca5b274f16 Unit, integration, and performance tests are all busted, so this is a good time to make a big commit...
Major cleanup of the genotype writer code from the calling end. UG no longer supports making calls in anything but VCF, and that allows us to use the VCFWriter more generically now. Putting the ball in Matt's court to finish collapsing everything.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3996 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 04:18:29 +00:00
aaron 0f78f70ed4 fix for feature source in Tribble; we need to check that the record coming back isn't null. Also added code in the GATK to set the default logging level in integration tests to WARN; with the default level change they were spewing a bunch of text.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3995 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:57:23 +00:00
ebanks 419a36f74c Starting the clean up of the sting.utils.genotype code which is all either moving to Tribble, moving to sting.utils.vcf, or being removed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3994 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-10 02:16:05 +00:00
depristo 2a4a4b0aab VariantRecalibrator now calls plot_Tranches directly so it works on the farm
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3993 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 23:17:16 +00:00
depristo c2c0c1f57c Removing unused --enable_overlap_filters argument; Eric assures me this won't break the currently broken tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3992 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 22:27:13 +00:00
aaron 0f29f2ae3f fixes for the Tree index, and some small clean-up in the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:50 +00:00
rpoplin 3eee3183fd Checking in the tiger team changes. LOD calculation modified. -qScale is back in case people need it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3990 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:41:03 +00:00
ebanks 0eeb659aa3 Useful utility function to print out the Allele as a String since toString prints out * for refs. It was annoying to keep seeing new String(Allele.getBases()).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3989 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:35:56 +00:00
chartl d0ecb8875a Added - a class to count functional annotations by sample (currently for the MAF annotation strings, soon to be migrated to genomic annotator once it is up and running)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3988 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 20:09:13 +00:00
aaron 5b0b9e79ba protect against nulls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3987 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 19:21:39 +00:00
depristo 8944800f60 Minor refactoring for Ryan
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3986 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 18:05:23 +00:00
kshakir c66e93d86e Fixed a problem when mixing queue with other targets, such as 'ant clean oneoffs queue' and the STING_BUILD_TYPE environment is set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3985 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 17:59:51 +00:00
kshakir 4f51a02dea Changed logging level to default at INFO instead of WARN.
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
aaron 30178c05c5 providing a way to specify how you'd like -BTI combined with your -L options; set BTIMR to either UNION (default) or INTERSECTION.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3983 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 14:00:52 +00:00
aaron f3883585d0 removing the build lines I inadvertently committed. As a note: these are the lines you need to add if you want to debug tests, just don't check them in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3982 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 13:48:16 +00:00
hanna 6b4a1e3b9f Reenabling code that was commented out after it was confirmed to work by many participating in this thread:
http://getsatisfaction.com/gsa/topics/error_thrown_when_reading_reference_file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3981 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 00:12:09 +00:00
kiran 48e311a5ea Added copyright notice.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3980 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:11:51 +00:00
kiran 9aa70d9c7c Replaced by SelectVariants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3979 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:07:42 +00:00
kiran 758ab428f5 Better logging info for the samples being selected and the sample expressions being ignored.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3978 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:03:37 +00:00
kiran e242a8f143 Put single quotes around the regex. This isn't strictly necessary through the integration test machinery, but *is* necessary at the console, and it's convenient to be able to cut and paste this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3977 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:56:57 +00:00
kiran 13f29660bb Integration test for SelectVariants. Tests a complex case with an explicit sample selection, sample selection by regex, exclusion of non-variant and filtered loci, and JEXL selection on low allele-frequency variants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3976 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:49:47 +00:00
ebanks 637a1e5055 Updating to use the new VA interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3975 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:31:01 +00:00
ebanks bd6d5a8d51 Adding command-line header to VA and VF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3974 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:21:15 +00:00
kiran 64446f0ddf Avoid NaNs in the final output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3973 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:16:52 +00:00
ebanks 3f6e44dc71 Updated recalibrator and cleaner to output full command-lines in the bam header
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3972 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:39:18 +00:00
kiran 0da0dfa1da Cosmetic change - lower-case for all command-line arguments' short names.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3971 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:12:01 +00:00
kiran eb1bb94d1c Moved the evaluation of the JEXL expressions to a point *after* the samples are subset and the INFO-field annotations are updated. I think this makes more sense than having the evaluations happen beforehand, since it seems jarring to have the JEXL expressions operate on the annotations before they're updated, and have the file contain the annotations after they're updated. Now, selecting on something like allele frequency will actually apply to the annotations that actually end up in the file, while selection on other annotations (which are carried over without modification) will act exactly the same regardless.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3970 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:09:02 +00:00
ebanks 594b7912f1 Added a generic method for returning the complete command-line used when calling a walker, to be used in the bam/vcf headers. As requested, every possible engine/walker argument is included. I've added it to the Unified Genotyper output, so people can try it out and let me know what they think. Something that needs to be discussed in group meeting: what happens when we merge VCFs? Do we keep all of the command-lines?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3969 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 03:53:07 +00:00
kiran 6e389059cf An improved version of VariantSubset and VariantSelect, meant to replace those walkers. Takes in a VCF and creates a subsetted VCF by sample(s), JEXL expressions, or both.
When subsetting by sample, the -SN argument is treated as a literal sample name and, if no match is found, as a regular expression. This allows for a large number of samples to be selected at once (useful when, for instance, cases are given one sample name prefix and controls are given another).

After the subsetting procedure, the INFO-field annotations AC, AN, AF, and DP are all recalculated to properly reflect the new contents of the VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3968 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 02:57:06 +00:00
depristo 41fee2d75e Publication tranches report is now the default output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3967 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 13:58:59 +00:00
depristo f4ffef4479 Default max variants is now 5000
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3966 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 13:58:32 +00:00
depristo 80e31df40d Useful script to see the status of gsa computing resources. Crontab'd and will be arriving as email at 8 am
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3965 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-07 12:36:28 +00:00
depristo b63d64bbbc Beautiful labels, better choice of dimension ranges. Supports fast loading of just first N records for testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3964 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 23:17:32 +00:00
depristo d3bebe0f2c Reasonable comment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3963 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 22:03:55 +00:00
ebanks ac4699a650 Re-enabling this test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3962 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:20:37 +00:00
depristo bb5dfd7e5e Slightly nicer plotting; not yet complete
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3961 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:01:31 +00:00
depristo f275041b1c -minimalVCF for CombineVariants. Workaround for broken locking code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3960 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 16:10:59 +00:00
depristo 669d9096e3 now support -o output option, useful for pipelines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3959 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 14:57:04 +00:00
aaron 9076c0b28b removing unused code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3958 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 14:24:39 +00:00
depristo 70f492a6e8 Prints out trivial debugging info
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3957 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 13:24:21 +00:00
ebanks 341e752c6c 1) AlleleBalance is no longer a standard annotation, but the Allelic Depth (AD) is for each sample.
2) Small fixes in the VCFWriter:
a) Trailing missing values weren't being removed if their count was > 1 (e.g. ".,.")
b) We were handling key values that were Lists, but not Arrays. We now handle both.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3956 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 12:05:14 +00:00
aaron c68625f055 Fixes from Mark for the MutableContexts; this fixes the clearGenotypes() and the clearFilters() methods, and adds a method to clear the attributes. Also added is a method for creating a variant context where the attribute list is pruned to a specific subset, which can be null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3955 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 22:39:51 +00:00
aaron 72ae81c6de VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include:
- Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from inside the tribble directory.
- Hapmap ROD now in Tribble; all mentions have been switched over.
- VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc.
- VariantContext.getSNPSubstitutionType is now in VariantContextUtils.
- This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN

I'll send out an email to GSAMembers with some more details.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 18:47:53 +00:00
fromer b21f90aee0 Added preliminary framework for performing short-range phasing (ReadBackedPhasingWalker.java)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3953 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:56:34 +00:00