aaron
b5f6f54968
Almost done removing any trace of the old Variation and Genotype interfaces.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3202 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 14:52:15 +00:00
hanna
1bc26f69e9
An attempt to clean up the Utils directory. Email to follow.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3198 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 23:00:08 +00:00
ebanks
d73c63a99a
Redoing the conversion to VariantContext: instead of walkers passing in a ref allele, they pass in the ref context and the adaptors create the allele. This is the right way of doing it.
...
Also, adding some more useful integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3194 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 05:47:17 +00:00
aaron
e11ca74eb5
removing some outdated ROD classes (PooledEMSNPROD and SangerSNPROD), removing an out-of-date interface (VariantBackedByBenotype), and moving AnalyzeAnnotationWalker over to VariationContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3188 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-16 18:59:29 +00:00
ebanks
d5e5589b8f
No longer used
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3187 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-16 17:57:39 +00:00
ebanks
e702bea99f
Moving VE2 to core; calling it "VariantEval" (one more checkin coming)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3179 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:25:47 +00:00
ebanks
ac9dc0b4b4
Removing VariantEval (v1); everyone should be using VE2 now. Docs coming ASAP.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3177 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 19:53:02 +00:00
ebanks
3330e254a9
Standardize the dbsnp track name in preparation for case-sensitivity
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3176 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 19:41:57 +00:00
ebanks
5f7564bf0a
Better naming of output columns
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3175 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 18:08:07 +00:00
ebanks
04909fa6ad
Removing arbitrary selects
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3169 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 17:46:39 +00:00
ebanks
f1189bac5a
Bug fix: final map call wasn't being triggered (because we returned when ref==null before applying update0)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3168 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 16:58:55 +00:00
ebanks
dde092fb61
Added the ability in VE2 to select which eval modules to run, so that you aren't forced to use all of them. You can use --list to list all of the possible modules to run.
...
Heads up everyone: by default, *no* modules are run. Please add "-all" to your scripts to maintain the previous behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3161 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 22:15:58 +00:00
ebanks
0b575596f8
Fix for concordance: samples found only in truth no longer kill it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3160 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 21:33:49 +00:00
rpoplin
c2a37e4b5c
Variant Quality Score modules in VariantEval2 no longer create huge lists which hold all of the quality scores encountered and instead cast the quality score to an integer and use hash tables. Bug fix for files in which all the quality scores are set to -1.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3146 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 18:36:06 +00:00
chartl
7025f5b51d
Added an auxiliary table to DepthOfCoverage, which is the cumulative equivalent of the locus table (got tired of doing the calculation by hand). Also took care of a trailing tab in the per-locus output table.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3138 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 19:37:17 +00:00
aaron
20cc2a85a4
removed the hashmap from Genotype Concordance, moved it into a table
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3133 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 21:24:48 +00:00
aaron
e55f27b3b1
forgot a file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3132 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 20:51:13 +00:00
depristo
918b746798
More detailed validation output. Fixes for genotyping overflow -- these are temporary and need to be properly resolved
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3129 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 16:38:28 +00:00
rpoplin
7b44e6bd55
ApplyVariantClusters now outputs interesting threshold points based on hitting the target novel TiTv
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3126 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 19:47:29 +00:00
rpoplin
60c227d67f
Added new VE2 module to create a plot of titv ratio by variant quality score
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3125 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 15:19:27 +00:00
rpoplin
2d002c56c3
Added histogram of variant quality scores broken out by true positive and false positive calls to the GenotypeConcordance module of VariantEval2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3123 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 13:48:31 +00:00
chartl
f7d1b8f5de
CoverageStatistics has now replaced DepthOfCoverage -- old DoC is in the archive.
...
Also, I can't be bothered to fix the spelling of "oldepthofcoverage" to contain the necessary number of D's. Be content that it does, however, contain the requisite number of O's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3109 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:27:23 +00:00
aaron
585cc880a2
changed jexl expressions to jexl names in the VariantEval2 output, fixed integration test, and fixed a problem where a line was getting dropped in CSV output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3108 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:23:14 +00:00
aaron
3d3d19a6a7
the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date.
...
A caveat: for anyone asking for all of the RODs back from the RefMetaDataTracker (if you're not using the facilities to get the track by name), you'll now be getting back a collection of GATKFeature objects. Each object contains the track name and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc. This layer is needed so we can integrate Tribble tracks (which don't natively have names). Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc).
Sorry for the inconvenience! More changes to come, but this is by far the largest (and has the greatest effect on end users).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 22:39:56 +00:00
chartl
dc802aa26f
Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
aaron
074ec77dcc
First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be specified with the reportType command line option in VE2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 03:59:32 +00:00
ebanks
2373a4618f
bug caused by a misprint: context != contexts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3073 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 03:08:24 +00:00
aaron
60dfba997b
added some sample annotations to VariantEval2 analysis modules, and some changes to the report system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3067 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 05:40:10 +00:00
depristo
076d21d394
Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 02:47:09 +00:00
depristo
7b17bcd0af
Refactoring a few useful routines for detecting mendelian violations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3043 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:01 +00:00
ebanks
b8e8852b4f
Better interface for the Annotator in how it interacts with VariantContext.
...
Also, added a proof of concept genotype-level annotation (not working yet, almost there).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3035 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 20:41:57 +00:00
ebanks
ee0e833616
Some significant changes to the annotator:
...
1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental.
2. Users can now specify not only specific annotations to use, but also the interface names from #1. Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest.
3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator.
4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 05:38:32 +00:00
ebanks
e367a50e9b
Added genotype concordance module. Not at all finished, but needed to give something to Aaron to look at for help in printing the output nicely.
...
Also misc cleanup and fixes (e.g. perform evaluation even when no comp tracks are provided).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2996 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 19:02:24 +00:00
depristo
b39b5edca8
Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:23:12 +00:00
depristo
18ba9929f9
notes for eric
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2983 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 20:34:54 +00:00
depristo
4f4555c80f
PPV and Sensitivity added to validation tool output; support for arbitrary -sample arguments to subset variant contexts by sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2978 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:28:31 +00:00
depristo
486bef9318
Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 16:25:16 +00:00
chartl
0a49dffa8f
Row/Column names are now R-friendly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2966 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:01:03 +00:00
ebanks
9f3b99c11b
Moving UnifiedGenotyper and VariantAnnotator over to VariantContext system.
...
Removing obsolete genotyping classes.
First stage of removing dependence on old Genotype class.
More changes to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2960 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 03:41:07 +00:00
chartl
21bf8b4b93
Odd, what I saw on IntelliJ hadn't saved to sting before committing. Here's the actual change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2956 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 15:54:41 +00:00
chartl
cc6a714c09
Handle excess coverage in interval output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2954 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 14:40:05 +00:00
chartl
037ac9c9af
Actually calculate base counts by read group when "both" is specified. Modified integration test to cement the now-correct "both" behavior.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2941 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 18:31:48 +00:00
chartl
8738c544f1
Minor refactoring of CoverageStatistics to allow simultaneous output of per-sample and per-read group statistics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2940 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 17:06:52 +00:00
depristo
33cefddf55
Better INFO field annotation for Mendel violations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2937 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 15:22:04 +00:00
chartl
706d49d84c
Commit for Aaron
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2932 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:29:07 +00:00
ebanks
0dd65461a1
Various improvements to plink, variant context, and VCF code.
...
We almost completely support indels. Not yet done with plink stuff.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2926 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 17:58:01 +00:00
chartl
6759acbdef
Coverage statistics now fully implements DepthOfCoverage functionality, including the ability to print base counts. Minor changes to BaseUtils to support 'N' and 'D' characters. PickSequenomProbes now has the option to not print the whole window as part of the probe name (e.g. you just see PROJECT_NAME|CHR_POS and not PROJECT_NAME|CHR_POS_CHR_PROBESTART-PROBEND). Full integration tests for CoverageStatistics are forthcoming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2924 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 15:00:02 +00:00
aaron
790d2a7776
adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 15:56:44 +00:00
chartl
cfff486338
This commit is for Kiran
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2898 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 18:18:38 +00:00
chartl
87f8fb7282
Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00