ebanks
e702bea99f
Moving VE2 to core; calling it "VariantEval" (one more checkin coming)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3179 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:25:47 +00:00
ebanks
ac9dc0b4b4
Removing VariantEval (v1); everyone should be using VE2 now. Docs coming ASAP.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3177 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 19:53:02 +00:00
ebanks
3330e254a9
Standardize the dbsnp track name in preparation for case-sensitivity
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3176 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 19:41:57 +00:00
ebanks
5f7564bf0a
Better naming of output columns
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3175 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 18:08:07 +00:00
ebanks
04909fa6ad
Removing arbitrary selects
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3169 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 17:46:39 +00:00
ebanks
f1189bac5a
Bug fix: final map call wasn't being triggered (because we returned when ref==null before applying update0)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3168 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 16:58:55 +00:00
ebanks
dde092fb61
Added the ability in VE2 to select which eval modules to run, so that you aren't forced to use all of them. You can use --list to list all of the possible modules to run.
...
Heads up everyone: by default, *no* modules are run. Please add "-all" to your scripts to maintain the previous behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3161 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 22:15:58 +00:00
ebanks
0b575596f8
Fix for concordance: samples found only in truth no longer kill it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3160 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 21:33:49 +00:00
rpoplin
c2a37e4b5c
Variant Quality Score modules in VariantEval2 no longer create huge lists which hold all of the quality scores encountered and instead cast the quality score to an integer and use hash tables. Bug fix for files in which all the quality scores are set to -1.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3146 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 18:36:06 +00:00
chartl
7025f5b51d
Added an auxiliary table to DepthOfCoverage, which is the cumulative equivalent of the locus table (got tired of doing the calculation by hand). Also took care of a trailing tab in the per-locus output table.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3138 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 19:37:17 +00:00
aaron
20cc2a85a4
removed the hashmap from Genotype Concordance, moved it into a table
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3133 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 21:24:48 +00:00
aaron
e55f27b3b1
forgot a file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3132 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 20:51:13 +00:00
depristo
918b746798
More detailed validation output. Fixes for genotyping overflow -- these are temporary and need to be properly resolved
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3129 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 16:38:28 +00:00
rpoplin
7b44e6bd55
ApplyVariantClusters now outputs interesting threshold points based on hitting the target novel TiTv
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3126 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 19:47:29 +00:00
rpoplin
60c227d67f
Added new VE2 module to create a plot of titv ratio by variant quality score
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3125 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 15:19:27 +00:00
rpoplin
2d002c56c3
Added histogram of variant quality scores broken out by true positive and false positive calls to the GenotypeConcordance module of VariantEval2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3123 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 13:48:31 +00:00
chartl
f7d1b8f5de
CoverageStatistics has now replaced DepthOfCoverage -- old DoC is in the archive.
...
Also, I can't be bothered to fix the spelling of "oldepthofcoverage" to contain the necessary number of D's. Be content that it does, however, contain the requisite number of O's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3109 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:27:23 +00:00
aaron
585cc880a2
changed jexl expressions to jexl names in the VariantEval2 output, fixed integration test, and fixed a problem where a line was getting dropped in CSV output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3108 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:23:14 +00:00
aaron
3d3d19a6a7
the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date.
...
a caveat: for anyone asking for all of the ROD's back from the RefMetaDataTracker (if your not using the facilities to get the track by name), you'll now be getting back a collection of GATKFeature objects. This object will contain the track name, and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc. This layer is needed so we can integrate Tribble tracks (which don't natively have names). Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc).
Sorry for the inconvenience! More changes to come, but this is by far the largest (as has the greatest effect on end users).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 22:39:56 +00:00
chartl
dc802aa26f
Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
aaron
074ec77dcc
First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be
...
specified with the reportType command line option in VE2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 03:59:32 +00:00
ebanks
2373a4618f
bug caused by a misprint: context != contexts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3073 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 03:08:24 +00:00
aaron
60dfba997b
added some sample annotations to VariantEval2 analysis modules, and some changes to the report system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3067 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 05:40:10 +00:00
depristo
076d21d394
Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 02:47:09 +00:00
depristo
7b17bcd0af
Refactoring a few useful routines for detecting mendelian violations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3043 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:01 +00:00
ebanks
b8e8852b4f
Better interface for the Annotator in how it interacts with VariantContext.
...
Also, added a proof of concept genotype-level annotation (not working yet, almost there).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3035 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 20:41:57 +00:00
ebanks
ee0e833616
Some significant changes to the annotator:
...
1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental.
2. Users can now not only specify specific annotations to use, but also the interface names from #1 . Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest.
3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator.
4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 05:38:32 +00:00
ebanks
e367a50e9b
Added genotype concordance module. Not at all finished, but needed to give something to Aaron to look at for help in printing the output nicely.
...
Also misc cleanup and fixes (e.g. perform evalulation even when no comp tracks are provided).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2996 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 19:02:24 +00:00
depristo
b39b5edca8
Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:23:12 +00:00
depristo
18ba9929f9
notes for eric
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2983 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 20:34:54 +00:00
depristo
4f4555c80f
PPV and Sensitivity added to validation tool output; support for arbitrary -sample arguments to subset variant contexts by sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2978 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:28:31 +00:00
depristo
486bef9318
Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 16:25:16 +00:00
chartl
0a49dffa8f
Row/Column names are now R-friendly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2966 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:01:03 +00:00
ebanks
9f3b99c11b
Moving UnifiedGenotyper and VariantAnnotator over to VariantContext system.
...
Removing obsolete genotyping classes.
First stage of removing dependence on old Genotype class.
More changes to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2960 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 03:41:07 +00:00
chartl
21bf8b4b93
Odd, what I saw on IntelliJ hadn't saved to sting before committing. Here's the actual change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2956 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 15:54:41 +00:00
chartl
cc6a714c09
Handle excess coverage in interval output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2954 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 14:40:05 +00:00
chartl
037ac9c9af
Actually calculate base counts by read group when "both" is specified. Modified integration test to cement the now-correct "both" behavior.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2941 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 18:31:48 +00:00
chartl
8738c544f1
Minor refactoring of CoverageStatistics to allow simultaneous output of per-sample and per-read group statistics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2940 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 17:06:52 +00:00
depristo
33cefddf55
Better INFO field annotation for Mendel violations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2937 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 15:22:04 +00:00
chartl
706d49d84c
Commit for Aaron
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2932 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:29:07 +00:00
ebanks
0dd65461a1
Various improvements to plink, variant context, and VCF code.
...
We almost completely support indels. Not yet done with plink stuff.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2926 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 17:58:01 +00:00
chartl
6759acbdef
Coverage statistics now fully implements DepthOfCoverage functionality, including the ability to print base counts. Minor changes to BaseUtils to support 'N' and 'D' characters. PickSequenomProbes now has the option to not print the whole window as part of the probe name (e.g. you just see PROJECT_NAME|CHR_POS and not PROJECT_NAME|CHR_POS_CHR_PROBESTART-PROBEND). Full integration tests for CoverageStatistics are forthcoming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2924 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 15:00:02 +00:00
aaron
790d2a7776
adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 15:56:44 +00:00
chartl
cfff486338
This commit is for Kiran
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2898 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 18:18:38 +00:00
chartl
87f8fb7282
Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00
chartl
496ecc8186
Change in how overall coverage and means are stored in the DOCS object; change from keeping track of sample mean coverage to keeping track of sample total coverage (calculate means at the end)
...
This is a mid-way commit for Aaron
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2895 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:51:12 +00:00
depristo
9a6b384adb
Support for no qual fields in VCF; better support for Mendelian violation calculations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 00:29:17 +00:00
aaron
246fa28386
RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 22:48:55 +00:00
chartl
591102a841
Don't close the output stream if we're printing to stdout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2891 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:50:58 +00:00
chartl
10cc71ceb0
Another midway commit for teh engineerz
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2890 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:24:02 +00:00
chartl
3d92e5a737
Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt]
...
Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output).
Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 20:25:07 +00:00
aaron
fef1154fc8
starting on RODs for Reads: made RODRecordList implement list<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 22:11:53 +00:00
chartl
5df37968de
Simplification of code segments; slight alteration to per-locus tabulation; added to-do items for cosmetic changes (mostly binning options and settigns)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2882 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:20:18 +00:00
chartl
1f673e9fab
Float the bins with the given lower bound
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2878 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:48:53 +00:00
chartl
119d449b46
Formatting changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2877 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:43:15 +00:00
chartl
173956927b
Summaries generated for firehose from DoC output have been migrated to its own walker to calculate aggregate coverage statistics in a parallelizable and fast way. This is an initial commit, bug-fixing and testing is upcoming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2876 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:41:02 +00:00
chartl
0e05a3acb0
Adding depth of coverage features to firehose summary tools
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2860 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 19:47:16 +00:00
aaron
b1a4e6d840
removing non-ascii characters from my Copyright and from VariantEval2Walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2856 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:54:36 +00:00
depristo
a1a3d5fcb0
Support for reading in table of rsIDs -> dbSNP builds to back generate a dbSNP build X from a single file. Very useful indeed. dbSNP -> VC now captures the rsID in the context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2837 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 22:40:55 +00:00
chartl
04a2784bf7
Initial commit of tools under development for data QC through firehose.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2834 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 19:13:24 +00:00
depristo
5f74fffa02
Massive improvements to VE2 infrastructure. Now supports VCF writing of interesting sites; multiple comp and eval tracks. Eric will be taking it over and expanding functionality over the next few weeks until it's ready to replace VE1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2832 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 15:26:52 +00:00
depristo
c66861746a
improvements to ve2, including more meaningful mendelian violation counting. Support for VCF emitted interesting sites, annotated according to the evaluations themselves. Basic intergration test for VE2 started
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2819 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 16:12:29 +00:00
depristo
934d4b93a2
VariantContext to VCF converter. BeagleROD, and phasing of VCF calls. Integration tests galore :-)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2814 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 19:02:25 +00:00
depristo
94f892ad42
VCF->beagle and VCF phasing using beagle input. Appears to work fairly well. VariantContexts now support phased genotypes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2812 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 01:22:05 +00:00
chartl
935e76daa1
Minor changes to oneoff walkers. PlinkRod altered but still commented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2808 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 18:49:56 +00:00
depristo
3b1ab86d11
Added generic interfaces to RefMetaDataTracker to obtain VariantContext objects. More docs. Integration tests for VariantContexts using dbSNP and VCF. At this stage if you use dbSNP or VCF files only in your walkers, please move them over to the VariantContext, it's just nicer. If you've got RODs that implemented the old variation/genotype interfaces, and you want them to work in new walkers, please add an adaptor to VariantContextAdaptors in refdata package. It should be easy and will reduce burden in the long term when those interfaces are retired.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2803 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:26:06 +00:00
depristo
995d55da81
now uses the new RMDT getVariantContext() functions instead of doing the work itself.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2802 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:23:06 +00:00
depristo
af8c47fc2f
Fixing up testVariantContext for integration tests for variant context. Printing of VCs and genotypes now stable using sorting. Cleaned up comments in quality score by strand. RefMetaDataTracker now directly allows walkers to obtain VariantContexts using the simple Collection<VariantContext> getAllVariantContexts(GenomeLoc curLocation, EnumSet<VariantContext.Type> allowedTypes, boolean requireStartHere, boolean takeFirstOnly) function. VCF and dbSNP VariantContexts now officially supported. Other importan types can be added to the adapator system in refdata package. Integration tests later today
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2791 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:42:54 +00:00
depristo
c6d86da4b8
almost managed to move things around perfectly in move go
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2788 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:18:26 +00:00
depristo
e0af3bf761
updating back names
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2786 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:53:45 +00:00
depristo
777617b6c7
managed to actually move the files too! Damn you svn
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2785 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:47:19 +00:00
depristo
8938a4146d
moving varianteval2 to it's own dir
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2784 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:37:04 +00:00
depristo
69132c81aa
Documentation. Plus nicer structure to adaptors. Intermediate checkin before move into core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2783 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:33:27 +00:00
depristo
1d86dd7fd1
Interface changes following Matt's advice. VariantContexts are now immutable, and there are special mutable versions, in case you need to change things. AttributedObject now a InferredGeneticContext and package protected. VariantContexts are now named, which makes them easier to use with the rod system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2780 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 20:55:49 +00:00
asivache
152f65b362
Do not die in --cycleOnly mode when the lane is not paired end, just count all single end basequals into the first column and leave the second column filled with 0s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2778 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:48:12 +00:00
asivache
a3cd56897d
moving older versions of the oneoff project to archive, bye-bye
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2777 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:46:27 +00:00
asivache
f7e7bcd2ef
Oneoff project, totally unrelated to anything
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2776 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:44:50 +00:00
hanna
334da80e8b
Fixed Mark's bad checkin.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2775 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 12:40:58 +00:00
depristo
1ce0f06216
temp checkin for reorganization
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2774 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 11:10:24 +00:00
depristo
c89ba7b1a4
improvements to variant eval 2. Now has titv calculations and mendelian violation detect support. we only make ~80 mendelian violations in 380K calls for the YRI trio, in case you are interested
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2768 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 16:03:19 +00:00
depristo
fa2cd432fd
better printing in VE2. Added support for TiTv analysis
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2766 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 21:20:29 +00:00
depristo
cbbc0e98d2
fix for broken imports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2765 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 15:20:27 +00:00
depristo
681c196097
V2 of VariantEval2. Framework is essentially complete., very simple and clear now compared to VE1. Support for any number of JEXL expressions. dbSNP% evaluation added to show paired comparison evaluation. Pretty printing output tables. Performance is poor but can easily be fixed (see todo notes).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2764 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 14:18:46 +00:00
hanna
9dbdfff786
Moved VariantEval to core. Updated integration test md5s to reflect new Analysis class names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2762 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 00:22:15 +00:00
asivache
4ddbaeed07
In attempt to reuse: --pairCountsOutput is now optional, if not specified then only per-locus statistics is collected; --silent - do not echo results into stdout; --minMapQ - count only bases coming from reads mapped with specified quality or better; --blacklistedlanes - do not count reads/bases coming from specific lanes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2761 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 22:05:19 +00:00
chartl
2c4f709f6f
Bunch of oneoff stuff that I don't want to lose. Also:
...
VCFRecord - "." dbsnp-ID entries now taken into account (thought these were represented as null; but I guess not)
VCFGenotypeRecord - added a replaceFormat option; since intersecting Broad/BC call sets required genotype formats also be intersected (no changing on-the-fly)
VCFCombine - altered doc to instruct user to give complete priority list (was throwing exception if not)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2760 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 21:35:10 +00:00
ebanks
506d39f751
The UG calculations are now driven by an independent engine.
...
This completely separates the genotyper walker from other walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2758 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 20:57:31 +00:00
depristo
d9671dffba
Documentation for VariantContext. Please read it and start using it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2756 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 17:49:51 +00:00
ebanks
f6da57dc79
1. For Matt: JIRA GSA-270. Other walkers needing to call into the Unified Genotyper now use static methods (e.g. runGenotyper()) instead of calling initialize and map.
...
2. Set the default confidence cutoff to 50 (instead of 0).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2752 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 21:14:57 +00:00
depristo
3d45457595
VariantEval2 test framework implemented; Kiran is experimenting with the system. Not for use by anyone else. VariantContext appears to work well; I'll release it next week for general use following docs of the functions. Removing newvarianteval and other classes to avoid any future confusion. Update to TraverseLoci and RodLocusView to simplify a few functions and to correct some minor errors. All tests pass without modification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2748 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-30 20:51:24 +00:00
chartl
236764b249
Major (and useful) changes to MultiSampleConcordance:
...
1) Now cares about Genotype filtering. If it is flagged as filtered, it can count as a FP/FN/TP; but goes into a "non-confident genotype" bin, rather than het/hom.
2) Can give it a Genotype Confidence flag (-GC) which will automatically filter genotypes in the way above for quality > Q for "-GC Q"
3) Can give it an -assumeRef flag. For sites only in the truth VCF (that don't even appear in the variant VCF), that locus will be treated as confident
ref calls for all individuals in the variant VCF; and the calculators updated accordingly.
*** Important: Default behavior is that sites unique to the truth VCF are considered no-call sites for the variant. This flag can help get aroudn that;
however the safest way to run this is to have a variant VCF with calls at each and every locus, if that is possible.
VCFGenotypeRecord -- added an isFiltered() call to automate looking up the FILTERED flag for VCF v3.3
SimpleVCFIntersectWalker - basic outline for a walker I'm working on tonight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2747 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-30 01:18:31 +00:00
chartl
97f60dbc4b
Moving stuff around. ( core;playground ) ----> ( oneoffs ). I've been a bad boy, sullying the core codebase.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2745 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 22:50:03 +00:00
depristo
88495a39d4
better formating
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2737 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:38:21 +00:00
chartl
8de6a8d246
Lots of changes; all to do something relatively minor.
...
1) Changed VCF/RodVCF to allow for inquiries to whether or not the site is novel; isNovel() looks at the ID field, and those members of the info field that indicate membership in dbsnp, hapmap2, or hapmap3; and if none can be found, returns true.
2) Changed VariantAnnotator to annotate hapmap2 and hapmap3, if you bind rods to it with those names. Works in the same way as DBSNP does -- if you give it a rod named "hapmap2" it'll annotate membership in it. -- Passes integration tests
3) Changed UnifiedGenotyper to do the same thing (since it uses Annotations as a subroutine) -- Passes integration tests
4) Changed MultiSampleConcordanceWalker to take a flag --ignoreKnownSites (or -novels) to examine concordance only on sites that are not marked as in dbSNP or in Hapmap in the variant VCF
5) Changed VCFConcordanceCalculator (the object MultiSampleConcordanceWalker runs on) to output Concordant_Het_Calls and Concordant_Hom_Calls separately, rather than combined as Concordant_Calls
6) AlleleBalanceHistogramWalker -- I don't know what i did to this thing. I've been jerry rigging System.outs to do stuff it was never really intended to do; so there's probably some dumb System.out.print("HI I AM AT LOCUS:"+loc) stuck somewhere. It compiles at any rate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2724 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:06:56 +00:00
depristo
956b570c8e
V5 improvements to VariantContext. Now fully supports genotypes. Filtering enabled. Significant tests throughout system. Support for rebuilding variant contexts from subsets of genotypes. Some code cleanup around repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2721 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 18:37:17 +00:00
depristo
f6bca7873c
V3 of VariantContext. Support for Genotypes and NO_CALL alleles. QUAL fields fully implemented. Can parse VCF records and dbSNP. More complete validation. Detailed testing routines for VariantContext and Allele.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2718 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 04:10:16 +00:00
depristo
3399ad9691
Incremental update 2 -- refined allele and VariantContext classes; support for AttributedObject class; extensive testing for Allele class, and partial for VariantContext. Now possible to easily convert dbSNP to VariantContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2705 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 17:19:37 +00:00
chartl
ed9b7edee3
Changed " to ' to stop the
...
[javadoc] /humgen/gsa-scr1/chartl/sting/java/src/org/broadinstitute/sting/oneoffprojects/variantcontext/VariantContext.java:99: warning: unmappable character for encoding ASCII
[javadoc] * if one of the alleles is deleted (?-?).
warnings on compile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2703 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 15:23:55 +00:00
chartl
df112e64b8
Minor tweaks
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2699 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 04:17:47 +00:00
chartl
2c8d7b0c44
Forgot the onTraversalDone. That was dumb.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2692 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 21:02:46 +00:00