gatk-3.8

Commit Graph

Author	SHA1	Message	Date
chartl	f7d1b8f5de	CoverageStatistics has now replaced DepthOfCoverage -- old DoC is in the archive. Also, I can't be bothered to fix the spelling of "oldepthofcoverage" to contain the necessary number of D's. Be content that it does, however, contain the requisite number of O's. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3109 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-01 16:27:23 +00:00
aaron	585cc880a2	changed jexl expressions to jexl names in the VariantEval2 output, fixed integration test, and fixed a problem where a line was getting dropped in CSV output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3108 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-01 16:23:14 +00:00
aaron	3d3d19a6a7	the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date. a caveat: for anyone asking for all of the ROD's back from the RefMetaDataTracker (if you're not using the facilities to get the track by name), you'll now be getting back a collection of GATKFeature objects. This object will contain the track name, and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc. This layer is needed so we can integrate Tribble tracks (which don't natively have names). Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc). Sorry for the inconvenience! More changes to come, but this is by far the largest (and has the greatest effect on end users). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-31 22:39:56 +00:00
chartl	dc802aa26f	Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-29 13:32:00 +00:00
aaron	074ec77dcc	First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be specified with the reportType command line option in VE2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-27 03:59:32 +00:00
ebanks	2373a4618f	bug caused by a misprint: context != contexts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3073 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-25 03:08:24 +00:00
aaron	60dfba997b	added some sample annotations to VariantEval2 analysis modules, and some changes to the report system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3067 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-24 05:40:10 +00:00
depristo	076d21d394	Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 02:47:09 +00:00
depristo	7b17bcd0af	Refactoring a few useful routines for detecting mendelian violations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3043 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 13:19:01 +00:00
ebanks	b8e8852b4f	Better interface for the Annotator in how it interacts with VariantContext. Also, added a proof of concept genotype-level annotation (not working yet, almost there). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3035 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-18 20:41:57 +00:00
ebanks	ee0e833616	Some significant changes to the annotator: 1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental. 2. Users can now not only specify specific annotations to use, but also the interface names from #1. Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest. 3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator. 4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-18 05:38:32 +00:00
ebanks	e367a50e9b	Added genotype concordance module. Not at all finished, but needed to give something to Aaron to look at for help in printing the output nicely. Also misc cleanup and fixes (e.g. perform evaluation even when no comp tracks are provided). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2996 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-14 19:02:24 +00:00
depristo	b39b5edca8	Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-12 19:23:12 +00:00
depristo	18ba9929f9	notes for eric git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2983 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-11 20:34:54 +00:00
depristo	4f4555c80f	PPV and Sensitivity added to validation tool output; support for arbitrary -sample arguments to subset variant contexts by sample git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2978 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-10 22:28:31 +00:00
depristo	486bef9318	Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-10 16:25:16 +00:00
chartl	0a49dffa8f	Row/Column names are now R-friendly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2966 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-09 19:01:03 +00:00
ebanks	9f3b99c11b	Moving UnifiedGenotyper and VariantAnnotator over to VariantContext system. Removing obsolete genotyping classes. First stage of removing dependence on old Genotype class. More changes to come. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2960 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-09 03:41:07 +00:00
chartl	21bf8b4b93	Odd, what I saw on IntelliJ hadn't saved to sting before committing. Here's the actual change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2956 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-08 15:54:41 +00:00
chartl	cc6a714c09	Handle excess coverage in interval output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2954 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-08 14:40:05 +00:00
chartl	037ac9c9af	Actually calculate base counts by read group when "both" is specified. Modified integration test to cement the now-correct "both" behavior. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2941 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-05 18:31:48 +00:00
chartl	8738c544f1	Minor refactoring of CoverageStatistics to allow simultaneous output of per-sample and per-read group statistics. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2940 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-05 17:06:52 +00:00
depristo	33cefddf55	Better INFO field annotation for Mendel violations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2937 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-05 15:22:04 +00:00
chartl	706d49d84c	Commit for Aaron git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2932 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-04 21:29:07 +00:00
ebanks	0dd65461a1	Various improvements to plink, variant context, and VCF code. We almost completely support indels. Not yet done with plink stuff. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2926 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-04 17:58:01 +00:00
chartl	6759acbdef	Coverage statistics now fully implements DepthOfCoverage functionality, including the ability to print base counts. Minor changes to BaseUtils to support 'N' and 'D' characters. PickSequenomProbes now has the option to not print the whole window as part of the probe name (e.g. you just see PROJECT_NAME\|CHR_POS and not PROJECT_NAME\|CHR_POS_CHR_PROBESTART-PROBEND). Full integration tests for CoverageStatistics are forthcoming. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2924 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-04 15:00:02 +00:00
aaron	790d2a7776	adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-03 15:56:44 +00:00
chartl	cfff486338	This commit is for Kiran git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2898 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-26 18:18:38 +00:00
chartl	87f8fb7282	Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-26 16:39:47 +00:00
chartl	496ecc8186	Change in how overall coverage and means are stored in the DOCS object; change from keeping track of sample mean coverage to keeping track of sample total coverage (calculate means at the end) This is a mid-way commit for Aaron git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2895 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-26 15:51:12 +00:00
depristo	9a6b384adb	Support for no qual fields in VCF; better support for Mendelian violation calculations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-26 00:29:17 +00:00
aaron	246fa28386	RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-25 22:48:55 +00:00
chartl	591102a841	Don't close the output stream if we're printing to stdout git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2891 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-25 21:50:58 +00:00
chartl	10cc71ceb0	Another midway commit for teh engineerz git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2890 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-25 21:24:02 +00:00
chartl	3d92e5a737	Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt] Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output). Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-25 20:25:07 +00:00
aaron	fef1154fc8	starting on RODs for Reads: made RODRecordList implement list<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-24 22:11:53 +00:00
chartl	5df37968de	Simplification of code segments; slight alteration to per-locus tabulation; added to-do items for cosmetic changes (mostly binning options and settings) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2882 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-24 05:20:18 +00:00
chartl	1f673e9fab	Float the bins with the given lower bound git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2878 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-23 20:48:53 +00:00
chartl	119d449b46	Formatting changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2877 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-23 20:43:15 +00:00
chartl	173956927b	Summaries generated for firehose from DoC output have been migrated to its own walker to calculate aggregate coverage statistics in a parallelizable and fast way. This is an initial commit, bug-fixing and testing is upcoming. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2876 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-23 18:41:02 +00:00
chartl	0e05a3acb0	Adding depth of coverage features to firehose summary tools git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2860 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-19 19:47:16 +00:00
aaron	b1a4e6d840	removing non-ascii characters from my Copyright and from VariantEval2Walker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2856 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-18 18:54:36 +00:00
depristo	a1a3d5fcb0	Support for reading in table of rsIDs -> dbSNP builds to back generate a dbSNP build X from a single file. Very useful indeed. dbSNP -> VC now captures the rsID in the context git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2837 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-12 22:40:55 +00:00
chartl	04a2784bf7	Initial commit of tools under development for data QC through firehose. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2834 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-12 19:13:24 +00:00
depristo	5f74fffa02	Massive improvements to VE2 infrastructure. Now supports VCF writing of interesting sites; multiple comp and eval tracks. Eric will be taking it over and expanding functionality over the next few weeks until it's ready to replace VE1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2832 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-12 15:26:52 +00:00
depristo	c66861746a	improvements to ve2, including more meaningful mendelian violation counting. Support for VCF emitted interesting sites, annotated according to the evaluations themselves. Basic integration test for VE2 started git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2819 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-10 16:12:29 +00:00
depristo	934d4b93a2	VariantContext to VCF converter. BeagleROD, and phasing of VCF calls. Integration tests galore :-) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2814 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-09 19:02:25 +00:00
depristo	94f892ad42	VCF->beagle and VCF phasing using beagle input. Appears to work fairly well. VariantContexts now support phased genotypes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2812 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-09 01:22:05 +00:00
chartl	935e76daa1	Minor changes to oneoff walkers. PlinkRod altered but still commented. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2808 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-08 18:49:56 +00:00
depristo	3b1ab86d11	Added generic interfaces to RefMetaDataTracker to obtain VariantContext objects. More docs. Integration tests for VariantContexts using dbSNP and VCF. At this stage if you use dbSNP or VCF files only in your walkers, please move them over to the VariantContext, it's just nicer. If you've got RODs that implemented the old variation/genotype interfaces, and you want them to work in new walkers, please add an adaptor to VariantContextAdaptors in refdata package. It should be easy and will reduce burden in the long term when those interfaces are retired. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2803 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-06 16:26:06 +00:00
depristo	995d55da81	now uses the new RMDT getVariantContext() functions instead of doing the work itself. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2802 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-06 16:23:06 +00:00
depristo	af8c47fc2f	Fixing up testVariantContext for integration tests for variant context. Printing of VCs and genotypes now stable using sorting. Cleaned up comments in quality score by strand. RefMetaDataTracker now directly allows walkers to obtain VariantContexts using the simple Collection<VariantContext> getAllVariantContexts(GenomeLoc curLocation, EnumSet<VariantContext.Type> allowedTypes, boolean requireStartHere, boolean takeFirstOnly) function. VCF and dbSNP VariantContexts now officially supported. Other important types can be added to the adaptor system in refdata package. Integration tests later today git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2791 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-05 15:42:54 +00:00
depristo	c6d86da4b8	almost managed to move things around perfectly in move go git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2788 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-05 14:18:26 +00:00
depristo	e0af3bf761	updating back names git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2786 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-05 13:53:45 +00:00
depristo	777617b6c7	managed to actually move the files too! Damn you svn git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2785 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-05 13:47:19 +00:00
depristo	8938a4146d	moving varianteval2 to its own dir git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2784 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-05 13:37:04 +00:00
depristo	69132c81aa	Documentation. Plus nicer structure to adaptors. Intermediate checkin before move into core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2783 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-05 13:33:27 +00:00
depristo	1d86dd7fd1	Interface changes following Matt's advice. VariantContexts are now immutable, and there are special mutable versions, in case you need to change things. AttributedObject now a InferredGeneticContext and package protected. VariantContexts are now named, which makes them easier to use with the rod system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2780 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-04 20:55:49 +00:00
asivache	152f65b362	Do not die in --cycleOnly mode when the lane is not paired end, just count all single end basequals into the first column and leave the second column filled with 0s git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2778 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-04 19:48:12 +00:00
asivache	a3cd56897d	moving older versions of the oneoff project to archive, bye-bye git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2777 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-04 19:46:27 +00:00
asivache	f7e7bcd2ef	Oneoff project, totally unrelated to anything git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2776 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-04 19:44:50 +00:00
hanna	334da80e8b	Fixed Mark's bad checkin. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2775 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-04 12:40:58 +00:00
depristo	1ce0f06216	temp checkin for reorganization git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2774 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-04 11:10:24 +00:00
depristo	c89ba7b1a4	improvements to variant eval 2. Now has titv calculations and mendelian violation detect support. we only make ~80 mendelian violations in 380K calls for the YRI trio, in case you are interested git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2768 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-03 16:03:19 +00:00
depristo	fa2cd432fd	better printing in VE2. Added support for TiTv analysis git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2766 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-02 21:20:29 +00:00
depristo	cbbc0e98d2	fix for broken imports git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2765 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-02 15:20:27 +00:00
depristo	681c196097	V2 of VariantEval2. Framework is essentially complete., very simple and clear now compared to VE1. Support for any number of JEXL expressions. dbSNP% evaluation added to show paired comparison evaluation. Pretty printing output tables. Performance is poor but can easily be fixed (see todo notes). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2764 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-02 14:18:46 +00:00
hanna	9dbdfff786	Moved VariantEval to core. Updated integration test md5s to reflect new Analysis class names. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2762 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-02 00:22:15 +00:00
asivache	4ddbaeed07	In attempt to reuse: --pairCountsOutput is now optional, if not specified then only per-locus statistics is collected; --silent - do not echo results into stdout; --minMapQ - count only bases coming from reads mapped with specified quality or better; --blacklistedlanes - do not count reads/bases coming from specific lanes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2761 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-01 22:05:19 +00:00
chartl	2c4f709f6f	Bunch of oneoff stuff that I don't want to lose. Also: VCFRecord - "." dbsnp-ID entries now taken into account (thought these were represented as null; but I guess not) VCFGenotypeRecord - added a replaceFormat option; since intersecting Broad/BC call sets required genotype formats also be intersected (no changing on-the-fly) VCFCombine - altered doc to instruct user to give complete priority list (was throwing exception if not) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2760 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-01 21:35:10 +00:00
ebanks	506d39f751	The UG calculations are now driven by an independent engine. This completely separates the genotyper walker from other walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2758 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-01 20:57:31 +00:00
depristo	d9671dffba	Documentation for VariantContext. Please read it and start using it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2756 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-01 17:49:51 +00:00
ebanks	f6da57dc79	1. For Matt: JIRA GSA-270. Other walkers needing to call into the Unified Genotyper now use static methods (e.g. runGenotyper()) instead of calling initialize and map. 2. Set the default confidence cutoff to 50 (instead of 0). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2752 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-31 21:14:57 +00:00
depristo	3d45457595	VariantEval2 test framework implemented; Kiran is experimenting with the system. Not for use by anyone else. VariantContext appears to work well; I'll release it next week for general use following docs of the functions. Removing newvarianteval and other classes to avoid any future confusion. Update to TraverseLoci and RodLocusView to simplify a few functions and to correct some minor errors. All tests pass without modification. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2748 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-30 20:51:24 +00:00
chartl	236764b249	Major (and useful) changes to MultiSampleConcordance: 1) Now cares about Genotype filtering. If it is flagged as filtered, it can count as a FP/FN/TP; but goes into a "non-confident genotype" bin, rather than het/hom. 2) Can give it a Genotype Confidence flag (-GC) which will automatically filter genotypes in the way above for quality > Q for "-GC Q" 3) Can give it an -assumeRef flag. For sites only in the truth VCF (that don't even appear in the variant VCF), that locus will be treated as confident ref calls for all individuals in the variant VCF; and the calculators updated accordingly. *** Important: Default behavior is that sites unique to the truth VCF are considered no-call sites for the variant. This flag can help get around that; however the safest way to run this is to have a variant VCF with calls at each and every locus, if that is possible. VCFGenotypeRecord -- added an isFiltered() call to automate looking up the FILTERED flag for VCF v3.3 SimpleVCFIntersectWalker - basic outline for a walker I'm working on tonight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2747 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-30 01:18:31 +00:00
chartl	97f60dbc4b	Moving stuff around. ( core;playground ) ----> ( oneoffs ). I've been a bad boy, sullying the core codebase. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2745 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-29 22:50:03 +00:00
depristo	88495a39d4	better formatting git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2737 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-29 15:38:21 +00:00
chartl	8de6a8d246	Lots of changes; all to do something relatively minor. 1) Changed VCF/RodVCF to allow for inquiries to whether or not the site is novel; isNovel() looks at the ID field, and those members of the info field that indicate membership in dbsnp, hapmap2, or hapmap3; and if none can be found, returns true. 2) Changed VariantAnnotator to annotate hapmap2 and hapmap3, if you bind rods to it with those names. Works in the same way as DBSNP does -- if you give it a rod named "hapmap2" it'll annotate membership in it. -- Passes integration tests 3) Changed UnifiedGenotyper to do the same thing (since it uses Annotations as a subroutine) -- Passes integration tests 4) Changed MultiSampleConcordanceWalker to take a flag --ignoreKnownSites (or -novels) to examine concordance only on sites that are not marked as in dbSNP or in Hapmap in the variant VCF 5) Changed VCFConcordanceCalculator (the object MultiSampleConcordanceWalker runs on) to output Concordant_Het_Calls and Concordant_Hom_Calls separately, rather than combined as Concordant_Calls 6) AlleleBalanceHistogramWalker -- I don't know what i did to this thing. I've been jerry rigging System.outs to do stuff it was never really intended to do; so there's probably some dumb System.out.print("HI I AM AT LOCUS:"+loc) stuck somewhere. It compiles at any rate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2724 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-28 21:06:56 +00:00
depristo	956b570c8e	V5 improvements to VariantContext. Now fully supports genotypes. Filtering enabled. Significant tests throughout system. Support for rebuilding variant contexts from subsets of genotypes. Some code cleanup around repository git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2721 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-28 18:37:17 +00:00
depristo	f6bca7873c	V3 of VariantContext. Support for Genotypes and NO_CALL alleles. QUAL fields fully implemented. Can parse VCF records and dbSNP. More complete validation. Detailed testing routines for VariantContext and Allele. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2718 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-28 04:10:16 +00:00
depristo	3399ad9691	Incremental update 2 -- refined allele and VariantContext classes; support for AttributedObject class; extensive testing for Allele class, and partial for VariantContext. Now possible to easily convert dbSNP to VariantContext. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2705 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-27 17:19:37 +00:00
chartl	ed9b7edee3	Changed " to ' to stop the [javadoc] /humgen/gsa-scr1/chartl/sting/java/src/org/broadinstitute/sting/oneoffprojects/variantcontext/VariantContext.java:99: warning: unmappable character for encoding ASCII [javadoc] * if one of the alleles is deleted (?-?). warnings on compile. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2703 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-27 15:23:55 +00:00
chartl	df112e64b8	Minor tweaks git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2699 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-27 04:17:47 +00:00
chartl	2c8d7b0c44	Forgot the onTraversalDone. That was dumb. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2692 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-26 21:02:46 +00:00
chartl	04e1832968	Added - AlleleBalanceHistogramWalker -- hopefully this'll be able to tell us very clearly whether bad genotype concordance is a result of systematic contamination (consistent wonky allele balances) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2691 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-26 20:57:12 +00:00
depristo	c231547204	Refactoring and migration of new allele/variantcontext/genotype code into oneoffprojects. NOT FOR USE. PlinkRod commented out due to dependence on this new, rapidly changing interface. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2687 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-26 13:53:29 +00:00
depristo	c871a0f221	UG map() now returns a VariantCallContext object. Also has a field for confidentlyCalledBases. UG reduce() emits statistics on the confident called % of bases git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2664 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-22 23:06:43 +00:00
asivache	74779a9a78	First version of the tool that tries determining indel error rate (basically, counts indels that look like sequencing/alignment errors - such as a single observation at deeply covered locus, and reports the rate of their occurrence) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2648 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-21 15:28:20 +00:00
chartl	ab289872e4	Changes: - Annotations return null when given pileups with no second-base information - SequenomRodWithGenomeLoc -- better handling of indels Eric; I made two small changes to the new Genotype interface that we should talk about (they basically have to do with allele/genotype representation): Allele - added a new UNKNOWN_POINT_MUTATION to AlleleType. If I see a sequenom genotype AG; one's got to be ref, one's got to be SNP, but until I have an actual reference base in hand, I don't know which is which. That's what this entry is for. Genotype - added an enum class StandardAttributes for dealing with things like deletion/inversion length. This is probably not the way we want to represent indels, so we should talk about this. Plus now that there's a direct link between my ROD and the genotype; when we do decide how to deal with indels, we'll be forced to alter the SequenomRodWithGenomeLoc accordingly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2642 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-20 16:45:17 +00:00
aaron	0513690416	two fixes in the new cached DbSNP code: -isBiallelic would incorrectly say triallelic sites are biallelic. -getAlternateAlleleList was broken, since the new cached list is immutable, we couldn’t remove list items. Also added a dbSNP validating walker to the one-offs, for testing the new b37 130 dbSNP rod. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2568 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-13 00:27:34 +00:00
chartl	7e3e714d3c	Moving experimental annotations from core to oneoffs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2528 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-06 19:34:10 +00:00
chartl	a32245f7d2	Modifications: QualityUtils - Stole the BaseUtils code for flipping reads around and applied it to quality scores SecondBaseSkew - Nothing's really different, just a commented line Additions (experimental annotations for future development of second-base annotation) I DO NOT INTEND FOR ANYONE TO USE THESE - ProportionOfNonrefBasesSupportingSNP - ProportionOfSNPSecondBasesSupportingRef - ProportionOfRefSecondBasesSupportingSNP + I hope these are self-explanatory - QualityAdjustedSecondBaseLod + Adjust lod-score by 10*log10[P[second bases are as observed]] Added walker: QualityScoreByStrand - oneoff project that's being saved if i ever need it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2527 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-06 19:18:07 +00:00
hanna	a4b69d0adf	Misc bug fixes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2501 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-05 14:48:19 +00:00
hanna	29c129aced	Added very primitive read fishing walker with lots of hard coding. Fixed bugs encountered when testing read fishing in Ecoli. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2496 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-04 00:54:57 +00:00
depristo	87e863b48d	Removed used routines in duputils; duplicatequals to archive; docs for new duplicate traversal code; general code cleanup; bug fixes for combineduplicates; integration tests for combine duplicates walker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2468 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-29 19:46:29 +00:00
depristo	fcc80e8632	Completely rewritten duplicate traversal, more free of bugs, with integration tests for count duplicates walker validated on a TCGA hybrid capture lane. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2458 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-28 23:56:49 +00:00
ebanks	a5f75cbfd4	The previous commit broke the build, so this is a temporary patch to get it to compile. ConcordanceTruthTable should use enums (esp. now that all of the concordance variables need to be public), but VariantEval will need to be rewritten soon anyways so I'll just push it off until then. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2413 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-20 02:34:41 +00:00
chartl	7b5e332ff3	Added - PairedQualityScoreCountsWalker: counts quality scores (e.g. as a histogram) on first reads of a pair and second reads of a pair. Turns out there's a consistent difference in quality scores; even after recalibrating without the pair ordering as a covariate (there's a bit of averaging -- but not as much as I initially thought). Added - A paired read order covariate to use with recalibration. Currently experimental: for instance, what's a proper pair versus just a pair? Nobody should use this one... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2401 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 15:01:01 +00:00
chartl	b42fc905e8	Added - new tests (Hapmap was re-added) Modified - Hapmap now takes a -q command to filter out variants by quality Modified - MathUtils - cumBinomialProbLog now uses BigDecimal to handle some numerical imprecisions Modified - PowerBelowFrequency - returns 0.0 if called with a negative number (can't be done from inside the walker itself, but since it's called elsewhere one can't be too careful) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2350 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-14 21:57:20 +00:00
hanna	0da2105e3c	Moving DuplicateQualsWalker to oneoffprojects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2332 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 19:22:32 +00:00

155 Commits (3873dccb35eb904d2824c4ae8e7aad0c84c55635)