gatk-3.8

Commit Graph

Author	SHA1	Message	Date
hanna	1bc26f69e9	An attempt to cleanup the Utils directory. Email to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3198 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-19 23:00:08 +00:00
hanna	c08936d6f4	Added a reservoir downsampler which can sample elements in an iterator uniformly from a stream (see Vitter 1985). Thanks to Eric and Andrey for the pointer. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3197 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-19 20:48:14 +00:00
aaron	e11ca74eb5	removing some outdated ROD classes (PooledEMSNPROD and SangerSNPROD), removing an out-of-date interface (VariantBackedByBenotype), and moving AnalyzeAnnotationWalker over to VariationContext. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3188 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-16 18:59:29 +00:00
asivache	6dc1275cfb	Utility method added: getQualsInCycleOrder(read) - examines the read and returns its quals in the order the machine read them (i.e. always from cycle 1 to cycle N). Simply inverts quals if the read happens to be rc-aligned :) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3183 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-16 00:15:57 +00:00
aaron	e682460c1f	add a fix so that XL arguments won't cancel out -BTI arguments, fixed a bug for Ben where the ROD -> interval list conversion was throwing an exception, and some old code removal. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3174 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-15 16:31:43 +00:00
hanna	8573b0bc6f	Refactoring intervals, separating the process of parsing interval lists, sorting and merging interval lists, and creating RODs from intervals. This gives Doug the ability to keep using our interval list parsing code when sorting intervals on our behalf. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3159 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-13 15:50:38 +00:00
ebanks	3f2455e346	Better error message as suggested by James P git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3141 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-09 05:52:53 +00:00
aaron	12e4f88ca7	a little bit more clean-up git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3122 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-05 20:49:06 +00:00
aaron	df7e7921ce	removing some unused code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3121 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-05 19:30:08 +00:00
bthomas	b4f6f54502	Reorganizing the way interval arguments are processed Most of the changes occur in GenomeAnalysisEngine.java and GenomeLocParser.java: -- parseIntervalRegion and parseGenomeLocs combined into parseIntervalArguments -- initializeIntervals modified -- some helper functions deprecated for cleanliness Includes new set of unit tests, GenomeAnalysisEngineTest.java New restrictions: -- all interval arguments are now checked to be on the reference contig -- all interval files must have one of the following extensions: .picard, .bed, .list, .intervals, .interval_list git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3106 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-01 12:47:48 +00:00
aaron	c3c6e632d1	support for two new VCF header info field value-types, Flag (for fields that are just boolean truths), and Character (for single character info fields). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3105 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-01 03:11:32 +00:00
aaron	3d3d19a6a7	the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date. a caveat: for anyone asking for all of the RODs back from the RefMetaDataTracker (if you're not using the facilities to get the track by name), you'll now be getting back a collection of GATKFeature objects. This object will contain the track name, and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc. This layer is needed so we can integrate Tribble tracks (which don't natively have names). Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc). Sorry for the inconvenience! More changes to come, but this is by far the largest (and has the greatest effect on end users). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-31 22:39:56 +00:00
hanna	400684542c	Revisions to take into account finalization of Picard patch: naming changes, better definition of public interfaces. This won't be the last Picard patch, but it should be the last big one. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3096 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-30 19:28:14 +00:00
hanna	85037ab13f	Fix for Kiran's sharding issue (Invalid GZIP header). General cleanup of Picard patch, including move of some of the Picard private classes we use to Picard public. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3087 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-29 03:21:27 +00:00
depristo	b8ab74a6dc	Minor useful changes to BaseUtils and MathUtils to support a new haplotype score annotation that determines the two most likely haplotypes over an interval and scores variants by their consistency with a diploid model. Appears to be useful. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3085 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-28 21:45:22 +00:00
ebanks	47e30aba92	Rods for reads hooked up into the cleaner git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3070 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-24 18:17:56 +00:00
ebanks	49117819f5	For the cleaner to clean, it must beat the entropy produced by the aligner (and not just the raw reads). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3068 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-24 15:21:58 +00:00
aaron	a69b8555dd	Geli to variant context. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3063 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-23 06:45:29 +00:00
aaron	eafdd047f7	GLF to variant context. Added some methods in GLF to aid testing; and added a test that reads GLF, converts to VC, writes GLF and reads back to compare. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3062 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-23 03:43:25 +00:00
hanna	3767adb0bb	Processing intervals as they stream in means much lower memory usage and quicker runtime. Making change as minimal as possible to avoid conflicts with BT's incoming patch. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3061 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 22:04:45 +00:00
ebanks	0097106938	VariantFiltration can now filter specific samples. This is NOT an ideal implementation. One day when we have lots of free time (or a greater desire), we will implement this correctly and sophisticatedly using all the power of JEXL. For now, though, this will have to do. Docs coming tonight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3060 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 20:45:11 +00:00
depristo	076d21d394	Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 02:47:09 +00:00
ebanks	c88a2a3027	Fixing/cleaning up the vcf merge util git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3047 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 15:13:32 +00:00
depristo	56092a0fc2	Slight cleanup for mathutils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3042 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 13:18:08 +00:00
ebanks	03480c955c	And now the UnifiedGenotyper can officially annotate genotype (FORMAT) fields too. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3039 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 04:58:37 +00:00
ebanks	e757f6f078	Missing value for arbitrary format entries is empty string (need to revisit at some point, but it will require updating the VCF spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3038 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 03:56:27 +00:00
ebanks	0311980668	The VariantAnnotator can now officially annotate genotype (FORMAT) fields. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3037 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 03:30:14 +00:00
ebanks	ee0e833616	Some significant changes to the annotator: 1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental. 2. Users can now not only specify specific annotations to use, but also the interface names from #1. Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest. 3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator. 4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-18 05:38:32 +00:00
rpoplin	58a31bab6a	Variant optimizer now outputs VCF files via ApplyVariantClustersWalker. Documentation to be added to the wiki. It is ready to be used by other people but only with great caution. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3028 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 20:41:42 +00:00
hanna	d9398dc347	Remove some of the restrictions on getStart() and getStop(); getStart() and getStop() now do the minimum validation rather than the more rigorous only-within-the-contig-bounds header validation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3027 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 19:39:30 +00:00
ebanks	ded4ba8966	Let's make artificial reads that actually adhere to the specs... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3022 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 16:51:42 +00:00
bthomas	5b34bb9ab0	Adding three minor new features: + -L all now walks over all intervals + if a -L argument is passed with a .list extension, and file does not exist, returns a File Not Found error instead of "bad interval" error. We plan to soon revisit interval lists and generate a concrete list of filenames, so this is likely temporary. + Error is thrown if the start position on an interval is higher number than the end position. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3021 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 16:24:10 +00:00
ebanks	4340601c26	-Pushed base quals back down into SAMRecord; if -OQ is used, the SAMRecord quals get updated automatically -Better integration test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3020 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 16:00:10 +00:00
ebanks	1fd909cdaf	Fix for Kiran: -1 is a valid value for genotype qualities in VCF, so VariantContext shouldn't die. Cleaned up the relevant VCF code while I was in there. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3015 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 00:20:15 +00:00
ebanks	586f87fa35	Quick fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3007 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-16 02:59:26 +00:00
ebanks	202231141c	-Push the --use_original_qualities argument into the engine. -Check that base and qual strings are the same lengths -Fix one more bug in the clipper. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3006 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-16 02:06:11 +00:00
ebanks	411d25c8d1	-Integration tests for walkers that use original quals. -framework for pushing -OQ into GATK (not done) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3004 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-15 18:46:31 +00:00
kcibul	9f519af06d	new method to filter out overlapping PE reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3002 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-15 15:40:09 +00:00
depristo	4dd7c5972c	Unit tests for -XL arguments; expt. annotation calculating the GC content within 100 bp of the current SNP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2997 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-14 21:08:14 +00:00
aaron	ecb59f5d0d	removed old tests and old code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2995 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-12 22:57:01 +00:00
depristo	e7eae9b61d	High performance, correct implementation of -XL exclusion lists. Enjoy. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2994 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-12 22:39:20 +00:00
aaron	88a48821ea	removed the dependence on removeRegion() in GenomeLocSortedSet git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2993 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-12 22:35:49 +00:00
aaron	1eb5f97255	fixed dropping single base intervals from deleteRegion, moving onto performance fixes. (stop - start is length-1 on closed intervals, so we need to check greater than OR equals to zero) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2990 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-12 19:14:21 +00:00
hanna	a7ba88e649	Rework the way the MicroScheduler handles locus shards to handle intervals that span shards with less memory consumption. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2981 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-11 18:40:31 +00:00
aaron	dde9fd8a15	some rods-for-reads cleaning and performance improvements. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2979 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-10 22:54:58 +00:00
depristo	486bef9318	Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-10 16:25:16 +00:00
ebanks	c85ed1ce90	Plumbing is now in place to emit indel calls from the UnifiedGenotyper. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2975 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-10 04:30:12 +00:00
ebanks	5a20bf0e64	3 changes to UG which break integration tests: 1. emit AA,AB,BB likelihoods in the FORMAT field for Mark 2. remove constraint that genotype alleles (in the GT field) need to be lexicographically sorted. 3. Add bam file(s) used by genotyper to header for Kiran git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2963 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-09 17:16:47 +00:00
ebanks	9f3b99c11b	Moving UnifiedGenotyper and VariantAnnotator over to VariantContext system. Removing obsolete genotyping classes. First stage of removing dependence on old Genotype class. More changes to come. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2960 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-09 03:41:07 +00:00
hanna	1ef1091f7c	Cleanup and simplification of read interval sharding. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2944 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-05 23:34:38 +00:00
ebanks	0dd65461a1	Various improvements to plink, variant context, and VCF code. We almost completely support indels. Not yet done with plink stuff. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2926 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-04 17:58:01 +00:00
chartl	6759acbdef	Coverage statistics now fully implements DepthOfCoverage functionality, including the ability to print base counts. Minor changes to BaseUtils to support 'N' and 'D' characters. PickSequenomProbes now has the option to not print the whole window as part of the probe name (e.g. you just see PROJECT_NAME|CHR_POS and not PROJECT_NAME|CHR_POS_CHR_PROBESTART-PROBEND). Full integration tests for CoverageStatistics are forthcoming. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2924 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-04 15:00:02 +00:00
aaron	ca2cd9d4f5	a little clean-up: move setting the bases of generated reads into Artificial SAM Utils now that the clean read injector test is gone. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2919 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-03 16:31:45 +00:00
aaron	790d2a7776	adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-03 15:56:44 +00:00
ebanks	0e9a6826b0	Update to VCF code to get it up to spec. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2917 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-03 06:12:42 +00:00
ebanks	5f3c80d9aa	1. To make indel calls, we need to get rid of the SNP-centricity of our code. First step is to have the reference be a String, not a char in the Genotype. Note that this is just a temporary patch until the genotype code is ported over to use VariantContext. 2. Significant refactoring of Plink code to work in the rods and use VariantContext. More coming. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2913 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-02 20:26:40 +00:00
kcibul	7578678f99	refactored to provide a sum of mismatch quality scores capability as well (used by Cancer) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2911 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-02 16:40:03 +00:00
aaron	246fa28386	RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-25 22:48:55 +00:00
hanna	199b43fcf2	Reduce by interval alterations to interface with new sharding system. This checkin will be followed by a simplification of some of the locus traversal code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-25 00:16:50 +00:00
aaron	fef1154fc8	starting on RODs for Reads: made RODRecordList implement list<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-24 22:11:53 +00:00
aaron	5546aa4416	adding code to deal with the off-spec situation where our minimum likelihood is above the GLF max of 255. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2871 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-22 22:27:39 +00:00
alecw	b236714c8a	Optimization - Added method to Covariates: void getValues( SAMRecord read, Comparable[] comparable ) which takes an array of size (at least) read.getReadLength() and fills it with covariate values for all positions in the given read. Made CovariateCounterWalker and TableRecalibrationWalker use this method instead of calling getValue(..) for each covariate and each offset. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2863 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-22 17:35:25 +00:00
aaron	33ae256186	a start to some of the infrastructure for Tribble, including dynamic detection of new RMD; not nearly wired in or complete yet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2855 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-18 18:43:52 +00:00
ebanks	79ab7affda	- Change sortOnDisk option to sortInMemory - Fix horrible cleaner bug - Trivial optimizations to cleaner code - more significant ones coming soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2850 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-17 20:52:57 +00:00
aaron	653f70efa2	added methods to validate an interval before you try to make a GenomeLoc: boolean validGenomeLoc(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2846 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-16 20:35:35 +00:00
rpoplin	3de72daa88	Removing an accidentally added import statement. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2818 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-10 15:54:24 +00:00
rpoplin	0b1e243a7b	CountCovariates now sorts the list of standard covariate classes coming from PackageUtils.getClassesImplementingInterface(). As a result some of the integration tests now make use of -standard git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2817 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-10 15:52:20 +00:00
depristo	934d4b93a2	VariantContext to VCF converter. BeagleROD, and phasing of VCF calls. Integration tests galore :-) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2814 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-09 19:02:25 +00:00
depristo	94f892ad42	VCF->beagle and VCF phasing using beagle input. Appears to work fairly well. VariantContexts now support phased genotypes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2812 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-09 01:22:05 +00:00
kshakir	fc810a1800	Updated VCF Reader to parse VCFs according to the VCFv3.3 spec. Column headers are tab separated since sample names might have spaces. Updated test files in /humgen/gsa-scr1/GATK_Data/Validation_Data/*.vcf to remove spaces except for when they are supposed to be in the sample name. Added @Test before VCFReaderTest.testHeaderNoRecords() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2809 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-08 22:55:59 +00:00
hanna	21369869b7	Extend regex that supports every 'word' character to use any printable character except ':'. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2807 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-08 03:29:55 +00:00
depristo	af8c47fc2f	Fixing up testVariantContext for integration tests for variant context. Printing of VCs and genotypes now stable using sorting. Cleaned up comments in quality score by strand. RefMetaDataTracker now directly allows walkers to obtain VariantContexts using the simple Collection<VariantContext> getAllVariantContexts(GenomeLoc curLocation, EnumSet<VariantContext.Type> allowedTypes, boolean requireStartHere, boolean takeFirstOnly) function. VCF and dbSNP VariantContexts now officially supported. Other important types can be added to the adaptor system in refdata package. Integration tests later today git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2791 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-05 15:42:54 +00:00
ebanks	83b9d63d59	1. Added functionality to the data sources to allow engine to get mapping from input files to (merged) read group ids from those files. 2. Used said mapping to implement N-way-in,N-way-out functionality in the new indel cleaner. Still needs more testing (to be done after vacation but preliminary tests look good). 3. Fixes to VCF validator: ignore case when testing VCF reference base against true reference base and allow quals of -1 (as per spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2773 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-04 04:12:49 +00:00
chartl	2c4f709f6f	Bunch of oneoff stuff that I don't want to lose. Also: VCFRecord - "." dbsnp-ID entries now taken into account (thought these were represented as null; but I guess not) VCFGenotypeRecord - added a replaceFormat option; since intersecting Broad/BC call sets required genotype formats also be intersected (no changing on-the-fly) VCFCombine - altered doc to instruct user to give complete priority list (was throwing exception if not) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2760 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-01 21:35:10 +00:00
asivache	421282cfa3	Convenience method: getMappingFilteredPileup(int minMapQ) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2759 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-01 21:19:53 +00:00
depristo	d9671dffba	Documentation for VariantContext. Please read it and start using it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2756 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-01 17:49:51 +00:00
chartl	236764b249	Major (and useful) changes to MultiSampleConcordance: 1) Now cares about Genotype filtering. If it is flagged as filtered, it can count as a FP/FN/TP; but goes into a "non-confident genotype" bin, rather than het/hom. 2) Can give it a Genotype Confidence flag (-GC) which will automatically filter genotypes in the way above for quality > Q for "-GC Q" 3) Can give it an -assumeRef flag. For sites only in the truth VCF (that don't even appear in the variant VCF), that locus will be treated as confident ref calls for all individuals in the variant VCF; and the calculators updated accordingly. *** Important: Default behavior is that sites unique to the truth VCF are considered no-call sites for the variant. This flag can help get around that; however the safest way to run this is to have a variant VCF with calls at each and every locus, if that is possible. VCFGenotypeRecord -- added an isFiltered() call to automate looking up the FILTERED flag for VCF v3.3 SimpleVCFIntersectWalker - basic outline for a walker I'm working on tonight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2747 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-30 01:18:31 +00:00
aaron	ac2a207b0b	added a wrapper exception for anything that goes wrong in VCF parsing; this way the problematic file line is emitted, no matter what happens. Makes debugging a lot easier, especially in large files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2739 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-29 19:58:51 +00:00
chartl	d57a86ad41	Not nearly as badass as it looks. The problem I mentioned yesterday with "bleeding in" of samples comes from VCFUtils and SampleUtils looking for all VCF-class RODs in the tracker, and stealing the name from them. I have introduced a new HapmapVCF - type rod for use when you want to protect your VCF header from being infected by the samples in a bound hapmap VCF. Changes are as follows: VCFRecord - minor change to adapt isNovel() to the case where the dbsnp ID field is empty, but the info field has DB=1 HapmapVCFRod - introduced for the reason at the top RODRecordIterator - was: catch ( Exception e ) { throw new StingException("long ass message") } is now: catch ( Exception e ) { throw new StingException("long ass message",e) } to permit full stack ejaculation. RodVCF - Now with more brackets! ReferenceOrderedData - registering HapmapVCF as a bindable string VariantAnnotator - There's an extra space on a line. And some new brackets. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2733 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-29 15:19:50 +00:00
hanna	3d922a019f	Basic support for very simple index-driven locus traversals. Interface has been changed to support batched intervals in a single shard, but intervals are not yet compressed into a single shard. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2730 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-29 03:14:26 +00:00
chartl	7a10c40fb3	Much clearer (and, like, not totally incorrect) implementation of isNovel git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2725 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-28 21:16:21 +00:00
chartl	8de6a8d246	Lots of changes; all to do something relatively minor. 1) Changed VCF/RodVCF to allow for inquiries to whether or not the site is novel; isNovel() looks at the ID field, and those members of the info field that indicate membership in dbsnp, hapmap2, or hapmap3; and if none can be found, returns true. 2) Changed VariantAnnotator to annotate hapmap2 and hapmap3, if you bind rods to it with those names. Works in the same way as DBSNP does -- if you give it a rod named "hapmap2" it'll annotate membership in it. -- Passes integration tests 3) Changed UnifiedGenotyper to do the same thing (since it uses Annotations as a subroutine) -- Passes integration tests 4) Changed MultiSampleConcordanceWalker to take a flag --ignoreKnownSites (or -novels) to examine concordance only on sites that are not marked as in dbSNP or in Hapmap in the variant VCF 5) Changed VCFConcordanceCalculator (the object MultiSampleConcordanceWalker runs on) to output Concordant_Het_Calls and Concordant_Hom_Calls separately, rather than combined as Concordant_Calls 6) AlleleBalanceHistogramWalker -- I don't know what i did to this thing. I've been jerry rigging System.outs to do stuff it was never really intended to do; so there's probably some dumb System.out.print("HI I AM AT LOCUS:"+loc) stuck somewhere. It compiles at any rate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2724 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-28 21:06:56 +00:00
depristo	956b570c8e	V5 improvements to VariantContext. Now fully supports genotypes. Filtering enabled. Significant tests throughout system. Support for rebuilding variant contexts from subsets of genotypes. Some code cleanup around repository git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2721 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-28 18:37:17 +00:00
ebanks	1dd9996f3a	New realigner now completely uses bytes, plus misc fixes. Still not ready for use. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2719 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-28 04:17:20 +00:00
ebanks	fddca032bb	Initial commit of v2.0 of the cleaner. DO NOT USE. (this means you, Chris) Cleaned up SW code and started moving over everything to use byte[] instead of String or char[]. Added a wrapper class for SAMFileWriter that allows for adding reads out of order. Not even close to done, but I need to commit now to sync up with Andrey. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2712 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-27 21:36:42 +00:00
hanna	fa3589e5c5	Update our error messages to point to getsatisfaction.com/gsa. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2706 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-27 19:16:28 +00:00
hanna	022601b1a5	Warnings for walkers w/o Javadoc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2683 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-25 20:34:50 +00:00
hanna	d25a2fe120	Better handling of enums by the command-line argument system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2647 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-20 21:36:46 +00:00
hanna	1e9fe2a334	Clean up error output when enums have missing arguments. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2645 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-20 19:48:26 +00:00
aaron	8d1d37302c	a quick change to GLF to keep as much precision in our likelihoods as long as possible, before we put it into byte space. Sanger was doing a diff at low coverage and noticed our calls didn't contain as much precision as theirs. Updated the MD5 for unified genotyper output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2644 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-20 19:36:49 +00:00
hanna	908d399670	Bug fix for help text / version number - help text retriever was crashing in the debugger if help text hadn't been built. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2643 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-20 19:18:19 +00:00
hanna	8dafd26100	Print out the current version number in the application header. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2633 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-19 21:58:36 +00:00
hanna	1488578617	Working with Aaron to get svnversion running within the build system. This change will break the build. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2628 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-19 16:55:42 +00:00
depristo	41392f8ff5	functions for setting genotype records and alternate bases; function for getting all rods implementing VCF git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2611 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-16 20:19:43 +00:00
hanna	ac4756db20	Add the svn version on the fly to the version number properties. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2607 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-16 00:28:01 +00:00
hanna	420cef4094	Added version numbers to the help doclet extractor. Since the help system is behaving more like a resource bundle at this point, changed it over to use the Java ResourceBundle support classes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2606 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-15 23:31:29 +00:00
hanna	930082314a	Put a major.minor version into the GATK Javadoc for reading. Also, update some straggler packages to the new package-info.java format introduced in 1.5. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2604 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-15 21:48:30 +00:00
ebanks	b911b7df82	Fixing the AC annotation to be in line with the VCF spec git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2593 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-15 18:28:52 +00:00
rpoplin	70df30fc1b	Added method to AlignmentUtils which takes a read's cigar and the refBases char array given to a ReadWalker and returns the aligned reference char array. Bug fix in solid_recal_modes to use this aligned reference array. Recalibrator version number is no longer separate for each of the two walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2589 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-15 15:36:59 +00:00
ebanks	2a116bb5d6	Made the VCF validator a simple rod walker instead of having it be in a separate package. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2588 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-15 06:39:06 +00:00


685 Commits (ad98512f6c0ada3932ca7db2ecd8bd23f50da99d)