gatk-3.8

Commit Graph

Author	SHA1	Message	Date
aaron	782e0018e4	removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come. * Three integration tests had to change: * RecalibarationWalkersIntegrationTest: One of the tests was using the interval as the snp track, and wasn't supplying a DbSNP track (for CountCovariates) SequenomValidationConverterIntegrationTest: relies on Plink ROD which we've removed. PileupWalkerIntegrationTest: we no longer have implicit interval tracks, so there isn't a rod name over the specified region. Otherwise the same result. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 22:54:49 +00:00
hanna	70bb480939	The battle is over. Picard is revved. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4200 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-03 05:28:01 +00:00
ebanks	bfcac33e80	Cleaning up playground utils and tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 01:25:47 +00:00
ebanks	dfae48cee0	Moving supported tools to core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 13:56:19 +00:00
ebanks	79cd716671	More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-26 04:43:52 +00:00
ebanks	4678613893	Significant fixes for the Genomic Annotator. 1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when I saw this code). This will improve memory consumption when running with -nt. 2. Don't annotate indels or > bi-allelic sites. 3. Fix bug where not all records were making it into the output VCF. 4. General code clean up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 20:16:50 +00:00
hanna	bf0b6bd486	Update integration tests to use the new ROD syntax. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4112 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-25 18:13:30 +00:00
ebanks	90aef66ec5	Minor fixes for my last commit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 23:25:29 +00:00
ebanks	ccda4f6ec1	More output consistency changes (updating wiki docs as I go along). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 18:46:08 +00:00
aaron	c1df293feb	remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 23:52:01 +00:00
kshakir	f39dce1082	Exposed CommandLineFunction defaults to the Queue.jar command line (see -help). Added ability to skip up-to-date jobs where the outputs are older than the inputs. Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names. Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile Moved Hidden from the GATK to StingUtils. Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7 Added Queue to javadoc and testing build targets. Added first Queue unit test. Another pass at avoiding cycles in the DAG thanks to all function I/O being files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 21:58:26 +00:00
aaron	d514c424fd	adding tests for BTI in the ROD validation tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3997 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 06:05:40 +00:00
ebanks	340bd0e2c1	Removed hard-coded pointers to references git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3934 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 17:59:37 +00:00
delangel	5af986e0c1	Add an integration test for Beagle (one for ProduceBeagleInput and one for BeagleOutputToVCFWalker) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3897 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-29 18:49:22 +00:00
delangel	473ec91633	a) Bug fix in VCFHeader parsing - Info fields were not being parsed properly, with the result that the Count field was not being properly displayed in records (e.g. if Count=0 for a particular field, the INFO tag was still being displayed as ...;Field=x;... instead of ...;Field;...). b) Bug fixes and update to how we represent indels and other complex events in a VariantContext object. Convention is now that all events are left aligned, with the first variant context location marking the common base before an event occurs. However, alleles in a VC don't have the common base in all VC's. Two new functions are now part of VariantContextUtils: CreateVariantContextWithPaddedAlleles and CreateVariantContextWithTrimmedAlleles. Both take a VC as an input and create a VC as an output. Main flow is that a VCF reader would create a VC with trimmed alleles, all walkers would ideally work with these trimmed alleles, and then the VCF writer would pad back the alleles before writing. However, there are special cases where we need to pad alleles, for example when merging/combining VC's. Pending issues: - PED and DBSNP RODs have to be updated to create VC's for indels following the convention above. Changes will go in after Tribble location is moved and things are tested. - Need to verify Indel genotyper and other modules that create VC's with indels. - Wiki page describing convention above and how walkers should interpret indel VC's still needs updating/detailing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3850 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 02:36:45 +00:00
ebanks	c6ad26e04f	1) When quals/GQs are really integers (x.00), strip off the floating points. 2) Keep track of whether vcf records are unfiltered vs. pass filters in the variant context so we can regenerate the records on output. 3) No more "ID" hard-coded all over the code to set the VariantContext ID. Use a static variable instead. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3840 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 18:01:45 +00:00
rpoplin	8e31c01680	Solid processing in base quality recalibrator now has several options for how to handle no calls in the color space. --ignore_nocall_colorspace is removed and replaced by --solid_nocall_strategy. Fixed some of the @Deprecated tags in BaseUtils. LocusWalkers now filter out FailsVendorQualityCheck reads. HLA caller integration test bam file had bad vendor reads so its integration test changed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3831 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 19:10:29 +00:00
delangel	55b756f1cc	First step in major cleanup/redo of VCF functionality. Specifically, now: a) VCF track name can work again with 3.3 or 4.0 VCF's when specifying -B name,VCF,file. Code will read header and parse automatically the version. b) Old VCF codec is deprecated. Reader goes now direct from parsing VCF lines into producing VariantContext objects, with no intermediate VCF records. If anyone can't resist the urge to still input files using the old method, a new VCF3Codec is in place with the old code, but it will be eventually deleted. c) VCF headers and VCF info fields no longer keep track of the version. They are parsed into an internal representation and will be output only in VCF4.0 format. d) As a consequence, the existing GATK bug where files are produced with VCF4 body but VCF3.3 headers is solved. e) Several VCF 4.0 writer bugs are now solved. f) Integration test MD5's are changed, mostly because of corrected VCF4.0 headers and because validation data mostly uses now VCF4.0. g) Several VCF files in the ValidationData/ directory have been converted to VCF 4.0 format. I kept the old versions, and the new versions have a .vcf4 extension. Pending issues: a) We are still not dealing with indels consistently or correctly when representing them. This will be a second part of the changes. b) The VCF writer doesn't use VCFRecord but it does still use a lot of leftovers like VCFGenotypeEncoding, VCFGenotypeRecord, etc. This needs to be simplified and cleaned. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3813 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 22:49:16 +00:00
ebanks	6b5c88d4d6	The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 04:56:58 +00:00
ebanks	9a05e8143d	Move to 4.0 and away from VCFRecord. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3780 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-13 15:54:54 +00:00
ebanks	7e7da75d27	Moving over to 4.0 and away from VCFRecord git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3778 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-13 14:07:10 +00:00
weisburd	9ec393bfce	Updated md5 - vcf header line change git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3714 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-02 21:02:09 +00:00
weisburd	e15fe6858e	Disabling test - Will need to update big-tables soon.. will re-enable after updating md5 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3637 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-25 15:43:41 +00:00
ebanks	1e06d2bf68	Initial HLA Caller integration tests. Kind of painful, but will improve with code refactoring. This baby is now officially ours. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3593 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-18 20:35:27 +00:00
aaron	c3434493b0	fixed integration test for VCF Header changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3589 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-18 16:31:48 +00:00
aaron	42e7ff4f28	forgot to update a test, the md5sum of the underlying file changed (which is recorded in the ROD tests). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3586 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-18 13:27:56 +00:00
weisburd	e26a273ef5	Turned the test back on git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3582 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-17 22:57:42 +00:00
aaron	3d049204ed	some refactoring for the variant eval output system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3576 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-17 05:34:31 +00:00
weisburd	5b370ffc62	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3574 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-16 20:42:58 +00:00
ebanks	8c28be5933	Fixing a VCF bug for Sendu: we weren't emitting flags (booleans) correctly in VCF3.3 (rev'ed tribble for this). Updated dbsnp/hapmap membership info fields to be flags now instead of ints. While I was there, I added the change in the Annotator for Jan to force reads to be from a specific sample. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3536 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-11 16:42:06 +00:00
aaron	871cf0f4f6	Call out ROD types by their record type, instead of the codec type (which was clumsy). So instead of: @Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFCodec.class)) you'd say: @Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFRecord.class)) Which is more in line with what was done before. All instances in the existing codebase should be switched over. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3457 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-28 14:52:44 +00:00
aaron	a2fab07258	fixed the build problem: there were two copies of the AnnotatorInputTable Codec and Feature in two different spots. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3439 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-26 14:47:15 +00:00
ebanks	0607f76a15	commenting out this test until I can figure out what the hell is going on with the codecs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3436 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-26 01:12:10 +00:00
ebanks	572b383fe2	Make VA annotate dbsnp again git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3345 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-11 14:06:53 +00:00
aaron	a68f3b2e9c	VCF moved over to tribble. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3302 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-05 17:28:48 +00:00
aaron	ad11201235	adding more ROD pile-up tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3301 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-05 16:01:11 +00:00
aaron	cbed0b1ade	Adding GeliText tribble track as the first enabled Tribble track. This means 'Variants' is no longer valid for a ROD type; use GeliText instead. I've updated all the references in the codebase. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3271 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-29 22:50:17 +00:00
aaron	7fbfd34315	adding the GELI ROD validation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3270 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-29 21:43:00 +00:00
ebanks	df31eeff9f	minor change git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3259 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-26 06:05:29 +00:00
ebanks	e702bea99f	Moving VE2 to core; calling it "VariantEval" (one more checkin coming) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3179 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-15 20:25:47 +00:00
weisburd	b930dc52a5	Integration test for GenomicAnnotator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3167 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-14 14:43:25 +00:00
aaron	4014a8a674	A long overdue correction; all unit tests now end in 'UnitTest'. This was something we wanted to do for a while, and now with the performance tests coming, it was a good time to clean-up. Please label any new test appropriately: UnitTest and IntegrationTest are the two valid file name patterns for tests. Thanks! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3135 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-08 06:14:15 +00:00
aaron	8fd59c8823	Modified the report system based on Ryan's feedback: tables are now created independently to avoid the permutation problem when they were all compressed in rows, and removed our dependency on FreeMarker. The Grep format stays the same. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3130 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-07 20:39:55 +00:00
ebanks	73a14a985b	Moving VariantsToVCF to core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3078 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-26 18:55:12 +00:00
ebanks	14bf6923a8	HapMap-to-VCF now works fine within Variants-to-VCF. Added integration test for it and removed old code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3077 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-26 18:34:59 +00:00
ebanks	4398a8b370	Updated. Now uses VariantContext and is truly "variants" to vcf (i.e. not just GELI to vcf). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3074 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-25 04:53:31 +00:00
aaron	439c34ed38	clean-up before annotating VariantEval2 for output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3055 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 07:39:20 +00:00
aaron	8a5f0b746e	some cleanup for the output system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3032 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-18 12:54:39 +00:00
ebanks	4340601c26	-Pushed base quals back down into SAMRecord; if -OQ is used, the SAMRecord quals get updated automatically -Better integration test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3020 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 16:00:10 +00:00
aaron	10e76abbbc	adding some VE2 report infrastructure; work-in-progress. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3008 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-16 03:57:42 +00:00

119 Commits (a10b2a00a55ec85d3d4d64249aed060abb9367d4)