gatk-3.8


Author	SHA1	Message	Date
depristo	20db00a3e8	Lazy reference loading; the engine doesn't fetch the reference bases until you actually call ref.getBases(). With the new hidden --dontUpdateUG to table recalibrator this is 2-3x faster than before. Enabled for locus, read, and rod walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4042 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 13:46:22 +00:00
aaron	14e492fa80	fix for a problem in readNextRecord() of BFS, where we'd go looking for the next record far into the next contig because (f.getEnd() >= start) was never true once we cycled to a new contig. Added a check for contig identity. Also, removed duplicate HW calculation classes in the GATK and Tribble. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4011 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 17:01:38 +00:00
ebanks	0eeb659aa3	Useful utility function to print out the Allele as a String since toString prints out * for refs. It was annoying to keep seeing new String(Allele.getBases()). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3989 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 20:35:56 +00:00
aaron	c68625f055	Fixes from Mark for the MutableContexts; this fixes the clearGenotypes() and the clearFilters() methods, and adds a method to clear the attributes. Also added is a method for creating a variant context where the attribute list is pruned to a specific subset, which can be null. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3955 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 22:39:51 +00:00
aaron	72ae81c6de	VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include: - Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from inside the tribble directory. - Hapmap ROD now in Tribble; all mentions have been switched over. - VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc. - VariantContext.getSNPSubstitutionType is now in VariantContextUtils. - This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN I'll send out an email to GSAMembers with some more details. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 18:47:53 +00:00
delangel	86211b74e8	Bug fix: when padding alleles in creating a Variant context from an indel, leave no-call alleles as no-call alleles. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3940 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 19:51:10 +00:00
depristo	c203e0fb02	Added JEXL support for hetCount, homRefCount, and homVarCount in VCs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3914 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-02 12:24:11 +00:00
depristo	ac8048f17b	Support for automated selects for tranches in variant eval -- use -tf to make tranche-specific ve outputs. ApplyVariantCuts with tranche reading functions for general use, along with todo for ryan. CombineVariants now has --filteredAreUncalled and will treat filtered snps in input VCFs as uncalled, and so won't emit -filteredInOther set features git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3908 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-30 14:16:43 +00:00
depristo	19ad44d332	Minor improvements to CombineVariants to handle the complex case from Chris. IntegrationTest of complex case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3876 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 13:46:11 +00:00
depristo	b0b37c3476	Now handles (I believe) reference-only VCs correctly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3871 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 23:09:23 +00:00
depristo	e21376219d	Updates to CombineVariants for Tim. -setKey can be null. Integrationtests for -setKey foo and -setKey null. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3870 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 22:35:52 +00:00
delangel	4fc1db7aaf	Change interface to VCFWriter add() method to take only 1 byte from reference (since that's the only thing it needs), to prevent bugs like having people call it with ref.addBases() which is wrong (since it provides bases starting from the left of reference context window). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3868 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 20:24:03 +00:00
depristo	536399eaa0	Improvements to variant combine. Now calculates AC/AN/AF correctly by calling into the VariantAnnotator engine. Automatically removes annotations that are inconsistent across incoming VCs (in simpleMerge). TODO bug fix for Guillermo/Eric. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3858 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 13:33:11 +00:00
delangel	473ec91633	a) Bug fix in VCFHeader parsing - Info fields were not being parsed properly, with the result that the Count field was not being properly displayed in records (e.g. if Count=0 for a particular field, the INFO tag was still being displayed as ...;Field=x;... instead of ...;Field;... b) Bug fixes and update to how we represent indels and other complex events in a VariantContext object. Convention is now that all events are left aligned, with the first variant context location marking the common base before an event occurs. However, alleles in a VC don't have the common base in all VC's. Two new functions are now part of VariantContextUtils: CreateVariantContextWithPaddedAlleles and CreateVariantContextWithTrimmedAlleles. Both take a VC as an input and create a VC as an output. Main flow is that a VCF reader would create a VC with trimmed alleles, all walkers would ideally work with these trimmed alleles, and then the VCF writer would pad back the alleles before writing. However, there are special cases where we need to pad alleles like for example when merging/combining VC's. Pending issues: - PED and DBSNP RODs have to be updated to create VC's for indels following the convention above. Changes will go in after Tribble location is moved and things are tested. - Need to verify Indel genotyper and other modules that create VC's with indels.- Wiki page describing convention above and how walkers should interpret indel VC's still needs updating/detailing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3850 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 02:36:45 +00:00
ebanks	c6ad26e04f	1) When quals/GQs are really integers (x.00), strip off the floating points. 2) Keep track of whether vcf records are unfiltered vs. pass filters in the variant context so we can regenerate the records on output. 3) No more "ID" hard-coded all over the code to set the VariantContext ID. Use a static variable instead. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3840 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 18:01:45 +00:00
ebanks	0db7fab1a9	Fixing genotype filtering for VF and adding integration tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3839 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 07:30:21 +00:00
ebanks	f742980864	1. Refactoring of GenotypeWriters so that parallelization now works again with VCF4.0. We now have just a single reference to the old VCF classes, and that one will be purged soon. 2. Moved Jared's VCFTool code into archive so that everything would compile. 3. Added the vcf reference base (needed for indels) as an attribute to the VariantContext from the reader. 4. TribbleRMDTrackBuilderUnitTest was complaining that a validation file didn't exist, so I commented it out. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3835 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 06:16:45 +00:00
depristo	7c42e6994f	FindBugs fixes throughout the code base git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3823 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-18 16:29:59 +00:00
hanna	9207c58b8f	A fix for the integration test I broke on Friday on my way out the door -- some workflows using AlignmentContext were working with it in a way I didn't expect and wound up treating extended pileups as base pileups. I'll work to make sure the AlignmentContext interface is crystal clear. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3815 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-17 22:22:44 +00:00
hanna	96034aee0e	Cleanup for Steve Hershman's issue. In the midst of doing this, I discovered that the semantics for which reads are in an extended event pileup are not clear at this point. Eric and I have planned a future clarification for this and the two of us will discuss who will implement this clarification and when it'll happen. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3809 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 18:57:58 +00:00
ebanks	0226412b11	Add GQ to list of genotype attributes for reg exp git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3791 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 19:01:11 +00:00
ebanks	6b5c88d4d6	The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 04:56:58 +00:00
ebanks	e50627a49e	1. Updated tests and added integration test for liftover code. 2. Updated liftover code (and scripts) to emit vcf 4.0 and no longer depend on VCFRecord. 3. Beagle walker now also emits vcf 4.0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3767 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-12 17:58:18 +00:00
ebanks	0c4a32843c	No longer uses VCFRecord git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3763 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-12 13:57:39 +00:00
ebanks	fb717fe128	First pass needed to remove old VCF code: moving all VCF-related constants into a single unified class git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3759 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-11 07:19:16 +00:00
ebanks	6b960bd9c5	Fix for Steve: genotype filters still want to see the values from the VC git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3758 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-11 04:30:15 +00:00
depristo	c3c66e853c	Improvements for Jason git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3756 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-09 20:18:37 +00:00
depristo	760aaeda88	Update to CombineVariants. Now splits merge options into variant and genotype options separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3746 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-08 20:09:48 +00:00
ebanks	1c146aebe8	Fix logic bug git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3734 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-08 04:32:46 +00:00
hanna	773a72e6ea	An initial fix for performance issues when filtering UG with new StratifiedAlignmentContext. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3724 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-07 01:07:46 +00:00
aaron	86031f4034	part two: todo's in combine variants, fixes for InferredGeneticContext, and some other tests and clean-up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3721 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-05 21:07:53 +00:00
ebanks	e7220bc885	Variant Context simple merging routine should keep ID if one of the VCs has it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3718 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-05 01:10:15 +00:00
depristo	cd2e4b0a1e	merging now very close to working. Bug todo in writer and vcf infrastructure. Can almost create merged snp and indel files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3712 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-02 20:09:25 +00:00
depristo	61e2b2e39b	Nearly finalize merging capabilities for CombineVariants. Support for dealing with inconsistent indel alleles at loci. Improvements to Allele and removal of addAllele to MutableGenotype. We are close to being able to merge all of 1000 genomes -- snps and indels -- into a single combined vcf git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3710 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-02 13:32:33 +00:00
hanna	c9d5345150	Redo StratifiedAlignmentContext to use ReadBackedPileup's stratification options. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3699 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-01 02:46:05 +00:00
depristo	5f2b2d860e	Final stage of renaming git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3696 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-30 21:39:07 +00:00
depristo	b8d6a95e7a	Preliminary commit of new VCFCombine, soon to be called CombineVariants (next commit) that support merging any number of VCF files via a general VC merge routine that support prioritization and merging of samples! It's now possible to merge the pilot1/2/3 call sets into a single (monster) VCF taking genotypes from pilot2, then pilot3, then pilot1 as needed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3690 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-30 20:13:03 +00:00
ebanks	464ac63a22	Allowing N's in ALT field git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3650 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-28 11:41:32 +00:00
ebanks	f0fc34bb8e	Bug fix: N's are allowed in the ref so don't fail when e.g. dbsnp has an N! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3620 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-23 17:49:14 +00:00
aaron	a6d3e4bd47	Add code to allow reference alleles with 'N' in VariantContext, but not in the alternate allele(s). Also more updates to the VCF 4 code (fixed parsing for files without genotypes). This check-in will temporarily break the build (I need to see if Bamboo is correctly returning the log file for the failed builds). Will be fixed once Bamboo starts building. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3609 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-22 18:26:37 +00:00
hanna	f18ac069e2	A refactoring / unification of ReadBackedPileup and ReadBackedExtendedEventPileup. Provides a cleaner interface with extended events inheriting all of the basic RBP functionality. Implementation is still slightly messy, but should allow users to provide separate implementations of methods for sample split pileups and unsplit pileups for efficiency's sake. Methods not covered by unit/integration tests have not been sufficiently tested yet. Unit tests will follow this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3597 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-20 04:42:26 +00:00
hanna	c3b68cc58d	Rethinking DownsamplingLocusIteratorByState with a flattened read structure. Samples are kept independent while processing, and only merged back in a priority queue if necessary in a special variant of the ReadBackedPileup. This code is not live yet except in the case of naive deduping. Downsampling by sample temporarily disabled, and the ReadBackedPileup variant is sketchy and not well integrated with StratifiedAlignmentContext or the walkers. Cleanup to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3540 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-13 01:47:02 +00:00
depristo	6eeb1693ca	JEXL2 upgrade. Improvements to JEXL processing including dynamically resolving variable -> value bindings instead of up front adding them to a map. Performance improvements and code cleanup throughout. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3494 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-07 00:33:02 +00:00
depristo	3ea506fe52	No more new Allele() -- must use create. Allelel simple alleles are now cached for efficiency reasons. VCF4 codec optimizations -- 4x performance in general. Now working in general but hooked up to the ROD system now as VCF4. WARNING -- does not actually work with indels, genotype filters, etc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3489 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-04 23:03:55 +00:00
depristo	b811e61ae1	Optimized, nearly complete VCF4 reader 2-4x faster than the previous implementation, along with a VCF4 reader performance testing walker that can read 3/4 files, useful for benchmarking git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3487 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-04 18:11:38 +00:00
ebanks	597b3744ab	Always use phasing info when converting genotypes to strings git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3482 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-03 17:50:50 +00:00
chartl	88cb93cc3c	Changes to Depth of Coverage (added maximum base and mapping quality flags; with new integration tests -- because they use b36, and the other test uses hg18, it's in a different class (integration test system can't change refs on the fly). Initial change to VariantAnnotator to allow it to see extended event pileups; you currently have to throw the -dels flag; and it's specified as "very experimental". Yet, all the integration tests pass. Homopolymer Run now does the "right" thing (e.g. single bases are represented as HRun = 0 rather than HRun = 1) for indels. AlleleBalance now does something close enough to correct. Added a convenience method to VariantContext that will return the indel length (or lengths if a site is not biallelic). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3409 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-21 13:02:01 +00:00
depristo	a10fca0d5c	Genotyper now is using bytes not chars. Passes all tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3406 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 21:02:44 +00:00
depristo	6ce3835622	Removing unused methods in QualityUtils; ReferenceContext now converting all bases to upper case, but can be disabled with static boolean git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3399 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 12:38:06 +00:00
depristo	5abac5c057	A few more char -> byte cleanups git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3398 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 00:02:06 +00:00

99 Commits (44f3c5639ad4152e7bea9573dbe525b0069bc494)