Commit Graph

14 Commits (6b960bd9c5fb54c6b618fcd61edab6378824b634)

Author SHA1 Message Date
depristo 45fb614296 Fixes to VE for obscure bug, as well as disabled integration test for CombineVariants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3749 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 00:13:07 +00:00
depristo b934cc7554 Updates to fix some bugs in merger. Now able to merge into project wide indel VCF files. Integration teests coming tomorrow
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3727 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:16:33 +00:00
aaron 86031f4034 part two: todo's in combine variants, fixes for InferredGeneticContext, and some other tests and clean-up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3721 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 21:07:53 +00:00
aaron 3347d1ca7c part one of combining format and info header lines code into a single abstract class for Mark; plus some 'm' removals from access methods for Eric. Adding fixes for CombineVariants next.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3719 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 05:57:58 +00:00
depristo cd2e4b0a1e merging now very close to working. Bug todo in writer and vcf infrastructure. Can almost create merged snp and indel files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3712 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 20:09:25 +00:00
depristo 61e2b2e39b Nearly finalize merging capabilities for CombineVariants. Support for dealing with inconsistent indel alleles at loci. Improvements to Allele and removal of addAllele to MutableGenotype. We are close to being able to merge all of 1000 genomes -- snps and indels -- into a single combined vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3710 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 13:32:33 +00:00
aaron f967cae1aa tiny comment change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3708 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 22:04:25 +00:00
aaron 3093a20a55 fixing VCF header format and info fields so that they propery emit the unbounded count value correctly for vcf4 or vcf3. Eric we should update the vcf4 spec page to indicate format fields are allowed to use the unbounded count as well (if this is true).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3707 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 22:02:16 +00:00
aaron 43ca595d15 VCF headers now can be set to a particular VCF version after creation, which converts the header lines to the appropriate encoding on output. Plus some clean-up of the code.
Also commented out the Tribble index out-of-date tests, the timing seems to be troublesome from the farm.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3702 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 05:32:14 +00:00
depristo b8d6a95e7a Preliminary commit of new VCFCombine, soon to be called CombineVariants (next commit) that support merging any number of VCF files via a general VC merge routine that support prioritization and merging of samples! It's now possible to merge the pilot1/2/3 call sets into a single (monster) VCF taking genotypes from pilot2, then pilot3, then pilot1 as needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3690 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:13:03 +00:00
delangel d932322190 More necessary fixes for VCF4.0 - now results look more sensible in realistic, bigger VCF files produced by say Dindel and not just the small test VCF:
- Fixed and cleaned code to produce trailing and padding bases in alleles around indels.
- Deal better with missing fields.
Pending:
- Chopping missing fields at end of genotypes.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3679 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 02:59:30 +00:00
delangel 3ca2b7374b Fixes to better deal with the "Type" and "Number" field in the INFO and FORMAT header lines in VCF4.0. We now record these fields and provide appropriate conversions. This is the first version that passes fully the VCF validator.
Also, moved the flag indicating VCF4.0 to the VCFWriter constructor.

 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3669 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 16:43:00 +00:00
delangel ed71e53dd4 1) Initial complete version of VCF4 writer. There are still issues (see below) but at least this version is fully functional. It incorporates getting rid of intermediate VCFRecord so we now operate from VariantContext objects directly to VCF 4.0 output.
See VCF4WriterTestWalker for usage example: it just amounts to adding
vcfWriter.add(vc,ref.getBases()) in walker.

add() method in VCFWriter is polymorphic and can also take a VCFRecord, lthough eventually this should be obsolete.
addRecord is still supported so all backward compatibility is maintained.

Resulting VCF4.0 are still not perfect, so additional changes are in progress. Specifically:
a) INFO codes of length 0 (e.g. HM, DB) are not emitted correctly (they should emit just "HM" but now they emit "HM=1").
b) Genotype values that are specified as Integer in header are ignored in type and are printed out as Doubles.

Both issues should be corrected with better header parsing.

2) Check in ability of Beagle to mask an additional percentage of genotype likelihoods (0 by default), for testing purposes.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3664 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 23:54:38 +00:00
aaron d3848745ab moving VCF 3.3 back into the GATK so Guillermo can make changes for VCF 4 output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3639 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:20:06 +00:00