b) Updated VCF4WriterTestWalker to take either VCF3 or VCF4 as inputs (this walker can also be used to convert from 3.3 to 4.0).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3711 348d0f76-0448-11de-a6fe-93d51630548a
Also putting more tolerance into the timing on the tibble index tests (that check to make sure we're deleting out of date indexes, and not deleting perfectly good indexes). It seems that some of the farm nodes aren't great with a stopwatch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3674 348d0f76-0448-11de-a6fe-93d51630548a
Also, moved the flag indicating VCF4.0 to the VCFWriter constructor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3669 348d0f76-0448-11de-a6fe-93d51630548a
See VCF4WriterTestWalker for usage example: it just amounts to adding
vcfWriter.add(vc,ref.getBases()) in walker.
add() method in VCFWriter is polymorphic and can also take a VCFRecord, lthough eventually this should be obsolete.
addRecord is still supported so all backward compatibility is maintained.
Resulting VCF4.0 are still not perfect, so additional changes are in progress. Specifically:
a) INFO codes of length 0 (e.g. HM, DB) are not emitted correctly (they should emit just "HM" but now they emit "HM=1").
b) Genotype values that are specified as Integer in header are ignored in type and are printed out as Doubles.
Both issues should be corrected with better header parsing.
2) Check in ability of Beagle to mask an additional percentage of genotype likelihoods (0 by default), for testing purposes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3664 348d0f76-0448-11de-a6fe-93d51630548a
1) Some improvements to the VCF4 parsing, including disabling validation.
2) Reimplemented RefSeq in the new Tribble-style rod system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3630 348d0f76-0448-11de-a6fe-93d51630548a
This check-in will temperarly break the build (I need to see if Bamboo is correctly returning the log file for the failed builds).
Will be fixed once Bamboo starts building.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3609 348d0f76-0448-11de-a6fe-93d51630548a
Provides a cleaner interface with extended events inheriting all of the basic RBP
functionality. Implementation is still slightly messy, but should allow users to
provide separate implementations of methods for sample split pileups and unsplit
pileups for efficiency's sake.
Methods not covered by unit/integration tests have not been sufficiently tested yet.
Unit tests will follow this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3597 348d0f76-0448-11de-a6fe-93d51630548a
1. VA is now a ROD walker so it no longer requires reads (needs a little more testing)
2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs)
3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong. Fixed the headers too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a
+ Identifies opposite homozygote sites
+ Identifies the parent from whom it is expected that a null allele was inherited (or whether it was a putative genotype error; e.g. mom=homref, dad=homref, child=homvar)
+ Labels each opposite homozygote with its homozygous region in the child (e.g. region 1, region 2)
+ Labels each opposite homozygote with the size of the homozygous region in which it was found, the number of child homozygotes in the region, and the number of opposite homozygote violations within that region
To come:
+ Classification of sites as likely tri-allelic
Note that this is very experimental
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3498 348d0f76-0448-11de-a6fe-93d51630548a
More doc/info to follow shortly. Issues still to be solved:
a) Walker changes all genotypes based on Beagle data, but annotations on the original VCF are unchanged. They should in theory be recomputed based on new genotypes.
b) Current implementation is ugly, dirty unwieldy and will necessitate a refactoring soon so I can keep my pride. Most aesthetically affronting issue right now is that we read the full Beagle files at initialization and keep them in memory, but a more delicate implementation would just read from files on a marker by marker basis. Issue that currently prevents this is that BufferedReader() instances don't seem to play nice when called from the map() function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3488 348d0f76-0448-11de-a6fe-93d51630548a
priors were flat, but fixed nonetheless.
Also, needed to update Tribble.
Minor updates to the Beagle input maker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3461 348d0f76-0448-11de-a6fe-93d51630548a
@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFCodec.class))
you'd say:
@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFRecord.class))
Which is more in-line with what was done before. All instances in the existing codebase should be switched over.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3457 348d0f76-0448-11de-a6fe-93d51630548a
- For speedup in large number of samples, base counts are done on a per read group level, then
merged into counts on larger partitions (samples, libraries, etc)
+ passed all integration tests before next item
- Added additional summary item, a coverage threshold. Set by (possibly multiple) -ct flags,
the summary outputs will have columns for "%_bases_covered_to_X"; both per sample, and
per sample per interval summary files are effected (thus md5s changed for these)
NOTE:
This is the last revision that will include the per-gene summary files. Once DesignFileGenerator is sufficiently general, and has integration tests, it will be moved to core and the per-gene summary from Depth of Coverage will be retired.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3437 348d0f76-0448-11de-a6fe-93d51630548a
passed in as a Tribble SAM text feature. If the generated pileup contains a valid set of reads according to
the downsampling rules, the test passes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3421 348d0f76-0448-11de-a6fe-93d51630548a