Commit Graph

3163 Commits (80a5ddfa2f0d6f76ee8dde03c64141552eca1ff0)

Author SHA1 Message Date
ebanks 52c534a8f2 Updating to VCF 4.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3770 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 20:18:30 +00:00
delangel 5992b79159 a) Simplify normalization code in ProduceBeagleInputWalker so that it always normalizes, using MathUtils.normalizeFromLog10 to do so.
b) Several improvements to BeagleOutputToVCFWalker:
1. If a Hapmap input track is provided (e.g. -B comp,VCF,file), Hapmap sites will be annotated with Hapmap Allele count and allele frequency (key ACH, AFH).
2. If probability of correct genotype is lower than ncthr (optional argument provided by user, default = 0.0), walker will keep original calls instead of using Beagle calls.
3. Instead of annotating just whether Beagle had modified a site, annotate HOW MANY genotypes in a site were actually changed by Beagle.

All three improvements are mostly for debugging and analysis only.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3769 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 19:54:58 +00:00
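The log10 normalization mentioned in commit 5992b79159 can be illustrated with a minimal sketch. This is not the code of MathUtils.normalizeFromLog10, which the log does not show; it is the usual technique for turning log10-scaled likelihoods into probabilities that sum to 1, and the function name below is hypothetical.

```python
def normalize_from_log10(log10_values):
    """Turn log10-scaled likelihoods into probabilities that sum to 1.

    Hypothetical helper for illustration only; not the GATK implementation.
    """
    if not log10_values:
        raise ValueError("need at least one value")
    max_value = max(log10_values)
    # Subtract the maximum before exponentiating so that very negative
    # inputs do not all underflow to zero.
    scaled = [10.0 ** (v - max_value) for v in log10_values]
    total = sum(scaled)
    return [s / total for s in scaled]


probabilities = normalize_from_log10([-1.0, -2.0, -3.0])
```

Subtracting the maximum first keeps the largest term at exactly 1.0, so even inputs such as -1000 normalize cleanly instead of dividing zero by zero.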
ebanks e50627a49e 1. Updated tests and added integration test for liftover code.
2. Updated liftover code (and scripts) to emit VCF 4.0 and no longer depend on VCFRecord.
3. Beagle walker now also emits VCF 4.0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3767 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 17:58:18 +00:00
ebanks 2a7112302a More archiving
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3766 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 17:04:41 +00:00
ebanks 221e01fb27 deleting/archiving as instructed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3765 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:59:45 +00:00
ebanks 8086ab1f75 Pulled sample/header merging routines out of CombineVariants and into util classes. Added more generalized methods for retrieving samples. Updated the Beagle walkers to use these methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3764 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:51:54 +00:00
ebanks 0c4a32843c No longer uses VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3763 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 13:57:39 +00:00
ebanks f130d29318 No longer uses VCFRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3762 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 13:34:10 +00:00
ebanks e75b3e13bd updating unit test for previous fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3761 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 03:23:53 +00:00
ebanks 0427f3554b Bug fix: valid fields were being stripped off the FORMAT for samples because String.match was used instead of String.equals. Also, please use VCFConstants from now on instead of hard-coding e.g. missing values into the code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3760 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 03:06:51 +00:00
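The class of bug described in commit 0427f3554b (pattern matching used where exact comparison was intended) can be sketched as follows. In Java the regex method is String.matches, which tests the whole string against a pattern; Python's re.fullmatch is used here as an analog. The commit does not say which pattern was involved, so the use of the VCF missing-value marker "." is an assumption made purely for illustration, as are both function names.

```python
import re

# Assumption for illustration: the compared value was the VCF
# missing-value marker. The commit message does not confirm this.
MISSING_VALUE = "."


def is_missing_regex(field):
    # Buggy analog: "." as a regex matches ANY single character,
    # so a valid one-character field such as "7" looks missing.
    return re.fullmatch(MISSING_VALUE, field) is not None


def is_missing_exact(field):
    # Intended behavior: plain string equality.
    return field == MISSING_VALUE
```

Under this assumption, any valid single-character value would be treated as missing and stripped, which is consistent with the symptom the commit describes.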
ebanks fb717fe128 First pass needed to remove old VCF code: moving all VCF-related constants into a single unified class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3759 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 07:19:16 +00:00
ebanks 6b960bd9c5 Fix for Steve: genotype filters still want to see the values from the VC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3758 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 04:30:15 +00:00
depristo c3c66e853c Improvements for Jason
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3756 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 20:18:37 +00:00
ebanks 405be230d0 Various code improvements based on FindBugs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3755 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 15:04:48 +00:00
ebanks abaec13e38 Bug fix: if there are samples in the VCF but all of them are no-calls, we still need to emit GT for the FORMAT field to be on spec. Note that this is a holdover from 3.3 writing but can't easily be fixed there. Fortunately, that code is all going away soon...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3754 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 14:08:25 +00:00
chartl ea8fd506bf Update to PickSequenomProbes: Option to ignore mask sites within X bp of a variant (very useful for indels where dbSNP entries near the indel are almost always false SNP calls). Also fixed an integration test where the variant site itself, being in dbSNP, was represented as [N/C] rather than [A/C]. Added integration test for 1bp no-mask window.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3753 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 04:03:19 +00:00
depristo 179067e3f4 Support for . values in qual field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3752 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 01:47:02 +00:00
depristo 45fb614296 Fixes to VE for obscure bug, as well as disabled integration test for CombineVariants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3749 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 00:13:07 +00:00
rpoplin 67f1589652 --fdr_filter_level isn't mandatory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3748 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 22:48:30 +00:00
rpoplin 5d39cd5db8 Added --fdr_filter_level to ApplyVariantCuts so that you can create beautiful tranche plots and also decide which tranche level to filter at. The previous version always filtered at the smallest tranche. The tranche filter names are appropriately added to the VCF header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3747 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 22:44:10 +00:00
depristo 760aaeda88 Update to CombineVariants. Now splits merge options into variant and genotype options separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3746 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 20:09:48 +00:00
ebanks bd2ba3eb37 deal with very large known indels that fall off our ref context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3745 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 20:05:16 +00:00
aaron 12fecc8d8f remove the picard DbSNP ROD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3743 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 17:46:00 +00:00
depristo 56a0c7ee6f All headers are now converted to VCF4 by default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3741 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 14:14:17 +00:00
ebanks 6e6ad36523 reallow MNP events through
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3740 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 06:26:52 +00:00
ebanks ed0d0d78fa corresponding fix for dealing with insertions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3739 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 05:25:03 +00:00
ebanks ada8c9931f We were never clipping the VCF-provided ref base off the left end of the alleles for insertions, so the reference allele was never null (and downstream walkers would fail). Didn't this get tested with insertions at some point?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3738 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 05:24:27 +00:00
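The clipping step described in commit ada8c9931f can be sketched roughly as below. VCF 4.0 writes indel alleles with one shared leading reference base, so an insertion of TG after an A appears as REF=A, ALT=ATG; removing that base leaves an empty (null) reference allele. The function name and the tuple return shape are hypothetical, and this sketch ignores the corner cases the surrounding commits mention.

```python
def clip_leading_base(ref, alts):
    """Strip the shared leading padding base from indel alleles.

    Hypothetical helper for illustration; returns (ref, alts) unchanged
    for same-length (SNP/MNP) records.
    """
    is_indel = any(len(alt) != len(ref) for alt in alts)
    shares_first_base = bool(ref) and all(alt.startswith(ref[0]) for alt in alts)
    if is_indel and shares_first_base:
        return ref[1:], [alt[1:] for alt in alts]
    return ref, list(alts)


insertion = clip_leading_base("A", ["ATG"])   # empty reference allele
deletion = clip_leading_base("ATG", ["A"])    # empty alternate allele
snp = clip_leading_base("A", ["C"])           # left untouched
```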
ebanks 9a81f1d7ef Fixed this tool for chartl so that it now properly handles deletions. Added deletion case to integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3737 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:45:59 +00:00
ebanks 47a42b1507 trivial cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3736 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:42:32 +00:00
ebanks b7a3d1e61f Bug fix: if the FORMAT field consisted of just GT, we were throwing an exception. How did we not catch this until now?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3735 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:41:40 +00:00
ebanks 1c146aebe8 Fix logic bug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3734 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:32:46 +00:00
hanna 9fc05ac2ae eagerDecode is now false.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3733 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 22:51:48 +00:00
ebanks 4bc3ad2194 Shame on me: UG was emitting negative QUALs (-0) in all_bases mode. Thanks, Matt.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3732 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:30:22 +00:00
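The negative-zero QUAL in commit 4bc3ad2194 is a general floating-point effect that a short sketch can reproduce. The commit does not show how the value arose; the Phred-style computation below (-10 * log10(p) with p = 1) is an assumed origin, and format_qual is a hypothetical guard rather than the fix that was committed.

```python
import math


def format_qual(qual):
    """Format a QUAL value, never printing a negative zero.

    Hypothetical guard for illustration only.
    """
    if qual <= 0.0:
        # -0.0 compares equal to 0.0, so this branch also catches it.
        qual = 0.0
    return "%.2f" % qual


# Assumed origin: log10(1.0) is 0.0, and -10 * 0.0 is IEEE 754 negative zero.
raw_qual = -10 * math.log10(1.0)
unguarded = "%.2f" % raw_qual
guarded = format_qual(raw_qual)
```

Here unguarded keeps the minus sign while guarded does not; the explicit comparison is used because -0.0 and 0.0 compare equal, so helpers such as max() cannot be relied on to pick the positive one.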
ebanks 30714ec8d9 As per quick chat with Richard Durbin, don't increase the mapping quality of realigned reads too much; for now, arbitrarily increase the MQ by 10. We need to figure out a better solution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3731 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:12:59 +00:00
ebanks 8ff1a4b929 Don't try to clean reads that fail the PF, in preparation for Ryan
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3730 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 19:49:36 +00:00
depristo b934cc7554 Updates to fix some bugs in merger. Now able to merge into project-wide indel VCF files. Integration tests coming tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3727 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:16:33 +00:00
kshakir 7be8c35eb2 Workaround for scala trait erasing parameterized types:
- Requiring explicit @ClassType on parameterized fields in traits.
- Scatter / Gather functions are now abstract classes since @ClassType can't be used on parameterized fields with type parameters.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3726 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:15:10 +00:00
hanna 120f90da5b Interval support for ref walkers while streaming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3725 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:14:59 +00:00
hanna 773a72e6ea An initial fix for performance issues when filtering UG with new StratifiedAlignmentContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3724 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 01:07:46 +00:00
delangel be75b087ec a) Add input argument (-ncrate) to BeagleOutputToVCFWalker. If the genotype posterior error probability is higher than this threshold, we declare a no-call at this genotype.
b) Add "OG" annotation to genotypes. If Beagle changes a genotype, this annotation gets the original genotype call, to ease performance comparisons. If not, this annotation gets an empty value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3723 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-06 18:33:28 +00:00
hanna 4213e05aeb Fix for sharding ref walkers via monolithic sharding. Introduces the potential bug (for monolithic sharding only) that when traversing by read, the map() function will not be called for loci off the end of the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3722 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-06 04:34:38 +00:00
aaron 86031f4034 part two: todo's in combine variants, fixes for InferredGeneticContext, and some other tests and clean-up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3721 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 21:07:53 +00:00
ebanks 36edc60ccc Connected UG to the new comp track annotation system in VA. Also, when emit confidence is lower than call confidence (so that we emit records filtered with LowQual), add a corresponding FILTER header field to the VCF so that the validator doesn't complain.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3720 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 13:04:24 +00:00
aaron 3347d1ca7c part one of combining format and info header lines code into a single abstract class for Mark; plus some 'm' removals from access methods for Eric. Adding fixes for CombineVariants next.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3719 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 05:57:58 +00:00
ebanks e7220bc885 Variant Context simple merging routine should keep ID if one of the VCs has it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3718 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 01:10:15 +00:00
delangel 3016e1cf80 Fixes to increase robustness in the VCF4 writer. We assume that at most 1 base was clipped from the beginning of the allele encoding by the reader, and improve the way we detect whether bases were clipped. We still can't deal with some corner cases, and duplicate records may follow, for example if a SNP location is followed at the next base by an indel. Also, if we are reading from a 3.3 VCF and the reference is null (i.e. we have an insertion), the reference base is not computed correctly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3717 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 20:22:04 +00:00
ebanks 07945040f8 Set VariantFiltration's JEXL engine to silent for warning messages
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3716 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 18:11:19 +00:00
ebanks be8740b00d Another edge case in left alignment for indels: deal with cases when insertions are ambiguously placed at ends of reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3715 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 17:26:38 +00:00
weisburd 9ec393bfce Updated md5 - vcf header line change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3714 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 21:02:09 +00:00
weisburd f7593435eb Implemented decodeLoc(..)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3713 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 21:01:36 +00:00