ebanks
5a1a3fc79a
Fix bad VariantContext creation in unit test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3824 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 20:21:01 +00:00
depristo
7c42e6994f
FindBugs fixes throughout the code base
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3823 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 16:29:59 +00:00
ebanks
693672a461
Refactoring the VCF writer code; now no longer uses VCFRecord or any of its related classes, instead writing directly to the writer. Integration tests pass, but some are actually broken and will be fixed this week.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3822 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 13:19:56 +00:00
ebanks
379584f1bf
Re-enable (most of) these tests. Guillermo will re-enable the other one when the VCF->VC conversion is done for indels
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3821 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 03:24:28 +00:00
ebanks
982947d328
update to deal with partial indels (I/D with no bases) in the HM records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3820 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 02:56:37 +00:00
depristo
414ec6f20a
Removing version argument constructors that shouldn't be used. Temporary allow -- with global variant to indicate this should be removed -- header records without description fields. Real error checking in the headers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3818 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:30:08 +00:00
depristo
14b21e487b
always 4.0
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3817 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:28:48 +00:00
depristo
d40299840c
indenting clean up
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3816 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:28:28 +00:00
hanna
9207c58b8f
A fix for the integration test I broke on Friday on my way out the door --
...
some workflows using AlignmentContext were working with it in a way I didn't
expect and wound up treating extended pileups as base pileups. I'll work to
make sure the AlignmentContext interface is crystal clear.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3815 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:22:44 +00:00
delangel
55b756f1cc
First step in major cleanup/redo of VCF functionality. Specifically, now:
...
a) VCF track name can work again with 3.3 or 4.0 VCF's when specifying -B name,VCF,file. Code will read header and parse automatically the version.
b) Old VCF codec is deprecated. Reader goes now direct from parsing VCF lines into producing VariantContext objects, with no intermediate VCF records. If anyone can't resist the urge to still input files using the old method, a new VCF3Codec is in place with the old code, but it will be eventually deleted.
c) VCF headers and VCF info fields no longer keep track of the version. They are parsed into an internal representation and will be output only in VCF4.0 format.
d) As a consequence, the existing GATK bug where files are produced with VCF4 body but VCF3.3 headers is solved.
e) Several VCF 4.0 writer bugs are now solved.
f) Integration test MD5's are changed, mostly because of corrected VCF4.0 headers and because validation data mostly uses now VCF4.0.
g) Several VCF files in the ValidationData/ directory have been converted to VCF 4.0 format. I kept the old versions, and the new versions have a .vcf4 extension.
Pending issues:
a) We are still not dealing with indels consistently or correctly when representing them. This will be a second part of the changes.
b) The VCF writer doesn't use VCFRecord but it does still use a lot of leftovers like VCFGenotypeEncoding, VCFGenotypeRecord, etc. This needs to be simplified and cleaned.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3813 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 22:49:16 +00:00
chartl
75bea4881a
Modified SampleFilter to allow for multiple samples to be given. AminoAcidTransition now turns on when you give VariantEval the right commands.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3812 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 21:27:32 +00:00
aaron
36ac73cf9a
comment out broken test until it can be fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3810 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 20:04:40 +00:00
hanna
96034aee0e
Cleanup for Steve Hershman's issue. In the midst of doing this, I discovered
...
that the semantics for which reads are in an extended event pileup are not
clear at this point. Eric and I have planned a future clarification for this
and the two of us will discuss who will implement this clarification and when
it'll happen.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3809 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 18:57:58 +00:00
asivache
6aedede7f3
Added Type.MNP to allowed variant context types; this does not break the tests (yet)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3808 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 15:50:25 +00:00
asivache
1dd8a28a5d
Added new query: isMNP(feature); returns true if dbsnp feature is multi-nucleotide polymorfism (e.g. a di-nuc TA ->CC)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3806 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 15:32:10 +00:00
aaron
ec94cfdf05
remove unit test for VCF writer, it's not applicable now that we produce only VCF4. Guillermo, it's up to you if you want to adapt this or remove it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3803 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:33:25 +00:00
depristo
b29eda83bb
Parallelized CountCovarites! percent_ref_called_var now a standard genotype concordance module (for validation!). Really much smarter merging of headers for combineVariants. VCF codecs now actually look at the file version and blow up if they are the wrong versions. setHeaderVersion() in VCFHeaderLine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3802 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:10:18 +00:00
ebanks
f293eb7de1
Fix for Kim: for some ungodly reason, I was initializing the bins that were maintaining counts to 1 instead of 0.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3801 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 03:40:29 +00:00
ebanks
e7e58d7129
The SAM spec has now officially reserved my new tags for original cigar and original alignment start... except that OS has been named OP ('original POS')
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3800 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 00:09:36 +00:00
ebanks
ab84ed8c68
Fix for Mark: get rid of old program tags whose IDs clash with the recalibrator/realigner tag (including if the id has a .1 at the end, etc.). Keeping them around is dangerous because we don't know which one refers to the latest run of the tool on the bam.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3798 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-15 19:13:50 +00:00
hanna
dfddf8fd75
- Bring the PaperGenotyper up to code.
...
- Remove some old debugging cruft regarding handling of threaded engine exceptions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3796 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 22:31:21 +00:00
bthomas
f65cba6b9a
Adding support for shared file locking via a new class for file locking, FSLockWithShared. This will eventually take over for FSLock, the current file locking class - I'll work with Aaron to merge the tribble code that uses FSLock right now.
...
FYI: creating an exclusive lock on a file that does not exist will create that file as an empty file, and will NOT delete that file after the program terminates. So watch out if it's possible that the file you're locking does not exist - could end up leaving extra files that confuse users.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3795 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 20:45:51 +00:00
hanna
a8caa20378
Previously the hierarchical microscheduler defensively coded around and reported exceptions of
...
the walker itself, but didn't do a great job of catching framework exceptions. This became extremely
unfortunate in the case where walkers caused exceptions that manifested themselves in the framework,
such as when the walker opens more files than file handles are available.
Reworked the exception handling so that framework errors are treated like walker errors and the resulting
exception bubbles out of the walker. Stack traces for threaded walkers are still convoluted and nasty.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3794 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 20:34:43 +00:00
ebanks
bf384f48e1
Reverting previous change because it won't always work. More investigation needed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3793 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:13:17 +00:00
ebanks
e4bfb06888
Check header type instead of rod type, since rod type will now be VC and not VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3792 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:10:09 +00:00
ebanks
0226412b11
Add GQ to list of genotype attributes for reg exp
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3791 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 19:01:11 +00:00
ebanks
78a4d8ec3d
Removing more references to VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3790 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 16:34:15 +00:00
ebanks
af23762778
Removing more references to VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3789 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 11:54:23 +00:00
ebanks
a4f8d70d8d
oops, forgot to update this integration test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3788 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 11:38:33 +00:00
ebanks
460283f6d2
No more manually converting VariantContexts to VCFRecords. You should be utilizing VCs and not VCFRecords.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3787 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 05:21:28 +00:00
ebanks
6b5c88d4d6
The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 04:56:58 +00:00
chartl
9d2a485532
Update to AminoAcidTransition eval module
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3783 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 17:12:03 +00:00
rpoplin
3db7fbb5e9
Fix for added EOF in csv file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3781 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 16:09:48 +00:00
ebanks
9a05e8143d
Move to 4.0 and away from VCFRecord.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3780 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:54:54 +00:00
ebanks
6442dabf94
Deleting/archiving as instructed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3779 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:23:50 +00:00
ebanks
7e7da75d27
Moving over to 4.0 and away from VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3778 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 14:07:10 +00:00
ebanks
d896d03554
Moving VF to vcf 4.0. Still need to fix genotype filters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3777 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 11:39:51 +00:00
ebanks
76b3b39720
Technically, Mark broke this with his commit earlier. But since I had an outstanding broken test, I lose and have to fix this one too...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3776 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 03:58:38 +00:00
ebanks
1bef7dd170
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3775 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 00:56:12 +00:00
depristo
de969f7cc7
logger != null check
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3774 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 23:07:14 +00:00
depristo
2e445262f2
Promotion to . for variable numbers of arguments
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3773 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 22:53:53 +00:00
delangel
297f15a60c
Protect ProduceBeagleInputWalker against evil users who feed to it VCF's with indels, no variation sites or other interesting markers: Write to Beagle input only in biallelic SNP sites since that's the only thing Beagle can do.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3772 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 20:54:42 +00:00
ebanks
52c534a8f2
Updating to VCF 4.0
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3770 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 20:18:30 +00:00
delangel
5992b79159
a) Simplify normalization code in ProduceBeagleInputWalker, as to always normalize, and use MathUtils.normalizeFromLog10 to do this.
...
b) Several improvements to BeagleOutputToVCFWalker:
1. If a Hapmap input track is provided (e.g. -B comp,VCF,file), Hapmap sites will be annotated with Hapmap Allele count and allele frequency (key ACH, AFH).
2. If probability of correct genotype is lower than ncthr (optional argument provided by user, default = 0.0), walker will keep original calls instead of using Beagle calls.
3. Instead of annotating just whether Beagle had modified a site, annotate instead HOW MANY genotypes in a site were actually changed by Beagle.
All three improvements are mostly for debugging and analysis only.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3769 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 19:54:58 +00:00
ebanks
e50627a49e
1. Updated tests and added integration test for liftover code.
...
2. Updated liftover code (and scripts) to emit vcf 4.0 and no longer depend on VCFRecord.
3. Beagle walker now also emits vcf 4.0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3767 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 17:58:18 +00:00
ebanks
2a7112302a
More archiving
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3766 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 17:04:41 +00:00
ebanks
221e01fb27
deleting/archiving as instructed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3765 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:59:45 +00:00
ebanks
8086ab1f75
Pulled sample/header merging routines out of CombineVariants and into util classes. Added more generalized methods for retrieving samples. Updated the Beagle walkers to use these methods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3764 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:51:54 +00:00
ebanks
0c4a32843c
No longer uses VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3763 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 13:57:39 +00:00
ebanks
f130d29318
No longer uses VCFRecord.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3762 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 13:34:10 +00:00
ebanks
e75b3e13bd
updating unit test for previous fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3761 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 03:23:53 +00:00
ebanks
0427f3554b
Bug fix: valid fields were being stripped off the FORMAT for samples because String.match was used instead of String.equals. Also, please use VCFConstants from now on instead of hard-coding e.g. missing values into the code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3760 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 03:06:51 +00:00
ebanks
fb717fe128
First pass needed to remove old VCF code: moving all VCF-related constants into a single unified class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3759 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 07:19:16 +00:00
ebanks
6b960bd9c5
Fix for Steve: genotype filters still want to see the values from the VC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3758 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 04:30:15 +00:00
depristo
c3c66e853c
Improvements for Jason
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3756 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 20:18:37 +00:00
ebanks
405be230d0
Various code improvements based on FindBugs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3755 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 15:04:48 +00:00
ebanks
abaec13e38
Bug fix: if there are samples in the VCF but all of them are no-calls, we still need to emit GT for the FORMAT field to be on spec. Note that this is a holdover from 3.3 writing but can't easily be fixed there. Fortunately, that code is all going away soon...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3754 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 14:08:25 +00:00
chartl
ea8fd506bf
Update to PickSequenomProbes: Option to ignore mask sites within X bp of a variant (very useful for indels where dbSNP entries near the indel are almost always false SNP calls). Also fixed an integration test where the variant site itself, being in dbSNP, was represented as [N/C] rather than [A/C]. Added integration test for 1bp no-mask window.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3753 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 04:03:19 +00:00
depristo
179067e3f4
Support for . values in qual field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3752 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 01:47:02 +00:00
depristo
45fb614296
Fixes to VE for obscure bug, as well as disabled integration test for CombineVariants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3749 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 00:13:07 +00:00
rpoplin
67f1589652
--fdr_filter_level isn't mandatory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3748 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 22:48:30 +00:00
rpoplin
5d39cd5db8
Added --fdr_filter_level to ApplyVariantCuts so that you can create beautiful tranche plots and also decide which tranche level to filter at. The previous version always filtered at the smallest tranche. The tranche filter names are appropriately added to the VCF header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3747 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 22:44:10 +00:00
depristo
760aaeda88
Update to CombineVariants. Now splits merge options into variant and genotype options separately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3746 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 20:09:48 +00:00
ebanks
bd2ba3eb37
deal with very large known indels that fall off our ref context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3745 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 20:05:16 +00:00
aaron
12fecc8d8f
remove the picard DbSNP ROD.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3743 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 17:46:00 +00:00
depristo
56a0c7ee6f
All headers are now converted to VCF4 by default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3741 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 14:14:17 +00:00
ebanks
6e6ad36523
reallow MNP events through
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3740 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 06:26:52 +00:00
ebanks
ed0d0d78fa
corresponding fix for dealing with insertions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3739 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 05:25:03 +00:00
ebanks
ada8c9931f
We were never clipping the VCF-provided ref base off the left end of the alleles for insertions, so the reference allele was never null (and downstream walkers would fail). Didn't this get tested with insertions at some point?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3738 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 05:24:27 +00:00
ebanks
9a81f1d7ef
Fixed this tool for chartl so that it now properly handles deletions. Added deletion case to integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3737 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:45:59 +00:00
ebanks
47a42b1507
trivial cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3736 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:42:32 +00:00
ebanks
b7a3d1e61f
Bug fix: if the FORMAT field consisted of just GT, we were exceptioning out. How did we not catch this until now?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3735 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:41:40 +00:00
ebanks
1c146aebe8
Fix logic bug
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3734 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:32:46 +00:00
hanna
9fc05ac2ae
eagerDecode is now false.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3733 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 22:51:48 +00:00
ebanks
4bc3ad2194
Shame on me: UG was emitting negative QUALs (-0) in all_bases mode. Thanks, Matt.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3732 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:30:22 +00:00
ebanks
30714ec8d9
As per quick chat with Richard Durban, don't increase the mapping quality of realigned reads too much; for now, arbitrarily increase the MQ by 10. We need to figure out a better solution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3731 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:12:59 +00:00
ebanks
8ff1a4b929
Don't try to clean reads that fail the PF, in preparation for Ryan
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3730 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 19:49:36 +00:00
depristo
b934cc7554
Updates to fix some bugs in merger. Now able to merge into project wide indel VCF files. Integration teests coming tomorrow
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3727 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:16:33 +00:00
kshakir
7be8c35eb2
Workaround for scala trait erasing parameterized types:
...
- Requiring explicit @ClassType on parameterized fields in traits.
- Scatter / Gather functions are now abstract classes since @ClassType can't be used on parameterized fields with type parameters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3726 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:15:10 +00:00
hanna
120f90da5b
Interval support for ref walkers while streaming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3725 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:14:59 +00:00
hanna
773a72e6ea
An initial fix for performance issues when filtering UG with new StratifiedAlignmentContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3724 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 01:07:46 +00:00
delangel
be75b087ec
a) Add input argument (-ncrate) to BeagleOutputToVCFWalker. If the genotype posterior error probability is higher than this threshold, we declare No-call at this genotype.
...
b) Add "OG" annotation to genotypes. If Beagle changes genotypes, this annotation gets the original genotype call, to ease performance comparisons. If not, this annotation gets an empty value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3723 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-06 18:33:28 +00:00
hanna
4213e05aeb
Fix for sharding ref walkers via monolithic sharding. Introduces the potential bug (for
...
monolithic sharding only) that when traversing by read, map() function will not be called for loci
off the end of the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3722 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-06 04:34:38 +00:00
aaron
86031f4034
part two: todo's in combine variants, fixes for InferredGeneticContext, and some other tests and clean-up.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3721 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 21:07:53 +00:00
ebanks
36edc60ccc
Connected UG to the new comp track annotation system in VA. Also, when emit confidence is lower than call confidence (so that we emit records filtered with LowQual), add a corresponding FILTER header field to the VCF so that the validator doesn't complain.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3720 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 13:04:24 +00:00
aaron
3347d1ca7c
part one of combining format and info header lines code into a single abstract class for Mark; plus some 'm' removals from access methods for Eric. Adding fixes for CombineVariants next.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3719 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 05:57:58 +00:00
ebanks
e7220bc885
Variant Context simple merging routine should keep ID if one of the VCs has it
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3718 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 01:10:15 +00:00
delangel
3016e1cf80
Fixes to increase robustness in vcf4 writer. We assume that only at most 1 base was clipped from beginning of allele encoding by reader, and improve the way we find if bases were clipped. We still cant deal with some corner cases, and duplicate records may follow, for example if a snp location is followed at the next base by an indel. Also, if we are reading form a 3.3 vcf and the reference is null (ie we have an insertion), the reference base is not computed correctly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3717 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 20:22:04 +00:00
ebanks
07945040f8
Set VariantFiltration's JEXL engine to silent for warning messages
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3716 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 18:11:19 +00:00
ebanks
be8740b00d
Another edge case in left alignment for indels: deal with cases when insertions are ambiguously placed at ends of reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3715 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 17:26:38 +00:00
weisburd
9ec393bfce
Updated md5 - vcf header line change
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3714 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 21:02:09 +00:00
weisburd
f7593435eb
Implemented decodeLoc(..)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3713 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 21:01:36 +00:00
depristo
cd2e4b0a1e
merging now very close to working. Bug todo in writer and vcf infrastructure. Can almost create merged snp and indel files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3712 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 20:09:25 +00:00
delangel
b6bdd61283
a) Fix bug when multi-base reference is homopolymeric when writing a VCF4.0 variant context: computation of number of trailing bases was incorrect and we ended up with incorrect position.
...
b) Updated VCF4WriterTestWalker to take either VCF3 or VCF4 as inputs (this walker can also be used to convert from 3.3 to 4.0).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3711 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 15:19:42 +00:00
depristo
61e2b2e39b
Nearly finalize merging capabilities for CombineVariants. Support for dealing with inconsistent indel alleles at loci. Improvements to Allele and removal of addAllele to MutableGenotype. We are close to being able to merge all of 1000 genomes -- snps and indels -- into a single combined vcf
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3710 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 13:32:33 +00:00
hanna
cab8394103
The sharding system now buffers reads, with a size determined by command-line argument. Will investigate whether/how this
...
impacts performance on low-pass data and, if it works well, will create a more automatic version of the tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3709 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 22:28:55 +00:00
aaron
f967cae1aa
tiny comment change
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3708 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 22:04:25 +00:00
aaron
3093a20a55
fixing VCF header format and info fields so that they propery emit the unbounded count value correctly for vcf4 or vcf3. Eric we should update the vcf4 spec page to indicate format fields are allowed to use the unbounded count as well (if this is true).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3707 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 22:02:16 +00:00
delangel
61c07c6f90
Fixes for missing key values that can create null pointer exceptions when reading from 3.3-generated variant contexts. Also, chop missing genotype fields correctly from right to left
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3706 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 20:17:03 +00:00
rpoplin
255b036fb5
Variant Recalibrator MLE EM algorithm is moved over to variational Bayes EM in order to eliminate problems with singularities when clustering in higher than two dimensions. Because of this there is no longer a number of Gaussians parameter. Wiki will be updated shortly with new recommended command.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3704 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 18:51:07 +00:00