ebanks
6b960bd9c5
Fix for Steve: genotype filters still want to see the values from the VC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3758 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 04:30:15 +00:00
depristo
c3c66e853c
Improvements for Jason
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3756 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 20:18:37 +00:00
ebanks
405be230d0
Various code improvements based on FindBugs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3755 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 15:04:48 +00:00
chartl
ea8fd506bf
Update to PickSequenomProbes: Option to ignore mask sites within X bp of a variant (very useful for indels where dbSNP entries near the indel are almost always false SNP calls). Also fixed an integration test where the variant site itself, being in dbSNP, was represented as [N/C] rather than [A/C]. Added integration test for 1bp no-mask window.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3753 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 04:03:19 +00:00
depristo
45fb614296
Fixes to VE for obscure bug, as well as disabled integration test for CombineVariants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3749 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 00:13:07 +00:00
rpoplin
67f1589652
--fdr_filter_level isn't mandatory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3748 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 22:48:30 +00:00
rpoplin
5d39cd5db8
Added --fdr_filter_level to ApplyVariantCuts so that you can create beautiful tranche plots and also decide which tranche level to filter at. The previous version always filtered at the smallest tranche. The tranche filter names are appropriately added to the VCF header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3747 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 22:44:10 +00:00
depristo
760aaeda88
Update to CombineVariants. Now splits merge options into variant and genotype options separately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3746 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 20:09:48 +00:00
depristo
56a0c7ee6f
All headers are now converted to VCF4 by default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3741 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 14:14:17 +00:00
ebanks
ed0d0d78fa
corresponding fix for dealing with insertions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3739 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 05:25:03 +00:00
ebanks
9a81f1d7ef
Fixed this tool for chartl so that it now properly handles deletions. Added deletion case to integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3737 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:45:59 +00:00
ebanks
4bc3ad2194
Shame on me: UG was emitting negative QUALs (-0) in all_bases mode. Thanks, Matt.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3732 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:30:22 +00:00
ebanks
30714ec8d9
As per quick chat with Richard Durbin, don't increase the mapping quality of realigned reads too much; for now, arbitrarily increase the MQ by 10. We need to figure out a better solution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3731 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:12:59 +00:00
ebanks
8ff1a4b929
Don't try to clean reads that fail the PF, in preparation for Ryan
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3730 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 19:49:36 +00:00
depristo
b934cc7554
Updates to fix some bugs in merger. Now able to merge into project-wide indel VCF files. Integration tests coming tomorrow.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3727 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 03:16:33 +00:00
hanna
773a72e6ea
An initial fix for performance issues when filtering UG with new StratifiedAlignmentContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3724 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 01:07:46 +00:00
aaron
86031f4034
part two: todo's in combine variants, fixes for InferredGeneticContext, and some other tests and clean-up.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3721 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 21:07:53 +00:00
ebanks
36edc60ccc
Connected UG to the new comp track annotation system in VA. Also, when emit confidence is lower than call confidence (so that we emit records filtered with LowQual), add a corresponding FILTER header field to the VCF so that the validator doesn't complain.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3720 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 13:04:24 +00:00
aaron
3347d1ca7c
part one of combining format and info header lines code into a single abstract class for Mark; plus some 'm' removals from access methods for Eric. Adding fixes for CombineVariants next.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3719 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 05:57:58 +00:00
ebanks
07945040f8
Set VariantFiltration's JEXL engine to silent for warning messages
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3716 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-04 18:11:19 +00:00
depristo
cd2e4b0a1e
merging now very close to working. Bug todo in writer and vcf infrastructure. Can almost create merged snp and indel files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3712 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 20:09:25 +00:00
depristo
61e2b2e39b
Nearly finalized merging capabilities for CombineVariants. Support for dealing with inconsistent indel alleles at loci. Improvements to Allele and removal of addAllele to MutableGenotype. We are close to being able to merge all of 1000 genomes -- snps and indels -- into a single combined vcf.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3710 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 13:32:33 +00:00
rpoplin
255b036fb5
Variant Recalibrator MLE EM algorithm is moved over to variational Bayes EM in order to eliminate problems with singularities when clustering in higher than two dimensions. Because of this there is no longer a number of Gaussians parameter. Wiki will be updated shortly with new recommended command.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3704 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 18:51:07 +00:00
hanna
c9d5345150
Redo StratifiedAlignmentContext to use ReadBackedPileup's stratification options.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3699 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 02:46:05 +00:00
depristo
5f2b2d860e
Final stage of renaming
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3696 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 21:39:07 +00:00
depristo
6e7927a47d
Continuing the renaming nightmare...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3695 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:25:01 +00:00
depristo
9d7d5f1747
Continuing the renaming nightmare...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3694 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:24:27 +00:00
depristo
aa20c52b88
deleting vcf
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3693 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:19:15 +00:00
depristo
4195fc5c4e
renaming part 2...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3692 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:18:11 +00:00
depristo
6c9da5525d
renaming starting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3691 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:16:51 +00:00
depristo
b8d6a95e7a
Preliminary commit of new VCFCombine, soon to be called CombineVariants (next commit), that supports merging any number of VCF files via a general VC merge routine that supports prioritization and merging of samples! It's now possible to merge the pilot1/2/3 call sets into a single (monster) VCF taking genotypes from pilot2, then pilot3, then pilot1 as needed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3690 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:13:03 +00:00
chartl
569456850d
Mark pointed out there's differentiation in the filter field. Rolling back.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3687 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 17:05:53 +00:00
chartl
52a474b27d
Fixed an issue with VCF combine in sites like the following:
...
Broad: Filtered BC: No call
These were being treated the same as
Broad: Call BC: No call
Added some verbosity to separate them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3686 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:49:31 +00:00
ebanks
944dbb94ce
Refactored and generalized the database/comp annotations in VariantAnnotator. Now one can provide comp tracks as with VariantEval (e.g. compHapMap, comp1KG_CEU) and the INFO field will be annotated with the track name (without the 'comp') if the variant record overlaps a comp site (e.g. ...;1KG_CEU;...). This means that you can now pass 1kg calls to the Unified Genotyper and automatically have records annotated with their presence in 1kg.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3684 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:37:31 +00:00
ebanks
47c4a70ac1
It turns out that it is legitimately possible for there to be reads that won't overlap within a target interval for cleaning. While we don't want to attempt cleaning, we also don't want to fail.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3682 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 15:50:44 +00:00
ebanks
12c0de6170
Added ability to clean using only known indels. Added integration test for it. Fixed vcf->vc conversion for indels which was busted.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3678 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 01:20:56 +00:00
chartl
610cc7ae2b
Cool package trick Kiran showed me. VariantEvaluator no longer public, AAT specifies the core package even though it lives in oneoffs. Disabled so integration tests pass.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3677 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 22:42:04 +00:00
chartl
4c6f4e41c6
Include making VariantEvaluator public within the package so my oneoffs can be seen (not included in previous submit specifically because I didn't want to break the build by changing anything in core...the road to hell is paved with good intentions)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3676 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 22:26:52 +00:00
ebanks
baf9479c35
An addition for Sendu since he can't seem to tell when his CountCovariate jobs die in the middle of writing the CSVs. We now write an EOF marker at the end of the covariates table and look for it when reading in the file in TableRecalibrationWalker. By default, we warn the user if the EOF marker isn't present, but we throw an exception if the user provides the --fail_with_no_eof_marker option.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3670 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 18:50:07 +00:00
ebanks
801b47c6e9
For Sendu: a similar addition to the Indel Genotyper allowing it to emit a metrics file (which for now consists only of # of normal/tumor calls made)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3668 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 13:19:17 +00:00
ebanks
ddf87e61c2
For Sendu: optionally emit a metrics file with callability info (including number of actual calls made) from UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3667 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 12:57:28 +00:00
ebanks
929e5b9276
Fix possible null pointer exception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3666 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 09:01:18 +00:00
ebanks
4a451949ba
add parallel option to target creator for masking out reads with bad mates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3663 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 22:13:25 +00:00
ebanks
1292c96e29
The cleaner now adds the OC (original cigar) and OS (original alignment start) tags as appropriate to reads that get realigned; this feature can be turned off. Also, improved integration tests (sorry, Kiran!).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3657 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:46:47 +00:00
asivache
cc8d8eaedb
Now that we always reserve space for two read ends when collecting stats stratified by libraries, we need to check that the second end was indeed present; otherwise the pointer is null and this was causing an exception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3656 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:40:16 +00:00
ebanks
9a24598a98
By default, don't clean reads with mates mapped to other chromosomes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3654 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 15:14:20 +00:00
ebanks
bf5cbad04c
Make the target creator a rod walker (that allows reads) so that we can easily trigger the cleaner on only known indel sites. Adding an integration test to cover this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3651 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 13:28:37 +00:00
aaron
f9c7803d4e
this got left off my last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3635 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 02:42:44 +00:00
aaron
682f9b46c6
Two fixes together:
...
1) Some improvements to the VCF4 parsing, including disabling validation.
2) Reimplemented RefSeq in the new Tribble-style rod system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3630 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:17:03 +00:00
aaron
62bc7651a8
fix for PSPW with DbSNP mask. Added an integration test for this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3628 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 19:31:32 +00:00