depristo
5d2c2bd280
Just refactoring into utils/baq directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4795 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-06 17:43:43 +00:00
depristo
44feb4a362
Improved BAQ implementation. Now supports adding BAQ tags to reads on the fly with ADD_TAG_ONLY option. Caching fasta reader implementation, and changes throughout the system to enable this. Many performance improvements throughout the system due to better reference access patterns.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4792 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-05 18:29:39 +00:00
ebanks
8901e63879
Cheap optimization: don't keep calculating the log of a constant. (How did I not catch this before?)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4791 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-05 04:36:21 +00:00
ebanks
bef48e7a42
For Chris, to make his life easier: iterate over all VCF records passed in looking for one with an ALT allele defined instead of assuming all records have one.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4789 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-05 02:23:38 +00:00
depristo
a5b3aac864
Engine-level BAQ calculation now available in the GATK [totally experimental right now]. -baq argument to disable (NONE), to only use the tags in the BAM (USE_TAG_ONLY), use the tag when present but calculate on the fly as necessary (CALCULATE_AS_NECESSARY), and to always recalculate (RECALCULATE_ALWAYS). BAQ.java contains the complete implementation, for those interested. ValidateBAQWalker is a useful QC tool for verifying the BAQ is correct. BAQSamIterator applies BAQ to reads, as needed, in the engine. Let me know if you encounter any problems. Before prime-time, needs a caching implementation of IndexedFastaReader to avoid loading *lots* of reference data all of the time
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4787 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-04 20:23:06 +00:00
fromer
b12cec4302
Added emitOnlyMNPs flag
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4785 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 20:34:17 +00:00
fromer
6d4ec7f9e7
Remove RefSeq INFO from MNPs since annotating them properly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4784 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 19:03:35 +00:00
fromer
4719bbc772
Changed dontRequireSomeSampleHasDoubleAltAllele parameter to mean that merging should only start at a polymorphic site
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4783 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 17:52:56 +00:00
ebanks
ec174dc0ba
As per Menachem's last commit, there's a minimally more efficient way of doing the MQ cap.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4782 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 16:37:08 +00:00
fromer
92cf7744a6
Set minMQ = max(minMQ, minBQ) for phasing since anyway we cap BQ by MQ; also, lowered MIN_BASE_QUALITY_SCORE for phasing to 17 (was previously 20)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4781 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 16:31:13 +00:00
ebanks
237ab1d489
1. As discussed in group meeting today, because we cap BQ by MQ, if MQ < minBQ then we filter the read.
...
2. Update to UGCalcLikelihoods for Chris: require a vcf bound to 'allele' to be provided so that we know exactly which alternate allele we should be calculating GLs for at each site. The user is warned when the VC is not biallelic or there are multiple records at a site.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4780 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 05:57:06 +00:00
delangel
da6a07ad3b
First round of critical fixes to indel genotyper (more to come tomorrow):
...
a) Avoid complete crash of caller that broke due to a recent refactoring by someone who must not be named <cough>EB<cough>... an integration test to avoid this in the future coming soon.
b) Fixed up strand bias computation for indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4779 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 02:48:09 +00:00
fromer
e09d6ee56b
write non-MNP VariantContexts records only once (where they start)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4777 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 22:14:26 +00:00
fromer
1515bf6de9
Merged common VCF writing logic into phasing/WriteVCF.java
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4776 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 22:03:02 +00:00
asivache
4e62de4213
Added method getOriginalReadGroupId(): takes merged (in case of collision) read group id as reported by a read coming from the merged stream and returns this read's read group id as it was listed in the original input bam file.
...
IndelRealigner now uses this functionality to correctly un-mangle read group id's in --nWayOut mode (i.e. when we need to write reads into separate output bams with headers matching the original inputs).
Some hidden changes to IndelRealigner: purely testing and development, transparent to the users (hidden option added)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4775 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 21:41:52 +00:00
rpoplin
e5282742f9
Bug fix in CountCovariates, skip over indel records as well as SNPs in the dbsnp file. CountCovariates is now called CountCovariatesWalker. I've always hated that the name was swapped.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4774 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 18:43:24 +00:00
rpoplin
0adf505b53
We no longer look at by-hapmap validation status in the VQSR because using the HapMap VCF file is higher quality. As a side effect we now support the dbsnp 132 vcf file. ApplyVariantCuts now requires that the input VCF rod bindings begin with input, matching the other VQSR walkers. Wiki updated with information about how to obtain the hapmap and 1kg truth sets.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4772 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 15:38:45 +00:00
ebanks
99b942b0b4
Removing duplicated header args
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4770 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 20:16:53 +00:00
fromer
9ac0f98d0d
Fixed bug in retaining proper RefSeq records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4768 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 18:39:02 +00:00
ebanks
7caf666f48
For Sendu: add a hidden option to allow bams to come out unsorted. We've agreed to let him deal with sorting these puppies on his own.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4767 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 17:56:13 +00:00
ebanks
3afa841a6a
Fixing docs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4766 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 17:36:47 +00:00
ebanks
6a6cdc1925
Adding minor usage docs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4765 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 17:34:33 +00:00
ebanks
0d1c905df3
Adding UGCalcLikelihoods and UGCallVariants so that GSA members can break up the calling process into separate steps (calculate the GLs and then call off of those) - useful for Chris's new batch merger. As the docs say, these are absolutely not supported or recommended for public use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4764 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 17:32:26 +00:00
fromer
b4ef716aaf
As per Eric and Mark's suggestions, separated the segregating MNP merger (MergeMNPs) from the more general merger employed for annotation purposes (MergeSegregatingAlternateAlleles). Both use the same core MergePhasedSegregatingAlternateAllelesVCFWriter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4763 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 16:42:08 +00:00
ebanks
0892daddb0
Improvement for the TGEN folks: when running in the solid recal mode of SET_Q_ZERO_BASE_N, update the NM tag if one was present in the read to reflect the new N's in the read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4761 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 04:36:44 +00:00
delangel
2ac938fe4e
1)
...
Minor fixes to avoid crashes vs CG indel files:
- Add count for complex events, not just insertions and deletions
- Handle correctly cases of large indels falling out of bounds of histogram array: added a count of indels ouf of bounds and avoid exceptions.
2) Cosmetic fix for R script assessing UG calling performance: draw red y=x line on top of Simulated vs Estimated AC to get a better view of under/over-estimation of AC.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4758 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 21:08:25 +00:00
rpoplin
af84462f3e
The dev team has decided to change the filter that is added to records that are set to monomorphic by Beagle. It no longer lists the reference allele. Added those filters to the header of the output VCF file. Finally, we no longer use R2=NaN values coming from Beagle.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4757 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 17:19:54 +00:00
ebanks
a181680814
We no longer require dbSNP files to be of the dbsnp rod-type; VCFs will do (provided they are bound to the name 'dbsnp')
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4753 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 03:25:18 +00:00
aaron
b03ac61e9d
consolidating the checking of the RMD sequence dictionary against the reference into a single function, and adding an integration test to test that empty VCFs pass (both the indexing and the seq dictionary validation).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4750 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 00:01:56 +00:00
hanna
abc13d0a90
Temporary hack: force abort with an intelligent message suggesting that users
...
specify -B:dbsnp,vcf <filename> if the filename passed if the --DBSNP argument
value contains 'vcf'. We'll replace this functionality once dbSNP 132 starts
playing nicely with the tagging system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4749 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-29 23:37:30 +00:00
ebanks
d89e17ec8c
Fare thee well, UGv1. Here come the days UGv2.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4747 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-29 21:51:19 +00:00
fromer
727dac7b7a
Added MNP annotation of the number of AA changes occuring in the SAME RefSeq entry (numAAchanges), and if this number is > 1 for any of the alt alleles (alleleHasMultAAchanges)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4746 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-29 21:24:30 +00:00
ebanks
ce051e4e9a
Write to sdout when no -o is provided
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4743 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-28 06:19:26 +00:00
ebanks
e3e6d176df
Looking over the daily error log email made me realize that there were 2 implementations of vc.modifyLocation() - the correct one in VC that didn't require lazy loading the genotype data and the bad one in VCUtils that did. Removing the implementation in VCUtils and updating the code accordingly. Also, removing createPotentiallyInvalidGenomeLoc() since no one uses it anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4736 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-26 18:40:34 +00:00
ebanks
6934f83cc7
Two changes to CombineVariants.
...
1. Fix: VCs were padded before the merge, but they were never unpadded afterwards. This leaves us with a VC that doesn't meet our spec.
2. Update: instead of running the merged VC through every standard annotation (which seems really wrong, since this isn't the annotator tool), just update the chromosome count annotations (AC,AF,AN) through VCUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4734 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-25 04:52:12 +00:00
fromer
d775192631
Check if MNP annotation of amino acid is dependent on the MNP, or could it be obtained through some single-base variant?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4733 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 22:38:33 +00:00
rpoplin
0dd40c3684
Updating doc text
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4732 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 21:34:14 +00:00
rpoplin
ed08899abc
Overwhelming evidence that maxQ = 50 is now a better default than maxQ = 40 in the base quality score recalibrator, especially when combined with dbsnp build 132. Also, added option in ProduceBeagleInputWalker for Beagle-ing chromosome X calls with male samples which sets the genotype likelihood for the AB allele to zero for those samples.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4731 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 21:32:26 +00:00
fromer
ca70ed611c
Totally revamped the MNP annotation and put it in its own walker: AnnotateMNPsWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4730 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 18:05:10 +00:00
depristo
8768e1a240
Useful profiling tool that reads in a single rod and evalutes the time it takes to read the file by byte, by line, into pieces, just the sites of the vcf, and finally the full vcf. Emits a useful table for plotting with the associated R script that can be run like Rscript R/analyzeRodProfile.R table.txt table.pdf titleString
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4728 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 14:59:16 +00:00
ebanks
7a8b85dd15
Catch the JEXL exception when trying to match a variable that's not in the context - and don't filter in these cases. Now everyone can happily go back to using the stupid (and hopefully temporary) AlleleBalance filter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4727 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 05:00:41 +00:00
ebanks
caf2c21f61
Must close the writer to flush the cache
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4726 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 04:33:08 +00:00
ebanks
816c33c821
indel-related fixes to the strict validator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4725 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 04:08:34 +00:00
ebanks
ea6e2218c1
1. dbsnp has some massive indels which my left-aligner was barfing on because there isn't enough reference context; fixed. 2. Lower default calling threshold to Q30 for UGv2.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4722 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 19:28:33 +00:00
aaron
53672361cc
capture more details when something IO-related goes wrong in writing a Tribble index
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4720 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 17:06:28 +00:00
hanna
8ca5edf89f
Fix issue where non-required file inputs can throw a NullPointerException
...
rather than a UserException when an the input argument is specified without
an argument value.
The magnitude of code required to fix this points to a need to give the
command-line argument system a good spring cleaning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4714 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 01:49:17 +00:00
ebanks
b9a59ea54f
Adding Het/Hom ratio to the temp per sample metrics. Because I'm in a generous mood tonight, I'm going ahead and fixing the paths for the classes I'm touching...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4713 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-21 04:24:42 +00:00
ebanks
cff7c6ddce
These are user exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4712 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-21 02:08:11 +00:00
bthomas
374c0deba2
Updating the core LocusWalker tools to include the Sample infrastructure that I added last month. This commit touches a lot of files, but only significantly changes a few: LocusIteratorByState and ReadBackedPileup and associated classes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4711 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-19 19:59:05 +00:00
kshakir
c723db1f4b
Added a -summary jexl argument to VariantEval similar to -validate.
...
Updated the package of ValidationGenotyper to match the file location.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4710 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-19 04:42:46 +00:00