rpoplin
1e7ddd2d9f
Added a validateOldRecalibrator option to CovariateCounterWalker which reorders the output to match the old recalibrator exactly. This facilitates direct comparison of output. Changed the -cov argument slightly to require the user to specify both ReadGroupCovariate and QualityScoreCovariate to make it more clear to the user which covariates are being used. Some speed up improvements throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2010 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 15:55:56 +00:00
depristo
7e30fe230a
oops, missing file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2009 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 13:25:18 +00:00
depristo
d316cbad4c
VariantFilteration now accepts a VCF rod in addition to an input geli. It will then annotate this VCF file with filtering information in the INFO field too. --OnlyAnnotate will not write in filtering output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2008 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 13:24:58 +00:00
aaron
f9819d5f13
a little clean-up
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2007 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 06:18:34 +00:00
aaron
2ed423ed56
print the current location in read walkers (in addition to the number of reads processed), along with some refactoring to support the change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2006 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 05:57:01 +00:00
ebanks
c9c3cf477a
Based on feedback from Kiran, we know uniquify sample names as sample.rodName (instead of sample.1, sample.2, ...)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2005 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 02:41:37 +00:00
depristo
3990c6d950
snpSelector v3 -- bootstrapping support and VCF output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2004 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 22:48:51 +00:00
ebanks
2fa2ae43ec
Enough people have found this useful, so...
...
Moving Callset Concordance tool to core and adding integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2003 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:59:18 +00:00
ebanks
3793519bd4
-Added convenience method to VCF record to tell if it's a no call and have rodVCF use it before querying for info fields
...
-Don't restrict info fields to 2-letter keys
[about to move these to core]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2002 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:52:51 +00:00
rpoplin
740a5484c4
Added some documentation to the code, mostly especially to CovariateCounterWalker but various comments added throughout. Also changed the HashMap data structure to accept an estimated initial capacity. This had a very modest improvement to the speed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2001 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:13:56 +00:00
ebanks
74751a8ed3
-Some minor fixes to get accurate vcf record merging done
...
-Improvement to snp genotype concordance test
And with that, it looks like I get revision #2000 .
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2000 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 06:40:55 +00:00
ebanks
ab705565cf
Completely refactored the Callset Concordance code. Now, it takes in VCF rods and emits a single VCF file which has merged calls from all inputs and is annotated (in the INFO fields) with the appropriate concordance test(s).
...
Still needs a bit of polish...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1999 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 05:03:13 +00:00
ebanks
bc6f24e88f
Added VCFUtils which contains some useful VCF-related functions (e.g. ability to merge VCF records).
...
Also, various minor improvements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1998 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:53:32 +00:00
ebanks
cff645e98b
convenience method to deal with genotypes that are unsorted (e.g. CA vs. AC)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1997 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:45:49 +00:00
kiran
7fde6c0bf4
One more output tweak.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1996 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:42:55 +00:00
kiran
00a7113d7a
Tweaks to formatting of output table.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1995 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:33:36 +00:00
ebanks
7ce0df76f8
Added accessors to the rod data sources so that walkers can access the name/file/type triplets for input rods. This is necessary if e.g. you want to create a vcf writer based on all of the samples being input.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1994 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:25:39 +00:00
ebanks
d07f3bb6f6
Added methods to get strand bias and to test if record has allele freq or bias fields set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1993 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:20:35 +00:00
kiran
3313b0ddb4
Fixed a minor bug where the lodThreshold wasn't being printed in the header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1992 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:51:36 +00:00
kiran
95d381efe2
Optionally computes the error rate using the best base and a random base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1991 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:47:34 +00:00
kiran
567f5758d2
Optionally lists read depths by read group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1990 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:39:19 +00:00
kiran
a679bdde18
FindContaminatingReadGroupsWalker lists read groups in a single-sample BAM file that appear to be contaminants by searching for evidence of systematic underperformance at likely homozygous-variant sites.
...
Procedure:
1. Sites that are likely homozygous-variant but are called as heterozygous are identified.
2. For each site and read group, we compute the proportion of bases in the pileup supporting an alternate allele.
3. A one-sample, left-tailed t-test is performed with the null hypothesis being that the alternate allele distribution has a mean of 0.95 and the alternate hypothesis being that the true mean is statistically significantly less than expected (pValue < 1e-9).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1989 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:36:39 +00:00
kiran
2225d8176e
A convenience class for maintaining a dynamically growing table of values with access to the elements by named row and column identifiers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1988 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:34:35 +00:00
hanna
21c5f543fa
Fix sharding bug -- loci to which >100,000 (= 1 shard) reads are assigned an
...
alignment start will confuse the sharding system and cause it to return duplicate reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1987 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 14:27:26 +00:00
depristo
f777c806d6
snpSelector v2 -- code refactoring and support for comparison with known truth. Looks great.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1986 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 19:32:12 +00:00
rpoplin
84ba604611
Sequential quality score calculation is now in place in the refactored recalibrator and matches the quality scores calculated by the old recalibrator exactly; at least on the small sets of data used so far. Validation, documentation, and optimization work is on going.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1985 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 15:55:16 +00:00
depristo
bf1bc94060
Fixes for PooledConcordance bugs and lack of safety checking
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1984 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 01:54:10 +00:00
depristo
7cb51dbc31
snpSelector v1 -- and supporting changes to VCF reader
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1983 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 23:00:46 +00:00
rpoplin
66d4a995e6
Initial check in of refactored Recalibrator. The new walkers are called CountCovariatesRefactored and TableRecalibrationRefactored. More work is needed to finish up the sequential calculation and to document the code sufficiently. These files are not ready to be used by other people quite yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1982 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 22:33:55 +00:00
ebanks
6fdfc97db6
Added optional field DP to VCF output for Mark.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1981 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 20:03:22 +00:00
hanna
f73dd09399
More preparation for bwa patch: create a crude, minimal build system to build
...
against a static library that the bwa compile (now) generates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1980 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 18:41:01 +00:00
ebanks
0a55fa5bb1
Completely refactored the Genotype Concordance module(s).
...
Now PooledConcordance and GenotypeConcordance inherit from the same super class (and can therefore share data structures and functionality). Also, they now use ConcordanceTruthTable to keep track of necessary info.
GenotypeConcordance passes integration tests.
PooledConcordance needs to be finished by Chris.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1979 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 16:27:16 +00:00
ebanks
d549347f25
Refactored GenotypeLikelihoods to use an underlying 4-base model.
...
It needs to be modified a bit and then hooked up to a pooled model, but that is now possible.
At this point, there is no difference to the Unified Genotyper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1978 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 21:59:25 +00:00
jmaguire
4d3871c655
don't flush anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1977 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 19:11:51 +00:00
aaron
aacd72854f
a fix for a bug Andrey discovered: in read-based interval traversals we're dupplicating reads in rare cases. The problem was that to accomidate a bug in SAM JDK indexing, we were forced to add one to the stop of our QueryOverlapping() calls to ensure we always got all of the overlapping reads.
...
Added a PlusOneFixIterator that wraps other iterators, and eliminates reads that start outside of our intended interval (interval stop - 1). Updated and checked BamToFastqIntegrationTest MD5 sums.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1976 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 05:26:33 +00:00
hanna
15e80aec3b
Temporarily adding changed bwa files to Sting repository until I can pull together the official bwa patch.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1975 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 21:46:05 +00:00
chartl
eca0942644
Oops. Let's make sure only to write calls that the pool supports to the auxiliary vcf files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1974 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 17:14:55 +00:00
hanna
43c3ee61d5
Fix minor mapping quality bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1973 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 14:33:23 +00:00
aaron
21b1ece45d
removed some extra debugging commands from the build.xml
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1972 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 06:28:12 +00:00
chartl
fc17e75759
Put this puppy through its paces. Eliminated the sorting and header-handling stuff; that isn't the purvey of this script and should be handled downstream or by a script wrapper.
...
I also secretly handled another pesky overlow exception. Occasionally Syzygy could report lods of like -1000; e.g. posterior probabilities of one in one (((googol) googol) googol) googol which of course makes python blow up. Now we safely output an accurate posterior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1971 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 06:05:45 +00:00
chartl
3d9195f8b6
Added - converter from expanded summary to VCF (beautiful thing, really)
...
Removed - the ugly hackjob that was expanded summary to Geli
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1970 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 22:20:47 +00:00
ebanks
a545859c62
Joint Estimation model now emits a reasonable slod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1969 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 21:12:42 +00:00
ebanks
11d950abe0
No longer allow the lod_threshold argument - use confidence instead.
...
Have UG output qscores in all cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1968 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:18:51 +00:00
asivache
2fb45dbd73
Make window size a command line argument
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1967 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:13:35 +00:00
asivache
55f61b1f88
Bug fix in adjustment of the shift position.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1966 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:08:11 +00:00
depristo
d60c632099
Minor output improvement
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1965 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:20:55 +00:00
depristo
44ea55d338
Useful library for parsing VCF files, plus a general VCF->table converter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1964 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 13:14:04 +00:00
depristo
5d5dc989e7
improvements to VCF and variant eval support of VCF -- now listens to the filter field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1963 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 12:09:30 +00:00
hanna
c313abf0be
Restore a jar packaging change that got clobbered.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1962 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 00:15:45 +00:00
hanna
c63af32fc7
The BWA/C bindings were triggering the local aligner to repeatedly reload the
...
ref genome. Make sure the reference genome is cached.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1961 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 00:01:55 +00:00