kiran
4ee6a478e3
Creates a table of reference allele percentage and alternate allele percentage at Hapmap-chip sites in a BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2428 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 20:43:44 +00:00
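The table in the commit above comes down to two percentages per site. Here is a minimal sketch, assuming the pileup bases at a HapMap-chip site are available as ASCII bytes. The names AlleleFractions and percentages are hypothetical and are not the walker's actual code.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not the walker's actual code): percentage of pileup
 * bases matching the reference allele and the alternate allele at one site.
 */
final class AlleleFractions {
    private AlleleFractions() {}

    /**
     * @param bases pileup bases at the site, as ASCII bytes (matched case-insensitively)
     * @param ref   reference allele, uppercase ASCII
     * @param alt   alternate allele, uppercase ASCII
     * @return {refPercent, altPercent}; both 0.0 for an empty pileup
     */
    static double[] percentages(byte[] bases, byte ref, byte alt) {
        if (bases.length == 0) {
            return new double[] {0.0, 0.0};
        }
        int refCount = 0;
        int altCount = 0;
        for (byte b : bases) {
            byte upper = (byte) Character.toUpperCase((char) b);
            if (upper == ref) {
                refCount++;
            } else if (upper == alt) {
                altCount++;
            }
        }
        // The denominator is every base at the site, so Ns and third alleles lower both percentages.
        return new double[] {
            100.0 * refCount / bases.length,
            100.0 * altCount / bases.length
        };
    }

    /** Convenience overload for pileups given as strings, e.g. "AAGAGAAC". */
    static double[] percentages(String bases, char ref, char alt) {
        return percentages(bases.getBytes(StandardCharsets.US_ASCII), (byte) ref, (byte) alt);
    }
}
```

For example, percentages("AAGAGAAC", 'A', 'G') gives 62.5% reference and 25.0% alternate. Dividing by all bases, rather than by ref + alt only, is a design choice of this sketch.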
ebanks
a5f75cbfd4
The previous commit broke the build, so this is a temporary patch to get it to compile. ConcordanceTruthTable should use enums (esp. now that all of the concordance variables need to be public), but VariantEval will need to be rewritten soon anyway, so I'll just push it off until then.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2413 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 02:34:41 +00:00
depristo
ee8bcdc61d
PooledConcordance calculations have been reformatted and bugs fixed. Now properly handles monomorphic sites, and also works correctly with the -G option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2412 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:22:36 +00:00
depristo
9bf2d12c64
Misc. improvements to the LMW code. Support for emitting all sites, regardless of genotype. Min and max quality scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2411 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:20:57 +00:00
aaron
c39675d2c1
VCFTool.java got left off of the last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2407 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 21:33:53 +00:00
ebanks
4ea31fd949
Pushed header initialization out of the GenotypeWriter constructors and into a writeHeader method, in preparation for parallelization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2406 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 19:16:41 +00:00
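Moving header output out of the constructor lets a writer be created before its header content is known, and the header is then written once, explicitly. The sketch below shows only that two-phase pattern. TwoPhaseGenotypeWriter and its tab-delimited layout are hypothetical; they are not the GATK GenotypeWriter interface or any real genotype file format.

```java
import java.util.List;

/**
 * Hypothetical two-phase writer: construction emits nothing, and callers must
 * invoke writeHeader() exactly once before adding records.
 */
final class TwoPhaseGenotypeWriter {
    private final StringBuilder out = new StringBuilder();
    private boolean headerWritten = false;

    /** The constructor does no output, so it can run before the sample list is known. */
    TwoPhaseGenotypeWriter() {}

    void writeHeader(List<String> sampleNames) {
        if (headerWritten) {
            throw new IllegalStateException("header already written");
        }
        out.append("#chrom\tpos");
        for (String sample : sampleNames) {
            out.append('\t').append(sample);
        }
        out.append('\n');
        headerWritten = true;
    }

    void addRecord(String chrom, long pos, List<String> genotypes) {
        if (!headerWritten) {
            throw new IllegalStateException("writeHeader() must be called before addRecord()");
        }
        out.append(chrom).append('\t').append(pos);
        for (String gt : genotypes) {
            out.append('\t').append(gt);
        }
        out.append('\n');
    }

    String contents() {
        return out.toString();
    }
}
```

One plausible benefit for parallelization is that per-shard writers could skip writeHeader() while a single merged writer calls it once. The commit does not say whether the GATK does exactly this.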
jmaguire
98839193b7
compatibility with VCF lib's switch to GenomeLoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2397 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:52:48 +00:00
jmaguire
8787dd4c5e
Various and sundry additions to VCF tools. Some useful to the general public, some one-offs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2396 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:35:45 +00:00
andrewk
36875fca89
Update documentation in the new help system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2380 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:33:12 +00:00
sjia
2deae95df9
Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2370 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:31:47 +00:00
hanna
555976d575
One more walker with formatting to fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2369 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:23:13 +00:00
hanna
cf46472419
Fix up Sherman's new docs in compliance with javadoc specs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2368 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:20:38 +00:00
sjia
df79ed8db1
Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2367 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:53:41 +00:00
sjia
a80a5f1036
Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2366 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:52:08 +00:00
sjia
18f61d2586
Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2365 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:45:19 +00:00
sjia
5974c42468
Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2364 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:41:35 +00:00
sjia
d8cfd707bc
Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2363 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:35:18 +00:00
sjia
4322beeb35
Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2362 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:33:38 +00:00
sjia
4148991d81
Now also encodes amino acids, includes documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2361 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:26:56 +00:00
depristo
a810586418
Check-in without javadoc = smackdown
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2359 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 15:32:39 +00:00
depristo
0d2a761460
Bugfix for minBaseQuality to ignore deletion reads. LocusMismatch walker now allows us to skip every nth eligible site.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2357 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 14:38:39 +00:00
depristo
faa638532a
Correct location
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2353 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 02:42:21 +00:00
depristo
1da97ebb85
Walker for calculating non-independent base errors, v1. Will be moved to somewhere not in core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2352 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 02:40:15 +00:00
chartl
b42fc905e8
Added - new tests (Hapmap was re-added)
Modified - Hapmap now takes a -q command to filter out variants by quality
Modified - MathUtils - cumBinomialProbLog now uses BigDecimal to handle some numerical imprecisions
Modified - PowerBelowFrequency - returns 0.0 if called with a negative number (can't be done from inside the walker itself, but since it's called elsewhere one can't be too careful)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2350 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:57:20 +00:00
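The switch to BigDecimal matters because binomial tail terms can fall far below the smallest positive double (about 4.9e-324) and underflow to zero. The following is a hypothetical sketch, not MathUtils.cumBinomialProbLog itself. Its signature, the choice of the lower tail P(X <= k), and the base-10 log are all assumptions.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

/**
 * Hypothetical sketch (not MathUtils.cumBinomialProbLog): log10 of the lower-tail
 * binomial probability P(X <= k), X ~ Binomial(n, p), accumulated in BigDecimal so
 * that terms far below Double.MIN_VALUE do not underflow to zero.
 */
final class CumulativeBinomial {
    private static final MathContext MC = MathContext.DECIMAL128; // 34 significant digits

    private CumulativeBinomial() {}

    static double cumBinomialProbLog10(int k, int n, double p) {
        BigDecimal prob = BigDecimal.valueOf(p);
        BigDecimal q = BigDecimal.ONE.subtract(prob);
        BigInteger choose = BigInteger.ONE; // C(n, 0)
        BigDecimal sum = BigDecimal.ZERO;
        int last = Math.min(k, n);
        for (int i = 0; i <= last; i++) {
            if (i > 0) {
                // C(n, i) = C(n, i - 1) * (n - i + 1) / i; this integer division is always exact.
                choose = choose.multiply(BigInteger.valueOf(n - i + 1))
                               .divide(BigInteger.valueOf(i));
            }
            BigDecimal term = new BigDecimal(choose)
                    .multiply(prob.pow(i, MC), MC)
                    .multiply(q.pow(n - i, MC), MC);
            sum = sum.add(term, MC);
        }
        return log10(sum);
    }

    /** log10 of a BigDecimal, taken from its decimal exponent so tiny values stay finite. */
    static double log10(BigDecimal x) {
        if (x.signum() <= 0) {
            return Double.NEGATIVE_INFINITY;
        }
        // Write x = m * 10^exp with 1 <= m < 10; m always fits comfortably in a double.
        int exp = x.precision() - x.scale() - 1;
        double mantissa = x.scaleByPowerOfTen(-exp).doubleValue();
        return Math.log10(mantissa) + exp;
    }
}
```

With n = 2000 and p = 0.5, Math.pow(0.5, 2000) is 0.0 in double arithmetic, while cumBinomialProbLog10(0, 2000, 0.5) returns roughly -602.06. MathContext.DECIMAL128 caps each intermediate at 34 significant digits, and BigDecimal's exponent range absorbs the tiny magnitudes.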
ebanks
c7b23d6ca5
Now that VCFGenotypeRecords implement SampleBacked (as they should), a quick fix was needed to get the GenotypeConcordance working when no direct samples were provided in a samples file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2348 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 04:27:16 +00:00
ebanks
97618663ef
Refactored and generalized the VCF header info code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2346 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 21:02:45 +00:00
depristo
05b8782d5f
Documentation updates. Moved CountX.java walkers to QC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2345 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 18:40:22 +00:00
kiran
2748eb60e1
Added short documentation for each class so that it appears in the walker command-line documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2340 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 21:41:07 +00:00
hanna
6955b5bf53
Cleanup of the doc system, and introduce Kiran's concept of a detailed summary
below the specific command-line arguments for the walker. Also introduced
@help.summary to override summary descriptions if required.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2337 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 04:04:37 +00:00
hanna
0da2105e3c
Moving DuplicateQualsWalker to oneoffprojects.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2332 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 19:22:32 +00:00
hanna
f97ac939fa
Punch up the help documentation for CombineDuplicates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2325 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:09:35 +00:00
aaron
86dc98bfb5
update the documentation for CombineDuplicates for the new help system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2324 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:01:42 +00:00
depristo
8f7554d44f
A few improvements to pooled concordance calculations. Now shows false negatives (FN) with the -V option. BasicGenotype now prints a reasonable representation with toString.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2320 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 23:09:10 +00:00
aaron
f64a4c66ac
some tweaks for the GATK paper genotyper to better work with shared memory parallelization, added documentation changes for Matt's new help system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2319 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 22:33:51 +00:00
andrewk
a7cd172628
Added 8x coverage field and minimum base quality command line option in order to be able to compare to U. Wash. exome metrics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2318 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 22:14:44 +00:00
ebanks
0fae798b3a
1. Discoverable base calculations don't care about Genotypes (use Variation's PError regardless of whether the call is ref or var - it's the correct value even for ref calls).
2. Call a base genotypable if any of the Genotypes is above the threshold (you can't assume there's a single Genotype associated with the Variation).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2306 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:26:06 +00:00
ebanks
78d5ac9bc2
Don't check het count when there are multiple Genotypes per Variation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2304 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:07:47 +00:00
ebanks
8d67d9ade3
-Minor fix in UG for all-bases mode
-Make minConfidenceScore in VariantEval a double so non-integer values can be used (requested by Steve H).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2290 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 03:49:10 +00:00
ebanks
e8822a3fb4
Stage 3 of Variation refactoring:
We are now VCF3.3 compliant.
(Only a few more stages left. Sigh.)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2287 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 21:43:28 +00:00
depristo
8f461d3c40
Critical bug fix for VariantEval dbSNP calculations. Moved the system over to the new improved ROD iterators, resulting in dbSNP rates jumping 5% or so, due to masking of true SNPs by preceding indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2274 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 03:36:38 +00:00
hanna
8089aa3c50
Adding support to override the help text.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2273 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 00:16:26 +00:00
ebanks
b6f8e33f4c
Stage 2 of Variation refactoring:
VCFRecord now implements Variation, VCFGenotypeRecord now implements Genotype.
Because of this change, RodVCF is now just a wrapper around the VCFRecord and does nothing else. Also, one can call toVariation on the VCFGenotypeRecord and it returns the VCFRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2271 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 06:48:03 +00:00
hanna
3b440e0dbc
Add a taglet to allow users to override the display name in command-line help.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2270 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 04:12:10 +00:00
ebanks
08f2214f14
Stage 1 of massive Variation/Genotype refactoring.
This stage consists only of the code originating in the Genotyper and flowing through to the genotype writers. I haven't finished refactoring the writers and haven't even touched the readers at all.
The major changes here are that
1. Variations which are BackedByGenotypes are now correctly associated with those Genotypes
2. Genotypes which have an associated Variation can actually be associated with it (and then return it when toVariation() is called).
The only integration tests which need to be updated are MSG-related (because the refactoring now made it easy for me to prevent MSG from emitting tri-allelic sites).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2269 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 03:12:41 +00:00
ebanks
aef4be5610
Moved CoarseCoverageWalker to core and packaged both coverage walkers in coverage/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2249 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:53:36 +00:00
ebanks
df4e001a07
Renamed to more accurately describe its function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2248 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:34:49 +00:00
ebanks
c2017cc91b
PrintCoverageWalker functionality moved to DepthOfCoverageWalker. Added integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2247 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:23:59 +00:00
ebanks
01cf5cc741
1. Merged CoverageHistogram into DepthOfCoverageWalker
2. Fixed bug in histogram calculation for small intervals
3. Better output in DoCWalker
4. Comments added to code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2245 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:01:53 +00:00
ebanks
44b9f60735
PercentOfBasesCovered functionality moved to DepthOfCoverageWalker. Added integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2244 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 16:11:09 +00:00
ebanks
126d1eca35
Move to core (qc/)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2243 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 15:45:58 +00:00
ebanks
9da5cc25ad
More archiving (with permission from Andrey) plus a move to core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2242 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 15:40:27 +00:00
ebanks
d7e4cd4c82
Moving some useful and stable walkers to core:
- ClipReads
- PrintRODs (generalized to print all RODs that are Variations)
- FixBAMSortOrderTag (added documentation to walker so that people know what it does and why)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2238 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 03:00:45 +00:00
depristo
c776f9fb90
Simple utilities for dealing with Complete Genomics data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2230 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 22:51:41 +00:00
ebanks
a09fee2b5e
Moved some more walkers to oneoffprojects and killed an old indel-related walker that isn't being used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2228 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:28:07 +00:00
ebanks
a3343c75db
Move and rename a hybrid-selection-specific coverage calculation to hybridselection/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2225 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:11:22 +00:00
ebanks
2c83f2f2bc
Move MSG (plus the now-obsolete classes it depends on) to oneoffprojects (with permission from Jared).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2224 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:04:22 +00:00
jmaguire
c180a76b05
Added option "append": if set, and the specified discovery output already exists, don't re-call anything that's already present in that file. Append new calls to it.
Great for resuming long jobs that died partway through.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2219 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 18:56:19 +00:00
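The append behavior in the commit above comes down to collecting the sites already present in the existing output and skipping them. Here is a minimal sketch, assuming a tab-delimited output whose first two columns are chromosome and position. AppendResume and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of an "append" resume step: sites already in an existing
 * call file are skipped, and only the remaining candidate sites are called.
 * The "chrom<TAB>pos<TAB>..." layout is an assumption.
 */
final class AppendResume {
    private AppendResume() {}

    /** Collects "chrom:pos" keys from existing output, ignoring header lines that start with '#'. */
    static Set<String> calledSites(List<String> existingLines) {
        Set<String> done = new HashSet<>();
        for (String line : existingLines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t", 3);
            if (fields.length >= 2) {
                done.add(fields[0] + ":" + fields[1]);
            }
        }
        return done;
    }

    /** Returns the candidate sites ("chrom:pos") that still need calling, in their original order. */
    static List<String> sitesToCall(List<String> candidates, Set<String> alreadyCalled) {
        List<String> remaining = new ArrayList<>();
        for (String site : candidates) {
            if (!alreadyCalled.contains(site)) {
                remaining.add(site);
            }
        }
        return remaining;
    }
}
```

A job that died partway through may leave a truncated final line (for example "chr1<TAB>20" instead of "chr1<TAB>200"), which this sketch would wrongly treat as called. A more careful version would discard or re-call the last line of the existing file.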
ebanks
0a2304eff8
- Rename minConfidenceScore in VariantEval to minPhredConfidenceScore
- Moved validation walkers to new qc dir
- Killed unused test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2218 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:59:19 +00:00
aaron
d487428468
remove incorrect parentheses
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2211 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 06:46:32 +00:00
ebanks
b979bd2ced
- Optimized implementation of -byReadGroup in DoCWalker
- Added implementation of -bySample in DoCWalker
- Removed CoverageBySample and added a watered down version to the examples directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2209 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 03:39:24 +00:00
ebanks
ba8a8febc6
Thanks to Steve Hershman for finding this bug:
getNegLog10PError() does not equal the confidence score (you need to multiply by 10 as confidence is traditionally phred scaled). Probably we should change the method to be getNeg10Log10PError(). Anyone have strong feelings on this?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2207 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 01:59:03 +00:00
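The arithmetic behind this bug is that phred scaling is Q = -10 * log10(P(error)). A value that is already -log10(P(error)) must therefore be multiplied by 10 before it is compared with a phred threshold. For P(error) = 0.001, -log10 gives 3, but the phred confidence is 30. The helpers below illustrate this; their names are hypothetical.

```java
/**
 * Hypothetical helpers illustrating the scaling issue: a phred-scaled confidence
 * is Q = -10 * log10(P(error)), so -log10(P(error)) alone is off by a factor of 10.
 */
final class PhredScale {
    private PhredScale() {}

    /** Converts -log10(P(error)) to a phred-scaled confidence. */
    static double phredFromNegLog10PError(double negLog10PError) {
        return 10.0 * negLog10PError;
    }

    /** Converts a raw error probability to a phred-scaled confidence. */
    static double phredFromPError(double pError) {
        return -10.0 * Math.log10(pError);
    }
}
```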
ebanks
3303808a8f
Yet more walkers moved to oneoffprojects.
Made hybridselection subdir in playground.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2205 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:29:12 +00:00
ebanks
05923f7fba
Started transition to oneoffprojects.
Moved/killed a few other walkers (with permission).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2204 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:19:02 +00:00
jmaguire
74f6526e09
VCFHomogenizer: A class that extends InputStream and dynamically rewrites pilot1 VCFs to be on-spec.
VCFTool: A command-line tool with various useful VCF functions (validate, grep, concordance).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2202 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 17:55:42 +00:00
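The VCFHomogenizer idea, an InputStream that fixes input on the fly, can be sketched as a stream that rewrites each line before serving its bytes to whatever parser reads it. This is a hypothetical illustration. LineRewritingInputStream is not the actual class, and the caller-supplied rewrite rule stands in for whatever pilot1-specific fixes VCFHomogenizer applied, which the commit does not list.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch: reads the underlying stream line by line, rewrites each
 * line, and serves the rewritten bytes, so downstream parsers never see the
 * off-spec text. The rewrite rule is supplied by the caller.
 */
final class LineRewritingInputStream extends InputStream {
    private final BufferedReader reader;
    private final UnaryOperator<String> rewrite;
    private byte[] buffer = new byte[0];
    private int pos = 0;

    LineRewritingInputStream(InputStream in, UnaryOperator<String> rewrite) {
        this.reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        this.rewrite = rewrite;
    }

    @Override
    public int read() throws IOException {
        // Refill from the next source line once the current rewritten line is used up.
        while (pos >= buffer.length) {
            String line = reader.readLine();
            if (line == null) {
                return -1;
            }
            buffer = (rewrite.apply(line) + "\n").getBytes(StandardCharsets.UTF_8);
            pos = 0;
        }
        return buffer[pos++] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    /** Convenience wrapper: pushes a whole in-memory text through the rewriting stream. */
    static String rewriteAll(String text, UnaryOperator<String> rewrite) {
        try (InputStream in = new LineRewritingInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)), rewrite)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For instance, a rule that leaves "#" header lines alone and replaces spaces with tabs in data lines would turn space-separated records into tab-separated ones. That rule is purely illustrative. Because each line is re-terminated with a newline, the output always ends with one.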
ebanks
e581cceab6
Got Kris's permission to delete these walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2200 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:57:28 +00:00
chartl
21a9a717e4
Some minor changes and test:
- DepthOfCoverage is now by reference (so locus-by-locus output correctly reports zero-coverage bases)
- VariantsToVCF now lets you bind variants with any string except intervals and dbsnp (not just NA######)
- A PileupWalker integration test on a particularly nasty FHS site
- Two second-base annotation related integration tests on that same site
+ outputs were all hand-validated in MATLAB (within a certain tolerance for the annotations)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2197 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 15:15:54 +00:00
ebanks
084337087e
Removing from the repository deprecated code and walkers for which I had the green light.
Moved piecemealannotator and secondarybases to archive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2195 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:58:20 +00:00
ebanks
2c16c18a04
Move Andrey's old indel code (plus MSG accuracy test, which depends on it) to archive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2194 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:29:00 +00:00
depristo
af22ca1b47
Bug fixes for VariantEval. dbCoverage now reports the dbSNP rate, not the weird eval_snps_in_db value it reported before. We now separate non-indel and non-snp db sites in dbcoverage; some dbSNP records don't fit into either category. Also fixed a consistency issue where novel/known sites were being determined solely by whether dbSNP had a record there, rather than by the stricter dbcoverage screen for isSNP().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2180 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 01:39:01 +00:00
chartl
27651d8dc2
Oops. numReads is now called size
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2175 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 06:59:17 +00:00
chartl
21744e024b
Quick walker that determines the % of bases covered at a user-defined depth (Nx). I've been maintaining it in my own directories, but now that I've accidentally deleted it twice, into playground it goes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2174 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 06:51:19 +00:00
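Computing the % of bases covered at a given depth takes a single pass over per-locus depths. Here is a minimal sketch with hypothetical names. It assumes "covered at Nx" means depth >= N, and that loci with zero coverage are included in the input; without them the denominator is too small.

```java
/**
 * Hypothetical sketch: percentage of loci whose depth is at least a
 * user-defined threshold (e.g. "% of bases covered at 8x").
 */
final class PercentCovered {
    private PercentCovered() {}

    static double percentCoveredAtDepth(int[] depthPerLocus, int minDepth) {
        if (depthPerLocus.length == 0) {
            return 0.0;
        }
        int covered = 0;
        for (int depth : depthPerLocus) {
            if (depth >= minDepth) { // "covered at Nx" read as depth >= N (an assumption)
                covered++;
            }
        }
        // Zero-coverage loci must be present in the array for this denominator to be correct.
        return 100.0 * covered / depthPerLocus.length;
    }
}
```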
depristo
db40e28e54
ReadBackedPileup in all its glory. Documented, aligned with the output of LocusIteratorByState, and caching common outputs for performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2165 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 20:54:44 +00:00
aaron
cfbd9332b0
small cleanups for the GATK paper genotyper; switched to the managed output system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2156 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 08:04:13 +00:00
depristo
03342c1fdd
Restructuring and interface change to ReadBackedPileup. We no longer support the Pileup interface, the BasicPileup static methods, and the ReadBackedPileup class. Now everything is a ReadBackedPileup, and all methods to manipulate pileups are off of it. It also provides the recommended iterable() interface of pileup elements, so you can use the syntax for (PileupElement p : pileup) and access p.getBase(), p.getQual(), and p.getSecondBase() directly. Only a few straggler walkers use the old-style interface, but those walkers will be retired soon. Documentation coming in the AM. Everyone, please use the new syntax: it's safer, and it will be more efficient as soon as the LocusIteratorByState directly emits the ReadBackedPileup for the Alignment context, as opposed to the current interface. In the process of the changeover, I discovered several bugs in the second-best base code due to things getting out of sync, but these were resolved manually. All other integration tests passed without modification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2154 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 03:51:41 +00:00
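The for-each style recommended above works because the pileup implements Iterable. The sketch below is a hypothetical, stripped-down stand-in: MiniPileup has a nested Element that exposes getBase() and getQual(), as named in the commit. It is not the GATK ReadBackedPileup or PileupElement class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical stand-in for a read-backed pileup, showing only the iteration
 * pattern: for (Element p : pileup) { p.getBase(); p.getQual(); }
 */
final class MiniPileup implements Iterable<MiniPileup.Element> {

    /** One read's base and quality at this locus. */
    static final class Element {
        private final byte base;
        private final byte qual;

        Element(byte base, byte qual) {
            this.base = base;
            this.qual = qual;
        }

        byte getBase() { return base; }
        byte getQual() { return qual; }
    }

    private final List<Element> elements = new ArrayList<>();

    /** Adds one read's contribution; returns this for chaining. */
    MiniPileup add(char base, int qual) {
        elements.add(new Element((byte) base, (byte) qual));
        return this;
    }

    @Override
    public Iterator<Element> iterator() {
        // Read-only view, so callers cannot remove elements while iterating.
        return Collections.unmodifiableList(elements).iterator();
    }

    /** Example client code: total quality of the bases that match the reference. */
    static int qualitySumMatching(MiniPileup pileup, char ref) {
        int sum = 0;
        for (Element p : pileup) {
            if (p.getBase() == (byte) ref) {
                sum += p.getQual();
            }
        }
        return sum;
    }
}
```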
andrewk
3fca23cd16
Added a stub treeReduce function for debugging multi-threaded execution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2146 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:51:19 +00:00
andrewk
e4546f802c
Accumulates coverage across hybrid selection bait intervals to assess the effect of bait adjacency. Requires input bait intervals that have an overhang beyond the actual bait interval to capture coverage data at these points. Outputs an R-parseable file that has all data in lists and then does some basic plotting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2144 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:12:34 +00:00
andrewk
e5106c9924
Hybrid selection performance statistics now include counts of the number of adjacent baits (0,1,2) using OverlapDetector and optionally include assayed bait quantities input via interval lists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2143 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:07:23 +00:00
ebanks
c90bea39a1
read.getReadString().charAt(offset) --> read.getReadBases()[offset]
[As a courtesy I fixed all instances once I was updating GenotypeLikelihoods]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2136 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 04:25:19 +00:00
rpoplin
1d46de6d34
The old recalibrator is replaced with the refactored recalibrator. Added a version message to the logger output. These walkers start at version 2.0.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2117 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 14:58:33 +00:00
rpoplin
b24240664f
Reduced the number of calls to new ArrayList() in TableRecalibration. This results in a speed up of perhaps up to 6 percent (timed trials are hard).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2112 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 17:24:31 +00:00
rpoplin
98f921fe24
The refactored CountCovariates now hashes the read object into a HashMap which holds all the properties the covariates pull out of the read over and over again, such as the read group string, bases string and its complement string, quality scores, etc. This results in a big speed up. CountCovariatesRefactored is now just slightly slower than CountCovariates (perhaps 1.07x according to my latest time trial). Thanks to Alec for suggesting IdentityHashMap. CycleCovariate now warns the user that it is defaulting to the Solexa definition of cycle when the platform string pulled out of the read is unrecognized, instead of halting with an Exception.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2108 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 20:38:17 +00:00
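The per-read caching described above can be sketched as a memo table keyed on the read object itself. This is a hypothetical illustration; ReadPropertyCache is not a GATK class. It uses java.util.IdentityHashMap, the class the commit mentions, which compares keys with == and never calls the read's equals() or hashCode().

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-read memoization: expensive properties are computed once per
 * read object and reused by every covariate that asks for them. IdentityHashMap
 * compares keys with ==, so two distinct reads with identical contents get
 * separate entries.
 */
final class ReadPropertyCache<R, P> {
    private final Map<R, P> cache = new IdentityHashMap<>();
    private final Function<R, P> compute;
    private int computations = 0;

    ReadPropertyCache(Function<R, P> compute) {
        this.compute = compute;
    }

    P get(R read) {
        P props = cache.get(read);
        if (props == null) {
            props = compute.apply(read);
            computations++;
            cache.put(read, props);
        }
        return props;
    }

    /** Number of times the expensive computation actually ran. */
    int computations() {
        return computations;
    }

    /** Drops cached entries, e.g. once reads are no longer needed, to bound memory. */
    void clear() {
        cache.clear();
    }
}
```

Keying by identity is cheap and matches this access pattern, because every covariate asks about the same read object. The cache should be cleared once reads are finished with, or it will keep every read in memory.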
ebanks
b434c1c240
Check for null entries before adding
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2099 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 03:12:20 +00:00
aaron
33dcfc858d
updates to the paper genotyper based on Mark's comments. There's still more work to do, including more testing.
Also a 250% improvement in the getBases() and getQuals() of BasicPileup, which was nearly all of the runtime for the genotyper (using primitives instead of objects when possible).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2097 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 23:06:49 +00:00
rpoplin
22aaf8c5e0
Added the old recalibrator integration tests to the refactored recalibrator sitting in playground.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2096 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 22:43:28 +00:00
aaron
6ba1f3321d
Fixed the sample mix-up bug Kiran discovered, and added a unit test in the VCF reader class (Thanks for the good example files Kiran). Also renamed the toStringRepresentation function to toStringEncoding, and added a matching method in VCFGenotypeRecord.
Updated the integration tests that were failing due to different ordering of genotyping entries in VCF. I'll check in the VCF diff tool I wrote when I get a cycle or two.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2092 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 18:17:47 +00:00
chartl
b4babb82eb
adding an extra bit of data to come out of CTT (number of chips with actual data)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2091 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:46:10 +00:00
alecw
b2b4ff7eca
Cache SAMReadGroup rather than get it twice
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2087 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:27:18 +00:00
depristo
eeb3a3fffb
comments for Aaron
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2081 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 12:56:04 +00:00
aaron
7997455f38
first go of the genotyper for the GATK paper. More testing and review tomorrow to call it done.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2080 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 07:55:24 +00:00
rpoplin
0fbd81766b
CountCovariates now uses any rod of type VariationRod with the name dbsnp as the source of known variant sites to skip over. It also grabs the platform string out of the read group when deciding which algorithm to use to calculate machine cycle. In this way it can now handle multi-platform bams. I added a new covariate: PositionCovariate. This is simply the offset regardless of which platform the read came from. This will be useful for comparing between the two covariates. Finally, this message serves as a warning that I will be killing the old recalibrator tomorrow after I've updated and verified new integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2077 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 23:03:47 +00:00
chartl
405c6bf2c1
VariantEval genotype concordance for pools! Integration test coming soon
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2071 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:24:54 +00:00
depristo
6fe1c337ff
Pileup cleanup; pooled caller v1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2070 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:03:48 +00:00
rpoplin
f0a234ab29
TableRecalibration is now much smarter about hashing calculations, taking advantage of the sequential recalibration formulation. Instead of hashing RecalDatums it hashes the empirical quality score itself. This cuts the runtime by 20 percent. TableRecalibration also now skips over reads with zero mapping quality (outputs them to the new bam but doesn't touch their base quality scores).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2069 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 16:47:44 +00:00
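Storing the final empirical quality, rather than a datum that has to be converted on every lookup, turns per-base recalibration into an array read. The sketch below is simplified and hypothetical. It models only two covariates (read group and reported quality), and RecalQualityTable is not the TableRecalibrationWalker's actual data structure.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: the table stores the final empirical quality per
 * (read group, reported quality), so each base needs only a lookup. Reads with
 * mapping quality zero are passed through unchanged.
 */
final class RecalQualityTable {
    private static final int MAX_QUAL = 93; // highest phred score representable in Phred+33 ASCII

    private final Map<String, byte[]> empiricalByReadGroup = new HashMap<>();

    /** Records the empirical quality to emit for a reported quality in a read group. */
    void put(String readGroup, int reportedQual, byte empiricalQual) {
        byte[] table = empiricalByReadGroup.computeIfAbsent(readGroup, rg -> identityTable());
        table[reportedQual] = empiricalQual;
    }

    /** Returns recalibrated qualities; the input array is never modified. */
    byte[] recalibrate(String readGroup, int mappingQuality, byte[] quals) {
        byte[] out = quals.clone();
        byte[] table = empiricalByReadGroup.get(readGroup);
        if (mappingQuality == 0 || table == null) {
            return out; // MAPQ-0 reads (and unknown read groups) keep their original qualities
        }
        for (int i = 0; i < out.length; i++) {
            int q = out[i];
            if (q >= 0 && q <= MAX_QUAL) {
                out[i] = table[q];
            }
        }
        return out;
    }

    /** Default mapping: each quality maps to itself until a table entry overrides it. */
    private static byte[] identityTable() {
        byte[] t = new byte[MAX_QUAL + 1];
        for (int q = 0; q <= MAX_QUAL; q++) {
            t[q] = (byte) q;
        }
        return t;
    }
}
```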
chartl
be31d7f4cc
Added - a walker that outputs relevant information about false negatives given a bunch of hapmap individuals and corresponding integration tests for it.
This will output for hapmap variant sites:
chromosome, position, ref allele, variant allele, number of variant alleles among the individuals, depth of coverage, power to detect singletons at LOD 3, number of variant bases seen, and whether or not the variant was called
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2068 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 15:47:52 +00:00
rpoplin
ec1a870905
Working with byte arrays is faster than working with Strings so the Covariates now take in byte arrays. None of the Covariates themselves used the reference base so I removed it. DinucCovariate now returns a Dinuc object which implements Comparable instead of returning a String because it was too slow. CountCovariates now uses a read filter to filter out unmapped reads and allows the user to specify -cov all which will use all of the available covariates, of which there are 7 now. If no covariates are specified it defaults to ReadGroup and QualityScore, the two required covariates. Initial code in place to leave SOLID bases alone if they have bad color space quality. TableRecalibration uses @Requires to tell the GATK to not give the reference bases since they weren't being used for anything.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2062 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 21:50:52 +00:00
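Replacing a two-character String with a small Comparable value object avoids allocating and hashing a String for every base. The sketch below shows a hypothetical key of this kind. The field layout and the (previous base, current base) pairing are assumptions, not the actual Dinuc class.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical dinucleotide key: two bytes with value equality and a natural
 * ordering, used instead of building a two-character String per base.
 * Pairing (previous base, current base) is an assumption for illustration.
 */
final class Dinuc implements Comparable<Dinuc> {
    private final byte first;
    private final byte second;

    Dinuc(byte first, byte second) {
        this.first = first;
        this.second = second;
    }

    /** Dinucleotide ending at offset; offset 0 has no preceding base. */
    static Dinuc at(byte[] bases, int offset) {
        if (offset < 1 || offset >= bases.length) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return new Dinuc(bases[offset - 1], bases[offset]);
    }

    @Override
    public int compareTo(Dinuc other) {
        int c = Byte.compare(first, other.first);
        return c != 0 ? c : Byte.compare(second, other.second);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Dinuc)) return false;
        Dinuc d = (Dinuc) o;
        return first == d.first && second == d.second;
    }

    @Override
    public int hashCode() {
        return (first << 8) | (second & 0xFF);
    }

    @Override
    public String toString() {
        return new String(new byte[] {first, second}, StandardCharsets.US_ASCII);
    }
}
```

Because equals, hashCode, and compareTo agree, instances work as keys in both HashMap and TreeMap.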
rpoplin
eb07c7f7f8
CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2054 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 18:44:54 +00:00
kiran
97ed945797
Example code for a bug in the VCF implementation. See JIRA entry at http://jira.broadinstitute.org:8008/browse/GSA-225
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2050 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-15 09:27:12 +00:00
rpoplin
88fd762436
The -rf argument is now being used for read filter and is colliding with my walkers. Changed mine to -recalFile
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2048 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-14 19:37:46 +00:00
rpoplin
b05119987c
Clarified some of the comments in the individual covariates now that things have been moved around to speed up the code. In general most error checking and adjustments to the data are done per read instead of per base. This means that functionality was moved out of the covariate modules and into CovariateCounterWalker and TableRecalibrationWalker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2047 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-14 18:44:05 +00:00
rpoplin
672472789e
Added some documentation to the helper classes. Fixed an error case in TableRecalibrationWalker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2046 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-14 18:13:43 +00:00
rpoplin
d1b525b428
Default window size for NQS covariate is 3
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2040 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 19:24:27 +00:00
rpoplin
394c839974
Implemented NQS covariate. Extended Cycle covariate to handle 454 and SOLID reads. Added a Primer Round covariate for SOLID reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2039 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 19:22:21 +00:00
rpoplin
b1376e4216
structure refactored throughout for performance improvements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2036 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 15:41:09 +00:00
mmelgar
72825c4848
A walker that generates a table of secondary base counts in a bam file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2031 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 02:11:23 +00:00
ebanks
61b5fb82ce
2 major changes:
...
1. Add dbsnp RS ID to VCF output from genotyper; to do this I needed to fix the dbsnp rod which did not correctly return this value.
2. Remove AlleleBalanceBacked and instead generalize the arbitrary info fields backing VCFs (and potentially others) in preparation for refactoring VariantFiltration next week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2028 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 22:51:49 +00:00
ebanks
578dcc54a4
Don't create a record if ref=N
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2018 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 04:32:17 +00:00
rpoplin
a13cbe1df0
The refactored recalibrator now passes the integration tests as well as my own validation tests. I'm ready to have other people start jamming on the files. I'll make an updated wiki page soon. The refactored recalibrator is currently a bit slower than the old one but there were a lot of great, easy ideas today for how to improve it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2013 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 22:20:06 +00:00
rpoplin
1e7ddd2d9f
Added a validateOldRecalibrator option to CovariateCounterWalker which reorders the output to match the old recalibrator exactly. This facilitates direct comparison of output. Changed the -cov argument slightly to require the user to specify both ReadGroupCovariate and QualityScoreCovariate to make it more clear to the user which covariates are being used. Some speed up improvements throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2010 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 15:55:56 +00:00
ebanks
2fa2ae43ec
Enough people have found this useful, so...
...
Moving Callset Concordance tool to core and adding integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2003 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:59:18 +00:00
ebanks
3793519bd4
-Added convenience method to VCF record to tell if it's a no call and have rodVCF use it before querying for info fields
...
-Don't restrict info fields to 2-letter keys
[about to move these to core]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2002 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:52:51 +00:00
rpoplin
740a5484c4
Added some documentation to the code, especially to CovariateCounterWalker, with various comments added throughout. Also changed the HashMap data structure to accept an estimated initial capacity, which gave a very modest speed improvement.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2001 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:13:56 +00:00
ebanks
74751a8ed3
-Some minor fixes to get accurate vcf record merging done
...
-Improvement to snp genotype concordance test
And with that, it looks like I get revision #2000.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2000 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 06:40:55 +00:00
ebanks
ab705565cf
Completely refactored the Callset Concordance code. Now, it takes in VCF rods and emits a single VCF file which has merged calls from all inputs and is annotated (in the INFO fields) with the appropriate concordance test(s).
...
Still needs a bit of polish...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1999 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 05:03:13 +00:00
kiran
7fde6c0bf4
One more output tweak.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1996 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:42:55 +00:00
kiran
00a7113d7a
Tweaks to formatting of output table.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1995 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:33:36 +00:00
kiran
95d381efe2
Optionally computes the error rate using the best base and a random base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1991 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:47:34 +00:00
kiran
a679bdde18
FindContaminatingReadGroupsWalker lists read groups in a single-sample BAM file that appear to be contaminants by searching for evidence of systematic underperformance at likely homozygous-variant sites.
...
Procedure:
1. Sites that are likely homozygous-variant but are called as heterozygous are identified.
2. For each site and read group, we compute the proportion of bases in the pileup supporting an alternate allele.
3. A one-sample, left-tailed t-test is performed with the null hypothesis being that the alternate allele distribution has a mean of 0.95 and the alternate hypothesis being that the true mean is statistically significantly less than expected (pValue < 1e-9).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1989 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:36:39 +00:00
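Step 3 of the procedure above is a one-sample, left-tailed t-test of H0: mean = 0.95 against H1: mean < 0.95, applied to one read group's per-site alternate-allele fractions. Here is a minimal, self-contained sketch. It is an assumption, not the walker's actual code. The Student's t CDF is computed through the regularized incomplete beta function using Numerical Recipes-style Lanczos (`gammln`) and Lentz continued-fraction (`betacf`) routines. The original walker may well have used a statistics library instead. The class and method names are hypothetical.

```java
// Hypothetical sketch (not the FindContaminatingReadGroupsWalker source): one-sample,
// left-tailed Student's t-test of H0: mean = mu0 against H1: mean < mu0.
final class LeftTailedTTest {
    private LeftTailedTTest() {}

    /** samples: alternate-allele fractions for one read group, one value per site. */
    static double pValue(double[] samples, double mu0) {
        int n = samples.length;
        if (n < 2) throw new IllegalArgumentException("need at least two sites");
        double mean = 0;
        for (double s : samples) mean += s;
        mean /= n;
        double ss = 0;
        for (double s : samples) ss += (s - mean) * (s - mean);
        double sd = Math.sqrt(ss / (n - 1));
        if (sd == 0) return mean < mu0 ? 0.0 : 1.0; // no spread: t is undefined or infinite
        double t = (mean - mu0) / (sd / Math.sqrt(n));
        return studentTCdf(t, n - 1);
    }

    /** P(T <= t) for Student's t with df degrees of freedom. */
    static double studentTCdf(double t, double df) {
        // I_{df/(df+t^2)}(df/2, 1/2) is the two-sided tail P(|T| >= |t|); halve it for one side.
        double tail = 0.5 * regularizedBeta(df / (df + t * t), df / 2, 0.5);
        return t < 0 ? tail : 1 - tail;
    }

    static double regularizedBeta(double x, double a, double b) {
        if (x <= 0) return 0;
        if (x >= 1) return 1;
        double front = Math.exp(lnGamma(a + b) - lnGamma(a) - lnGamma(b)
                + a * Math.log(x) + b * Math.log1p(-x));
        // The continued fraction converges fast below this point; otherwise use symmetry.
        if (x < (a + 1) / (a + b + 2)) return front * betaContinuedFraction(x, a, b) / a;
        return 1 - front * betaContinuedFraction(1 - x, b, a) / b;
    }

    // Modified Lentz evaluation of the incomplete-beta continued fraction.
    private static double betaContinuedFraction(double x, double a, double b) {
        final double eps = 3e-14, tiny = 1e-30;
        double c = 1, d = 1 - (a + b) * x / (a + 1);
        d = 1 / (Math.abs(d) < tiny ? tiny : d);
        double h = d;
        for (int m = 1; m <= 300; m++) {
            int m2 = 2 * m;
            double aa = m * (b - m) * x / ((a - 1 + m2) * (a + m2)); // even step
            d = 1 + aa * d; d = 1 / (Math.abs(d) < tiny ? tiny : d);
            c = 1 + aa / c; if (Math.abs(c) < tiny) c = tiny;
            h *= d * c;
            aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1 + m2)); // odd step
            d = 1 + aa * d; d = 1 / (Math.abs(d) < tiny ? tiny : d);
            c = 1 + aa / c; if (Math.abs(c) < tiny) c = tiny;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1) < eps) break;
        }
        return h;
    }

    // Lanczos approximation of ln(Gamma(z)), valid for z > 0.
    static double lnGamma(double z) {
        double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = z;
        double tmp = z + 5.5;
        tmp -= (z + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) ser += c / ++y;
        return -tmp + Math.log(2.5066282746310005 * ser / z);
    }
}
```

Under this sketch, a read group would be flagged as a likely contaminant when `LeftTailedTTest.pValue(fractions, 0.95) < 1e-9`, which mirrors the threshold quoted in the commit message.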
kiran
2225d8176e
A convenience class for maintaining a dynamically growing table of values with access to the elements by named row and column identifiers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1988 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:34:35 +00:00
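The convenience class described above stores values addressed by named row and column identifiers, and the table grows as new names appear. Here is a minimal sketch of that idea, assuming insertion order should be preserved for output. The `NamedTable` name, its methods, and the TSV layout are hypothetical and not the actual class from that commit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a table that grows as new row/column names are first used.
final class NamedTable<V> {
    private final Map<String, Map<String, V>> rows = new LinkedHashMap<>();
    private final Set<String> columns = new LinkedHashSet<>(); // union of all column names seen

    void set(String row, String column, V value) {
        rows.computeIfAbsent(row, r -> new LinkedHashMap<>()).put(column, value);
        columns.add(column);
    }

    /** Returns the stored value, or defaultValue if that cell was never set. */
    V get(String row, String column, V defaultValue) {
        Map<String, V> r = rows.get(row);
        return r == null ? defaultValue : r.getOrDefault(column, defaultValue);
    }

    List<String> rowNames() { return new ArrayList<>(rows.keySet()); }

    List<String> columnNames() { return new ArrayList<>(columns); }

    /** Tab-separated dump in insertion order; unset cells are printed as the placeholder. */
    String toTsv(String missing) {
        StringBuilder sb = new StringBuilder("row");
        for (String c : columns) sb.append('\t').append(c);
        sb.append('\n');
        for (Map.Entry<String, Map<String, V>> e : rows.entrySet()) {
            sb.append(e.getKey());
            for (String c : columns) {
                V v = e.getValue().get(c);
                sb.append('\t').append(v == null ? missing : v.toString());
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Keeping rows sparse (one inner map per row) means cells that were never set cost nothing. The shared column set ensures every row prints the same header layout.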
rpoplin
84ba604611
Sequential quality score calculation is now in place in the refactored recalibrator and matches the quality scores calculated by the old recalibrator exactly, at least on the small data sets used so far. Validation, documentation, and optimization work is ongoing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1985 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 15:55:16 +00:00
depristo
bf1bc94060
Fixes for PooledConcordance bugs and lack of safety checking
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1984 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 01:54:10 +00:00
rpoplin
66d4a995e6
Initial check in of refactored Recalibrator. The new walkers are called CountCovariatesRefactored and TableRecalibrationRefactored. More work is needed to finish up the sequential calculation and to document the code sufficiently. These files are not ready to be used by other people quite yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1982 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 22:33:55 +00:00
ebanks
0a55fa5bb1
Completely refactored the Genotype Concordance module(s).
...
Now PooledConcordance and GenotypeConcordance inherit from the same super class (and can therefore share data structures and functionality). Also, they now use ConcordanceTruthTable to keep track of necessary info.
GenotypeConcordance passes integration tests.
PooledConcordance needs to be finished by Chris.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1979 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 16:27:16 +00:00
ebanks
d549347f25
Refactored GenotypeLikelihoods to use an underlying 4-base model.
...
It needs to be modified a bit and then hooked up to a pooled model, but that is now possible.
At this point, there is no difference to the Unified Genotyper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1978 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 21:59:25 +00:00
jmaguire
4d3871c655
don't flush anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1977 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 19:11:51 +00:00
depristo
5d5dc989e7
improvements to VCF and variant eval support of VCF -- now listens to the filter field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1963 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 12:09:30 +00:00
ebanks
3a33401822
2nd stage of the genotyper output refactoring is complete.
...
Now, all output is generalized and all of the intelligence lies where it is supposed to.
Next stage is syncing up old and new models and making sure we're outputting exactly what we should.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1960 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 22:43:08 +00:00
ebanks
af6d0003f8
-Generalized the GenotypeConcordance module to deal with any number of individuals (although it will default to its old behavior if the -samples argument is left out).
...
-Make rods return the appropriate type of Genotype calls from getGenotype().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1954 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-01 05:35:47 +00:00
depristo
7d0ac7c6f2
Fix for long-term VariantEval bug plus new integration test to catch it
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1951 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-31 00:00:33 +00:00
ebanks
51fffc7f69
Comments for Ryan (which also apply to ReadQualityScoreWalker).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1944 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 14:44:04 +00:00
ebanks
ccd7440730
We can actually make this a bit simpler (and faster)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1943 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 04:21:03 +00:00
ebanks
1b6333e4ab
Enough people have asked for this that it just needed to get written.
...
One can now split up any number of sets into an N-way Venn (although it doesn't check for discordance in the calls, so you'll still want to use SimpleVenn for 2-way comparisons).
Wiki docs are updated.
To do: update to use Ryan's generic hash map when it's ready for public use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1942 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 04:08:45 +00:00
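An N-way Venn split assigns each site to the region defined by exactly which call sets contain it. Here is a minimal sketch of that partitioning, assuming sites are identified by string keys such as `"chr1:1000"`. Like the commit notes, it checks only presence and does not test whether the calls agree. The `NWayVenn` class and its region-label format are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: group sites by the exact combination of call sets that contain them.
final class NWayVenn {
    private NWayVenn() {}

    /**
     * sitesBySet maps a call-set name to the site keys it calls.
     * Returns region label (sorted set names joined by "&") -> sites in that region.
     */
    static Map<String, Set<String>> regions(Map<String, Set<String>> sitesBySet) {
        // Invert: site -> sorted set of call sets containing it.
        Map<String, Set<String>> membership = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : sitesBySet.entrySet())
            for (String site : e.getValue())
                membership.computeIfAbsent(site, s -> new TreeSet<>()).add(e.getKey());
        // Each distinct membership set is one Venn region.
        Map<String, Set<String>> regions = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : membership.entrySet())
            regions.computeIfAbsent(String.join("&", e.getValue()), k -> new TreeSet<>()).add(e.getKey());
        return regions;
    }
}
```

Sorting the set names inside each label keeps region names stable regardless of the order in which call sets are supplied.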
ebanks
4bdb5b03bd
tell UnifiedGenotyper to return calls at all bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1941 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 03:10:44 +00:00
ebanks
4ee1d6f733
-Have the calculation models determine whether a call passes the lod/confidence thresholds (as opposed to returning everything and letting the UG decide); this way, walkers which call map() will get only the good calls.
...
-Do the right thing in all models for all-base-mode (for Kiran).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1940 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 02:35:51 +00:00
ebanks
64ac956885
Okay, I caved in:
...
CallsetConcordance now gets possible concordance types by looking at classes that implement ConcordanceType instead of having them hard-coded in.
Thanks to Kiran this was pretty easy...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1939 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 00:32:26 +00:00
ebanks
3091443dc7
Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron.
...
Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 03:46:41 +00:00
chartl
c4359bc340
Whoops. Forgot the implements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1927 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 19:59:57 +00:00
chartl
863d3023d5
IndelCounterWalker -- a new little walker that counts indels over a region (to see what kind of havoc BWA may be causing). Don't know when BasicPileup.indelPileup() was written, but kudos to whoever wrote it.
...
BTTJ - remove 'N's from previous base analysis -- even if both read and ref are 'N' (which does happen, occasionally)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1925 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 19:50:50 +00:00
aaron
04e9a494e9
removed the GenotypesBacked interface, which is currently unused. Also cleaned up some documentation lines
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1924 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 18:08:14 +00:00
rpoplin
06ff81efe5
Added NeighborhoodQualityWalker.java and ReadQualityScoreWalker.java which are used to calculate a read quality score based on attributes of the read and the reads in the neighborhood.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1922 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 13:24:11 +00:00
depristo
68fa6da788
Initial graph-based reference implementation and alignment assessor. Not suitable for public use
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1921 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:54:47 +00:00
depristo
31d143a841
now only needs READS
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1920 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:54:14 +00:00
chartl
4192b093b8
More robust error handling with parallelization + usePreviousBase. Added forceReadBasesToMatchRef to use in conjunction with nPreviousReadBases as a less stringent approximation of usePreviousBases (requiring that previous pileups only had mismatches and that read mapping quality be high was throwing everything away)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1916 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 17:20:44 +00:00
chartl
31d5df2859
Previous base now checks that the read matches the reference in the previous base window.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1915 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 15:58:20 +00:00
ebanks
e96b1791ab
Need to check for biallelic snp or exception gets thrown.
...
Also, update to new tracker calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1913 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 02:43:43 +00:00
chartl
62c1001790
BTTJ is now correct. What a terrible waste of time, turns out I'd just reversed the header. Because of this the MD5 had to be updated in the tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1910 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 19:24:18 +00:00
sjia
24c7f694e6
Handles allele frequencies for any specified population, changed user input for mismatch filter options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1909 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 22:51:56 +00:00
chartl
db9419df49
@ Hack to allow output from onTraversalDone()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1908 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 15:19:04 +00:00
depristo
b4f55df600
Bugfix for Jason F
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1906 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-24 22:09:27 +00:00
aaron
ad1fc511b1
intermediate commit for some changes in the Variation system, so Eric can go ahead with his changes. Everything is pretty set, but the Variation interface could use a convenience method that joins all the alternate alleles.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1903 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 06:31:15 +00:00
chartl
a6dc8cd44e
BTTC is now Tree Reducible allowing for parallelization.
...
Integration test comment changed to reflect actual date of last md5 update.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1901 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 23:19:29 +00:00
chartl
af761fb9bd
Base transition table now forces the epsilon/3 (three-state) model for the unified genotyper. Verified to be identical to changing the default model to epsilon/3. This of course changes the observed counts, so the integration test has been updated.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1897 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 21:18:26 +00:00
chartl
8e3f72ced9
BTTJ - Code refactoring (major) - passes integration test
...
VariantEvalWalker - whoops, wrote PooledGenotypeAnalysis rather than PooledAnalysis, now passes tests again
- PooledFrequencyAnalysis - don't bother initializing matrices if this isn't a pool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1895 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 19:04:51 +00:00
depristo
15a1849758
notes for chartl
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1894 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 18:31:31 +00:00
chartl
77863d4940
@PowerBelowFrequency
...
+ Changes to doc
@ BasicPoolVariantAnalysis
+ use char rather than ReferenceContext
+ calculate # alleles
@ PooledFrequencyAnalysis
+ breakdown of call metrics by estimated number of alleles in pool
@ VariantEvalWalker
+ add PooledFrequencyAnalysis to analysis set
@ PooledGenotypeConcordance
+ correctly calculate maximal allele frequency for output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1893 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 15:17:11 +00:00
chartl
967128035e
Make command-line args default to false.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1892 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 13:59:35 +00:00
depristo
caa3187af8
Enabling correct high-performance ROD walker and moved VariantEval over to it. Performance improvements in variantEval in general. See wiki for full description
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1890 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 23:31:13 +00:00
chartl
4a8a6468be
Use read group as a condition for confusion tables. With an integration test.
...
Changed BaseTransitionTable to comparable objects for consistent ordering of output
( e.g. so the integration test doesn't yell so much )
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1889 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 19:39:32 +00:00
chartl
b83df5616a
Change for lower-case references (always compare upper case bases)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1888 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 17:36:31 +00:00
chartl
3b1fabeff0
Major code refactoring:
...
@ Pooled utils & power
- Removed two of the power walkers leaving only PowerBelowFrequency, added some additional
flags on PowerBelowFrequency to give it some of the behavior that PowerAndCoverage had
- Removed a number of PoolUtils variables and methods that were used in those walkers or simply
not used
- Removed AnalyzePowerWalker (un-necessary)
- Changed the location of Quad/Squad/ReadOffsetQuad into poolseq
@NQS
- Deleted all walkers but the minimum NQS walker, refactored not to use LocalMapType
@ BaseTransitionTable
- Added a slew of new integration tests for different flaggable and integral parameters
- (Scala) just a System.out that was added and commented out (no actual code change)
- (Java) changed a < to <= and a boolean formula
Chris
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1887 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 14:58:04 +00:00
aaron
4be6bb8e92
added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums. For some reason my check-ins from home wouldn't work last night, so this is the actual changes for 1884.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1886 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 14:15:33 +00:00
depristo
449a6ba75a
Deleting lots of code as part of my cleanup. More classes tagged for removal. Many more walkers have their days numbered.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1885 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 12:23:36 +00:00
aaron
d749a5eb5f
added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1884 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 04:56:51 +00:00
depristo
a8a2c1a2a1
Replaced SSG with UG in packaging utils. Minor performance and formatting improvements for ClipReads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1882 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 01:19:58 +00:00
depristo
2a26bb42dd
Softclipping support in clip reads walker. Minor improvement to WalkerTest -- now can specify file extensions for tmp files. Matt -- I couldn't easily create a non-presorted SAM file. The softclipper has an impact on this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1878 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 21:54:53 +00:00
chartl
055a99fb05
Changed the ordering of a disjunction. Walker will no longer try to calculate the number of simple mismatches in the pileup if the pileup includes 'N's.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1877 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:24:14 +00:00
chartl
3d50c72d74
Forgot a dumb little System.out.println. You will be flooded with "This read will not be used." statements until, overwhelmed, you give in to my demands.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1874 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 16:13:48 +00:00
chartl
225ef52973
Now produces same output as the Scala walker for unconditioned tables (no 2bb, no previous base, etc.)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1873 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 16:10:44 +00:00
depristo
d6385e0d88
simpleComplement() function in BaseUtils. Generic framework for clipping reads along with tests. Support for Q score based clipping, sequence-specific clipping (not1), and clipping of ranges of bases (cycles 1-5, 10-15 for example). Can write out clipped bases as Ns, quality scores as 0s, or in the future will support softclipping the bases themselves.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1868 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 22:29:35 +00:00
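The commit above mentions clipping ranges of cycles (for example 1-5 and 10-15) and writing clipped bases as Ns with quality 0. Here is a minimal sketch of that masking step under several assumptions. A "cycle" is taken to be the 1-based offset into the stored read bases, ignoring strand orientation and any existing soft clips. Qualities are raw Phred bytes, not ASCII-offset. The `RangeClipper.clipRanges` name and signature are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: mask 1-based, inclusive cycle ranges with N bases and Q0 qualities.
final class RangeClipper {
    private RangeClipper() {}

    /**
     * Returns {clippedBases, clippedQuals} as copies; inputs are left untouched.
     * Each range is {startCycle, endCycle}; ranges past the read end are truncated.
     */
    static byte[][] clipRanges(byte[] bases, byte[] quals, int[][] ranges) {
        if (bases.length != quals.length) throw new IllegalArgumentException("length mismatch");
        byte[] b = Arrays.copyOf(bases, bases.length);
        byte[] q = Arrays.copyOf(quals, quals.length);
        for (int[] r : ranges) {
            int from = Math.max(r[0], 1) - 1;       // first clipped index (0-based)
            int to = Math.min(r[1], bases.length);  // one past the last clipped index
            for (int i = from; i < to; i++) {
                b[i] = 'N';
                q[i] = 0;
            }
        }
        return new byte[][] {b, q};
    }
}
```

Masking with N/Q0 keeps the read length and alignment unchanged, which is presumably why it was the first mode supported before true softclipping.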
chartl
ad777a9c14
@BasicPileup - made the counts public so they can be used
...
@PoolUtils - split reads by indel/simple base
@BaseTransitionTable - complete refactoring, nicer now
@UnifiedArgumentCollection - added PoolSize as an argument
@UnifiedGenotyper - checks to ensure pooled sequencing uses the appropriate model
@GenotypeCalculationModel - instantiates with the new PoolSize argument
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1867 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 21:56:56 +00:00
andrewk
bdb34fcf38
Updated integration tests for VariantEval. Hooray for IT!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1866 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 20:00:29 +00:00
andrewk
d1a4cd2f73
Added ValidationData analysis type to VariantEvalWalker; this eval takes a GFF file with validated truth data positions (bound to "validation") and calculates the accuracy of the genotype calls bound to "eval".
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1862 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 15:39:08 +00:00
ebanks
418e007ca6
A cleaner interface: now everyone can use UG's initialize method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1860 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 14:09:16 +00:00
aaron
96972c3a5c
a fix for a bug Eric found: if your first call contains fewer samples than calls at other loci, your VCFHeader got setup incorrectly.
...
Also moved a bunch of Lists over to Sets for consistency.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1859 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:57:50 +00:00
aaron
a69ea9b57c
Cleaning up the VCF code, adding lots of tests for a variety of edge cases. Two issues are still outstanding: updating the no call string with the standard 1000g decided on today, and fixing Eric's issue where not all the VCF sample names are present initially.
...
also: their, I hope your happy Eric, from now on I'll try not to flout my awesomest grammer in the future accept when I need to illicit a strong response :-)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1858 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:11:34 +00:00
chartl
b9544d3f89
Output formatting change (very slight)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1854 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 16:47:29 +00:00
kcibul
79993be46c
changed blank gene name to UNKNOWN
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1851 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 13:47:00 +00:00
ebanks
a32470cea1
Deal with the fact that walkers can call UG's init/map functions directly.
...
We need to filter contexts in that case since the calling walkers don't get UG's traversal-level filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1848 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 02:31:45 +00:00
ebanks
e740e7a7ce
Because walkers call UG's map function, we need to move the actual writing out
...
to UG's reduce function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1845 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 20:49:26 +00:00
kcibul
825e6c7a4d
added calculation for bases over 2x, 10x, 20x, 30x plus gene name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1844 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 20:32:26 +00:00
chartl
1f66738c8e
Fix a hashing function bug. Ignore reads with non-reference bases in the pileup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1842 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:41:26 +00:00
ebanks
52d2e0ca07
All walkers now use read.getReadGroup()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1839 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:27:40 +00:00
chartl
0a09fa4d5c
Rename to distinguish this transition table calculator from the scala version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1838 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:52:21 +00:00
chartl
1d055011bd
Getting rid of this so I can rename it without the world blowing up.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1837 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:45:11 +00:00
ebanks
0c95d6906f
Merge both versions of the Sequenom assay design maker: use Jared's base code and add in indels. (Jared, this still emits the same output for SNPs as your original version.)
...
Remove all sequenom stuff from the FastaAlternateReferenceMaker so it can just concentrate on making alternate references...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1831 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:11:45 +00:00
ebanks
49af5269e5
Jared: feel free to change or revert, but until we move over to UG version...
...
Only print out positions with at least one non-ref call
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1830 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:08:57 +00:00
chartl
f5a2e6dd50
Fix!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1829 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 16:15:20 +00:00
chartl
8d0e057d83
I got bored today and decided to write the confusion matrix calculator. At present it is untested. I'm submitting it to subversion to make sure
...
I have previous revision to revert back to.
This is a calculator that will calculate:
P[ True base is X | read base mismatches, secondary base is Y, previous K bases are Z1,Z2,...ZK ]
where the number of previous reference bases to take into account is user-defined. The secondary base is optional as well.
--usePreviousBases k
tells the walker to use the k previous reference bases in the transition table
--useSecondaryBase
tells the walker to use the secondary base at a locus in the transition table
these can be used together.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1816 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 02:55:29 +00:00
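The calculator above estimates P[ True base is X | read base mismatches, secondary base is Y, previous K bases are Z1,Z2,...ZK ]. Here is a minimal sketch of the counting behind such an estimate. It assumes the reference base at a mismatch is taken as the true base and that the estimate is the plain ratio count(X, context) / count(context). The `TransitionTable` class is hypothetical. Its two constructor arguments only loosely mirror `--useSecondaryBase` and `--usePreviousBases k` and are not the walker's real code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of conditional counting for
// P[true base X | read base mismatches, secondary base Y, previous K reference bases].
final class TransitionTable {
    private static final String BASES = "ACGT";
    private final Map<String, long[]> counts = new HashMap<>(); // context -> counts per true base
    private final boolean useSecondaryBase;
    private final int previousBases; // K

    TransitionTable(boolean useSecondaryBase, int previousBases) {
        this.useSecondaryBase = useSecondaryBase;
        this.previousBases = previousBases;
    }

    /** Conditioning key: optional secondary base plus the last K preceding reference bases. */
    private String context(char secondaryBase, String precedingRef) {
        if (precedingRef.length() < previousBases) throw new IllegalArgumentException("too few preceding bases");
        String prev = precedingRef.substring(precedingRef.length() - previousBases).toUpperCase();
        String sec = useSecondaryBase ? String.valueOf(Character.toUpperCase(secondaryBase)) : "-";
        return sec + "|" + prev;
    }

    /** Records one mismatching read base observed where the true base is trueBase. */
    void add(char trueBase, char secondaryBase, String precedingRef) {
        int idx = BASES.indexOf(Character.toUpperCase(trueBase));
        if (idx < 0) return; // skip N and other non-ACGT bases
        counts.computeIfAbsent(context(secondaryBase, precedingRef), k -> new long[4])[idx]++;
    }

    /** count(X, context) / count(context); NaN if the context was never observed. */
    double probability(char trueBase, char secondaryBase, String precedingRef) {
        long[] c = counts.get(context(secondaryBase, precedingRef));
        int idx = BASES.indexOf(Character.toUpperCase(trueBase));
        if (c == null || idx < 0) return Double.NaN;
        long total = c[0] + c[1] + c[2] + c[3];
        return (double) c[idx] / total;
    }
}
```

With K = 0 and the secondary base disabled, every observation falls into a single context, so the table reduces to the unconditioned mismatch distribution.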
chartl
ec83bc6ec5
This somehow didn't make it into subversion the last time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1814 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 21:11:13 +00:00
chartl
ecbb11e017
Modified PowerBelowFrequency to ignore reads below a user-defined mapping quality. Request from Jason Flannick.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1813 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:59:24 +00:00
chartl
ec68ae3bc5
Added a filter that will split the read set by a threshold of mapping quality (Request from Jason Flannick)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1812 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:58:37 +00:00
chartl
0d73fe69e7
Recalibrator by NQS. Had this puppy running all afternoon. Thing had got through 100,000,000 reads before I decided to delete my sting tree. *sigh*, a little more delay.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1811 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:55:02 +00:00
chartl
ee0afba0af
Recalibration stuff...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1810 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:51:39 +00:00
aaron
62c484b57a
Fixes for GSA-201, where enumerated types in command line arguments had to be defined as all uppercase for the system to work.
...
Also a little playground walker that changes the sort order flag of a BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1805 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 18:11:32 +00:00
jmaguire
d9f5a314ac
avoid an out of memory error by not putting more than 5000 reads in the cache. on pilot1 at least those are crazy loci anyway.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1802 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 14:56:55 +00:00
chartl
6d7f4481e4
Changed traversal type slightly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1800 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 04:11:48 +00:00
ebanks
a9f3d46fa8
Your time has come, SSG.
...
Fare thee well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1799 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 20:27:56 +00:00
jmaguire
8fdb8922b8
now output in the exact format that works with sequenom software.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1798 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 20:06:27 +00:00
aaron
98e3a0bf1a
VCF can now be emitted from SSG. The basics are there (the genotype, read depth, our error estimate), but more fields need to be added to each record as necessary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1797 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 19:50:04 +00:00
kiran
94d82d1915
Matthew Bainbridge's duplicate removal utility for 454 data. This code should eventually be moved into a read walker. For now, it's being introduced into the repository as-is (well, with one minor change to make the handling of command-line arguments a little more straightforward).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1794 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 18:32:37 +00:00
chartl
f89a89ffe3
Use of AlleleFrequency as an input to PowerAndCoverage is deprecated by the new walker. Reverting to the standard "power at 1 allele" calculation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1788 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 16:07:45 +00:00
chartl
ae05f5c7ad
Fixin the header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1787 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 15:49:28 +00:00
chartl
11ff1e09b8
A new power walker for the user to feed in a number of alleles. Call that number k. Output is:
...
Locus Power_for_k_alleles Power_for_k-1_alleles Power_for_k-2_alleles ... Power_for_1_allele
This was a request from Jason Flannick & the T2DB group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1786 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 15:35:35 +00:00
jmaguire
32128e093a
misc. changes to get the numbers back to the baseline while keeping the speedup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1784 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 12:27:07 +00:00
jmaguire
d38a0d04b9
fix a snp mask offset error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1783 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 12:25:40 +00:00
jmaguire
02d2492d68
Simple tool for picking sequenom probes for SNPs. Can be extended to indels if necessary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1780 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 23:46:41 +00:00
sjia
5bdcc2b4dc
Included HLA class 2 genes in CreatePedFileWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1776 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:46:51 +00:00
sjia
8f896b734f
Included HLA class 2 genes in CreatePedFileWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1775 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:28:01 +00:00
chartl
225b9bccc1
Modifications to NQSClusteredZScoreWalker to output empirical mismatch rates on bins by both Z-score and reported Q-score, rather than averaging over all Q-score bins for each Z-score.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1773 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 13:45:12 +00:00
depristo
8dd0924b37
Minor performance improvements to VariantEval -- now all of the CPU time is spent dealing with the ROD system...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1772 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 23:40:30 +00:00
aaron
3aec76136f
Removing the AllelicVariant interface, which is replaced by the Variation interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1770 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 17:44:24 +00:00
depristo
1bd0c3c145
VariantEval allows non-Variation ROD objects
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1768 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 13:04:26 +00:00
sjia
98076db6b4
Modified CreatePedFileWalker to output PED file given HLA allele names
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1763 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 03:06:42 +00:00
chartl
7605ee500c
Idiocy! All tests were being disabled because I forgot the instanceof
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1760 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 20:04:56 +00:00
chartl
88d0890cc3
Made PooledGenotypeConcordance a standard test in VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1759 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 20:03:31 +00:00
chartl
68cb2ee54b
Tweaks to parameters for NQS analysis walkers; change to PowerAndCoverage for Jason Flannick (can input the number of alleles to compute power for, i.e. doubletons or tripletons, rather than statically checking singletons).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1757 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:11:27 +00:00
aaron
e885cc4b21
changes for corrected GLF likelihood output, along with better tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1754 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 20:45:05 +00:00
hanna
2309d19f6f
Bug fix from Michael Ross: mark second read in sequence as second of pair.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1753 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 14:34:36 +00:00
aaron
b1c321f161
Adjusted Genotype concordance to more accurately use the new Genotyping code, fixed the VCF rod, and temporarily fixed the build by reintroducing Sherman's ReadCigarFormatter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1745 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 21:28:21 +00:00
sjia
9b78a789e2
HLA Caller 2.0 Walkers:
...
CalculateBaseLikelihoodsWalker.java walks through reads and calculates likelihoods using SSG at each base position
CalculateAlleleLikelihoodsWalker.java walks through HLA dictionary and calculates likelihoods for allele pairs given output of CalculateBaseLikelihoodsWalker.java
CalculatePhaseLikelihoodsWalker.java walks through reads and calculates likelihood scores for allele pairs given phase information
File Readers:
BaseLikelihoodsFileReader.java reads text file of likelihoods output by SSG
FrequencyFileReader.java reads text file of HLA allele frequencies
PolymorphicSitesFileReader.java reads text file of polymorphic sites in the HLA dictionary
SAMFileReader.java reads a sam file (used to read HLA dictionary when in another walker)
SimilarityFileReader.java reads a text file of how similar each read is to the closest HLA allele (used to filter misaligned reads)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1744 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:45:55 +00:00
chartl
281a77c981
Bugfix. isMismatch() was actually computing isMatch().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1743 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:04:59 +00:00
chartl
e28b45688c
More NQS Related Walkers to play with
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1742 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:01:04 +00:00
andrewk
6134f49e3c
Convert de novo SNP caller to run using parent1 and parent2 BAM files (by splitting contexts by reader using getMergedReadGroupsByReaders) instead of geli files, providing a large speed-up and obviating the need for large whole-genome geli files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1738 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 06:42:21 +00:00
andrewk
5662a88ee1
Cosmetic change to list sampling functions: the typical usage of n and k was reversed. No change in functionality of the classes has been made and unit tests still pass.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1736 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 18:12:32 +00:00
aaron
39598f1f0a
switching the concordance walker over to the new Variation system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1735 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 15:46:36 +00:00
asivache
92c6efabb7
moving IndelGenotyper out of playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1732 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:44:49 +00:00
chartl
fe6d810515
Some basic commits that I've been sitting on for a while now:
...
@PooledGenotypeConcordance - changes to output; now also reports false negatives and false positives as interesting sites. It's been like this in my directory for ages, just never committed.
@NQSExtendedGroupsCovariantWalker - change for formatting.
@NQSTabularDistributionWalker - breaks out the full (window_size)-dimensional empirical error rate distribution by the window. So if you've got a window of size 3, the quality score sequences 22 25 23 and 22 25 24 have their own bins (each of the 40^3 sequences gets one) for match and mismatch counts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1730 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:35:50 +00:00
sjia
f7684d9e1b
ImputeAllelesWalker fills missing portions of HLA dictionary based on best allele matches
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1729 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 18:51:46 +00:00
sjia
235de38c2e
Updates to FindClosestAlleleWalker and CreateHaplotypesWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1728 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:41:58 +00:00
aaron
7ffc1d97ef
Cut DeNovoSNPWalker over to the new Variation system, some renaming of methods on the Variation interface, and some corrections on the interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1724 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 04:35:52 +00:00
depristo
392152f149
1000x performance improvements to MSG for crisis control
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1723 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 23:44:33 +00:00
aaron
d262cbd41c
changes to add VCF to the rod system, fix VCF output in VariantsToVCF, and some other minor changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1715 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:16:11 +00:00
sjia
1ee8ba590c
Reads cigar files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1713 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:14:10 +00:00
sjia
9422156e09
Finds closest allele for each read in bam file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1712 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:12:20 +00:00
sjia
5c5151c4e7
Creates ped file from reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1711 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 02:48:29 +00:00
sjia
b446b3f1b6
CreateHaplotypeWalker now gives correct output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1709 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 21:13:52 +00:00
sjia
3916e165fb
New walker to output haplotypes for each read (for SNP analysis or imputation, etc)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1707 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:26:43 +00:00
chartl
63f3d45ca4
fixing the build
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1705 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:04:09 +00:00
chartl
540e1b971f
And we fix one boneheaded mistake, which was actually causing the problem; though the last change was still correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1704 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:26:45 +00:00
chartl
124ca68fa8
And an IMMEDIATE minor fix (want neighborhood quality > base quality to be represented correctly)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1703 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:21:09 +00:00
chartl
8cdb78ebee
More sophisticated version of the NQSCovariantWalker - modified to be more explicit about how much higher the
...
quality score of a particular base is than the quality score of its neighbors. The granularity of the binning
jumps from 32 groups to 860 groups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1702 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:18:24 +00:00
aaron
f783cb30e0
adding an interface so that the current @Requires with ROD annotations work in walkers like VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1700 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:24:05 +00:00
asivache
fa87dd386d
Now uses rodRefSeq in its new reincarnation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1698 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:19:36 +00:00
asivache
fe36289e44
No one needs this, probably... Old experimental code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1695 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:11:50 +00:00
sjia
aa66074a0e
Compares each read to the HLA dictionary and outputs closest allele, as well as other stats
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1693 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 16:17:23 +00:00
aaron
11c32b588f
fixing VariantEvalWalkerIntegrationTest md5 sums, a couple comment changes, and a little bit of cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1690 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:54:47 +00:00
sjia
22932042ea
Combined Scores, bug fixed for printing HLA-C
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1685 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 18:28:16 +00:00
asivache
d7d0b270d1
now supports blacklisting lanes (with -BL option will ignore reads from any of the specified lanes)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1682 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 16:46:57 +00:00
asivache
fb09835ef8
Changed to accommodate new ROD system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1671 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:10:56 +00:00
asivache
f4d270cba4
These classes now use BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1669 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:03:15 +00:00
aaron
3a487dd64e
little fixes; also fixed a tyPo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1662 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:38:51 +00:00