depristo
56f769f2ce
Output improvements to GenotypeConcordance calculations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1331 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 12:54:46 +00:00
ebanks
72dda0b85c
Fixed calculations for Mark
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1330 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 03:21:43 +00:00
ebanks
f0378db9b7
added accuracy numbers
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1329 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 01:38:33 +00:00
ebanks
a5a56f1315
At this point, we are convinced that the new priors are the way to go...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1328 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 17:25:25 +00:00
depristo
df4fd498c5
Improvements and bug fixes galore. (1) Now properly handles Q0 bases by filtering them out; you can disable this if you need to. (2) Support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to have more power to detect variants (see email too).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1327 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:21:38 +00:00
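Item (1) amounts to dropping bases whose quality score is 0 (a Phred score of 0 corresponds to an error probability of 1) before they reach the likelihood calculation, with a switch to turn the filter off. The following is a minimal sketch that assumes bases and qualities arrive as parallel arrays. The class name, the method name, and the `filterQ0` parameter are hypothetical and are not the walker's actual argument names.

```java
import java.util.ArrayList;
import java.util.List;

public class Q0BaseFilter {
    // Returns the bases to use for calling. When filterQ0 is true, any base
    // whose quality score is exactly 0 is skipped.
    static List<Character> usableBases(char[] bases, byte[] quals, boolean filterQ0) {
        if (bases.length != quals.length) {
            throw new IllegalArgumentException("bases and quals must have the same length");
        }
        List<Character> kept = new ArrayList<>();
        for (int i = 0; i < bases.length; i++) {
            if (filterQ0 && quals[i] == 0) continue; // Q0: error probability of 1
            kept.add(bases[i]);
        }
        return kept;
    }

    public static void main(String[] args) {
        char[] bases = {'A', 'C', 'G', 'T'};
        byte[] quals = {30, 0, 20, 0};
        System.out.println(usableBases(bases, quals, true));  // [A, G]
        System.out.println(usableBases(bases, quals, false)); // [A, C, G, T]
    }
}
```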
depristo
46643d3724
Improvements and bug fixes galore. (1) Now properly handles Q0 bases by filtering them out; you can disable this if you need to. (2) Support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to have more power to detect variants (see email too).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1326 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:21:27 +00:00
depristo
d665d9714f
By default now writes output to JOBID.lsf.output instead of going to email -- based on recommendations from the cancer group
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1325 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:18:58 +00:00
ebanks
3c4410f104
-add basic indel metrics to variant eval
-variants need a length method (can't assume it's a SNP)!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 03:25:03 +00:00
kcibul
1d6d99ed9c
walk by reference
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1323 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 20:21:04 +00:00
ebanks
089ae85be7
1. output grep-able strings for genotype eval
2. free DB coverage from isSNP restriction
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1322 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 17:36:59 +00:00
kcibul
1bca9409a4
calculate freestanding intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1321 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 16:40:27 +00:00
asivache
2499c09256
added minIndelCount (short: minCnt) command line argument. The call is made only if the number of reads supporting the consensus indel is equal to or greater than the specified value (default: 0, so only the minFraction filter is on in default runs!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1320 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 15:22:51 +00:00
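The two filters combine as a simple conjunction: a call needs at least minIndelCount supporting reads and must also pass minFraction. The sketch below assumes that minFraction is the fraction of covering reads that support the consensus indel, because the commit does not define it. The class and method names are hypothetical.

```java
public class IndelCallFilter {
    // minIndelCount: absolute number of reads supporting the consensus indel.
    // minFraction: assumed here to be (supporting reads / covering reads).
    static boolean shouldCall(int consensusIndelReads, int totalReads,
                              int minIndelCount, double minFraction) {
        if (totalReads <= 0) return false;
        boolean passesCount = consensusIndelReads >= minIndelCount;
        boolean passesFraction = (double) consensusIndelReads / totalReads >= minFraction;
        return passesCount && passesFraction;
    }

    public static void main(String[] args) {
        // With the default minIndelCount of 0, only the fraction filter matters.
        System.out.println(shouldCall(2, 10, 0, 0.15)); // true
        System.out.println(shouldCall(2, 10, 3, 0.15)); // false: only 2 supporting reads
    }
}
```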
ebanks
73ddf21bb7
SNPs no longer fail this filter if they are actually hom in reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1319 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 15:20:43 +00:00
asivache
f2b3fa83ac
fix for another bug found by Eric: some indels were printed into the output stream twice (when there's another indel within MISMATCH_WINDOW bases and that other indel requires delayed print in order to accumulate coverage)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1318 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 15:07:07 +00:00
aaron
f1109e9070
Added the iterator to SAMDataSource to prevent seeing duplicate reads; it applies only in a byReads traversal. The iterator discards any reads in the current interval that would have been seen in the previous interval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1317 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-25 22:36:29 +00:00
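A filtering iterator like the one described here wraps the reads returned for the current interval and skips any read that also overlapped the previous interval. Such a read spans both intervals, so the traversal has already handed it out once. This sketch makes several assumptions: a single contig, 1-based inclusive coordinates, and overlap with the previous interval as the "already seen" test. The `Read` class and all other names are hypothetical stand-ins, not SAMDataSource's types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class DuplicateReadFilter {
    // Hypothetical minimal read: 1-based, inclusive alignment span on one contig.
    static final class Read {
        final String name;
        final int start;
        final int end;
        Read(String name, int start, int end) { this.name = name; this.start = start; this.end = end; }
    }

    // Drops reads that overlap the previous interval, since they were already returned there.
    static final class SkipPreviouslySeenIterator implements Iterator<Read> {
        private final Iterator<Read> inner;
        private final int previousStart;
        private final int previousEnd;
        private Read pending;

        SkipPreviouslySeenIterator(Iterator<Read> inner, int previousStart, int previousEnd) {
            this.inner = inner;
            this.previousStart = previousStart;
            this.previousEnd = previousEnd;
            advance();
        }

        // Look ahead to the next read that did not overlap the previous interval.
        private void advance() {
            pending = null;
            while (inner.hasNext()) {
                Read r = inner.next();
                boolean seenBefore = r.start <= previousEnd && r.end >= previousStart;
                if (!seenBefore) { pending = r; return; }
            }
        }

        @Override public boolean hasNext() { return pending != null; }

        @Override public Read next() {
            if (pending == null) throw new NoSuchElementException();
            Read r = pending;
            advance();
            return r;
        }
    }

    static List<String> namesAfterFiltering(List<Read> currentReads, int prevStart, int prevEnd) {
        List<String> names = new ArrayList<>();
        Iterator<Read> it = new SkipPreviouslySeenIterator(currentReads.iterator(), prevStart, prevEnd);
        while (it.hasNext()) names.add(it.next().name);
        return names;
    }

    public static void main(String[] args) {
        // Previous interval 100-200; the current interval's query returned these reads.
        List<Read> reads = Arrays.asList(new Read("spansBoth", 180, 230), new Read("currentOnly", 210, 260));
        System.out.println(namesAfterFiltering(reads, 100, 200)); // [currentOnly]
    }
}
```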
asivache
5eca4c353c
IndelGenotyper now uses GATK::getMergedReadGroupsByReaders() to sort out which read in the merged stream is for normal, and which is for tumor (in --somatic mode, apparently)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1316 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 23:01:18 +00:00
asivache
a361e7b342
SAMDataSource is now exposed by GATK engine; SamFileHeaderMerger is exposed from Resources all the way up to SAMDataSource, so now we can see underlying individual readers should we need them; GATK engine has new methods getSamplesByReaders(), getLibrariesByReaders(), and getMergedReadGroupsByReaders(): each of these methods returns a list of sets, with each element (set) holding, respectively, samples, libraries, or (merged) read groups coming from an individual input bam file (so now when using multiple -I options we can still find out which of the input bams each read comes from)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1315 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:59:49 +00:00
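The "list of sets, one per input BAM" shape makes it straightforward to recover which -I file a read came from: find the set that contains the read's read group. A consumer such as the --somatic IndelGenotyper can then treat one index as normal and the other as tumor. This sketch does not call the GATK methods. It assumes the sets hold plain read-group ID strings in -I order. The class, the method, and the normal-first convention are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReaderLookup {
    // Given one set of read-group IDs per input file (in -I order), return the index
    // of the input that contains readGroupId, or -1 if no input does.
    static int inputIndexForReadGroup(List<Set<String>> readGroupsByReaders, String readGroupId) {
        for (int i = 0; i < readGroupsByReaders.size(); i++) {
            if (readGroupsByReaders.get(i).contains(readGroupId)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical two-input run: first -I is the normal, second is the tumor.
        List<Set<String>> byReader = new ArrayList<>();
        byReader.add(new HashSet<>(Arrays.asList("rgN1", "rgN2")));
        byReader.add(new HashSet<>(Arrays.asList("rgT1")));
        int idx = inputIndexForReadGroup(byReader, "rgT1");
        System.out.println(idx == 0 ? "normal" : idx == 1 ? "tumor" : "unknown"); // tumor
    }
}
```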
hanna
2024fb3e32
Better division of responsibilities between sources and type descriptors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1314 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:15:57 +00:00
asivache
64221907a2
fixed a bug found by Eric: genotyper would crash in the case of an indel too close to the window end, with the next read mapping sufficiently far away on the ref
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1313 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 21:00:31 +00:00
hanna
2db86b7829
Move the cleaned read injector test from playground to core. Remove CovariateCounterTest's dependency on the CleanedReadInjector. Start doing a bit of cleanup on the CLP's FieldParsers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1312 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 19:44:04 +00:00
hanna
e2ec703a32
Added indel cleaner and quality scores recalibrator to the GATK package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1311 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 16:20:38 +00:00
hanna
df44bdce7d
Retire the pooled caller...it's been eclipsed by other walkers in the tree.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1310 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 14:49:03 +00:00
kiran
884806fc16
Broken and unused. It goes away now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1309 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 14:26:52 +00:00
ebanks
d044681fbe
change paths to new ones
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1308 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 07:28:43 +00:00
ebanks
59f0c00d77
-set indel cleaning walkers to be in core package
-move Andrey's alignment utility classes to core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1307 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 05:23:29 +00:00
kiran
bb20462a7c
A better way: down-scale second-base ratios until the infinities disappear. This way, high-coverage sites don't cause binomialProbability to explode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1306 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 03:02:00 +00:00
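The failure mode here is a binomial probability built from factorials in double precision. Once n exceeds 170, n! overflows to Infinity, and at high coverage Infinity / Infinity yields NaN. The commit's remedy is to shrink the counts until the result is finite. The sketch below halves k and n together, which roughly preserves their ratio, until the value stops being infinite or NaN. The naive `binomialProbability` and the halving schedule are assumptions for illustration, not the code from this commit.

```java
public class SecondBaseScaling {
    // Naive n! in double precision; becomes Infinity once n exceeds 170.
    static double factorial(int n) {
        double f = 1.0;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    // Naive P(X = k) for X ~ Binomial(n, p), built from factorials.
    static double binomialProbability(int k, int n, double p) {
        double coefficient = factorial(n) / (factorial(k) * factorial(n - k));
        return coefficient * Math.pow(p, k) * Math.pow(1.0 - p, n - k);
    }

    // Halve both counts (roughly preserving k/n) until the result is finite.
    static double scaledBinomialProbability(int k, int n, double p) {
        double result = binomialProbability(k, n, p);
        while ((Double.isInfinite(result) || Double.isNaN(result)) && n > 1) {
            k /= 2;
            n /= 2;
            result = binomialProbability(k, n, p);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(binomialProbability(150, 300, 0.5));       // NaN: Infinity / Infinity
        System.out.println(scaledBinomialProbability(150, 300, 0.5)); // finite, evaluated at k = 75, n = 150
    }
}
```

Computing in log space (summing log-factorials) avoids the overflow without altering the counts. Down-scaling instead keeps the existing computation and accepts some loss of exactness at very high coverage.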
aaron
0b16253db3
an iterator to fix the problem where read-based interval traversals are getting duplicate reads because reads span the two intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1305 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 23:59:48 +00:00
kiran
7c20be157c
Added ability to sample from a list *without* replacement.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1304 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 21:00:19 +00:00
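Sampling without replacement can be done with a partial Fisher-Yates shuffle: for each of the k draws, pick uniformly from the elements not yet chosen and swap the pick into place. The commit does not say which technique it used. This is one standard way to do it, with hypothetical class and method names, working on a copy so the caller's list is left unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class SampleWithoutReplacement {
    // Draws k distinct elements (by position) via a partial Fisher-Yates shuffle on a copy.
    static <T> List<T> sample(List<T> list, int k, Random rng) {
        if (k < 0 || k > list.size()) {
            throw new IllegalArgumentException("k must be between 0 and list.size()");
        }
        List<T> pool = new ArrayList<>(list);
        for (int i = 0; i < k; i++) {
            // Choose uniformly from the unchosen suffix pool[i..] and swap it into slot i.
            int j = i + rng.nextInt(pool.size() - i);
            T tmp = pool.get(i);
            pool.set(i, pool.get(j));
            pool.set(j, tmp);
        }
        return new ArrayList<>(pool.subList(0, k));
    }

    public static void main(String[] args) {
        List<String> reads = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        System.out.println(sample(reads, 3, new Random(42)));
    }
}
```

When k is close to the list size, `Collections.shuffle` on a copy followed by `subList(0, k)` is an equivalent, simpler alternative.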
kiran
038cbcf80e
If the result from the secondary-base test is 0.0, replace the result with a minimum likelihood such that the log-likelihood doesn't underflow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1303 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 20:59:52 +00:00
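Taking the log of a likelihood that is exactly 0.0 gives -Infinity, which then poisons any sum of log-likelihoods. Substituting a small floor keeps the value finite. The commit does not state which minimum it used, so the `MIN_LIKELIHOOD` value below is a hypothetical placeholder, as are the class and method names.

```java
public class LikelihoodFloor {
    // Hypothetical floor; the actual minimum used in the commit is not stated.
    static final double MIN_LIKELIHOOD = 1e-300;

    // If the secondary-base test returns exactly 0.0, substitute the floor so the
    // log is a large negative number rather than -Infinity.
    static double log10Likelihood(double testResult) {
        double likelihood = (testResult == 0.0) ? MIN_LIKELIHOOD : testResult;
        return Math.log10(likelihood);
    }

    public static void main(String[] args) {
        System.out.println(Math.log10(0.0));      // -Infinity
        System.out.println(log10Likelihood(0.0));  // approximately -300
        System.out.println(log10Likelihood(0.01)); // approximately -2
    }
}
```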
kiran
093550a3f2
Removed secondary-base test from SingleSampleGenotyper. It now lives in the variant filtration system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1302 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 20:58:41 +00:00
ebanks
477502338f
moved major indel cleaning pieces to core (yippee!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1301 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:59:51 +00:00
ebanks
4efe26c59a
Major: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found.
Minor: refactor 454 filtering
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1300 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:53:51 +00:00
aaron
f7168bd7cf
added the ability to build the jars to a different location, like the following:
ant -Ddist=altDist
or
ant -Ddist=/where/you/want/jars
Also changed the test build to depend on the playground build instead of core, since right now tests fail if you only build core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1299 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 04:06:58 +00:00
ebanks
f8b1dbe3b3
getBestGenotype() does not necessarily return hets in alphabetical order;
the string (unfortunately) needs to be sorted for lookup in the table (otherwise we throw a NullPointerException)
TO DO: have the table be smarter instead of sorting each genotype string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1298 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 01:58:47 +00:00
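The workaround sorts the characters of the genotype string so that a het such as "CA" finds the table entry stored as "AC". Without sorting, the map lookup returns null, and unboxing or dereferencing that null throws the NullPointerException. The table contents and all names below are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class GenotypeTableLookup {
    // Sort the alleles of a genotype string so "CA" and "AC" map to the same key.
    static String canonical(String genotype) {
        char[] alleles = genotype.toCharArray();
        Arrays.sort(alleles);
        return new String(alleles);
    }

    public static void main(String[] args) {
        // Hypothetical table keyed by alphabetically ordered genotypes.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("AA", 0);
        counts.put("AC", 0);
        counts.put("CC", 0);

        String called = "CA";                             // het not in alphabetical order
        System.out.println(counts.get(called));           // null: unboxing this would throw a NullPointerException
        counts.merge(canonical(called), 1, Integer::sum); // succeeds once the key is sorted
        System.out.println(counts.get("AC"));             // 1
    }
}
```

One way to address the TO DO would be to wrap the map so that canonicalization happens inside its get and put methods, rather than at every call site.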
ebanks
ee8ed534e0
print full genotype for alt allele
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1297 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 01:35:23 +00:00
hanna
298cc24524
Fix minor bug introduced in filtration, and cleaned up the artificial sam records so that they use SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX and SAMRecord.NO_ALIGNMENT_START rather than hardcoded -1's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1296 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 22:37:41 +00:00
hanna
cac04a407a
For Manny: filter out reads where the ref index ==
NO_ALIGNMENT_REFERENCE_INDEX but the alignment start != NO_ALIGNMENT_START.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1295 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 21:19:24 +00:00
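The filter rejects an inconsistent record: one with no reference index that nevertheless claims an alignment start. The predicate below illustrates this without depending on the SAM library. The two constants are local stand-ins for SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX and SAMRecord.NO_ALIGNMENT_START. The values -1 and 0 match current htsjdk to the best of my knowledge, but check them against the library version in use. The class and method names are hypothetical.

```java
public class UnmappedReadSanity {
    // Local stand-ins for SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX and SAMRecord.NO_ALIGNMENT_START
    // (assumed values; verify against the SAM library version in use).
    static final int NO_ALIGNMENT_REFERENCE_INDEX = -1;
    static final int NO_ALIGNMENT_START = 0;

    // True for a read with no reference index but a real alignment start.
    static boolean shouldFilter(int referenceIndex, int alignmentStart) {
        return referenceIndex == NO_ALIGNMENT_REFERENCE_INDEX && alignmentStart != NO_ALIGNMENT_START;
    }

    public static void main(String[] args) {
        System.out.println(shouldFilter(-1, 0));     // false: consistently unmapped
        System.out.println(shouldFilter(-1, 12345)); // true: no reference, but a start position
        System.out.println(shouldFilter(3, 12345));  // false: mapped
    }
}
```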
depristo
9c12c02768
AlleleBalance and on/off primary base filters -- version 0.0.1 -- for experimental use only
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1294 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 17:54:44 +00:00
andrewk
00f9bcd6d1
CoverageEval.py tool right before some major changes to the core of the code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1293 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 16:58:23 +00:00
ebanks
24e81e3e7b
moved to wiki
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1292 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 16:35:23 +00:00
ebanks
c54fd1da09
Beautify the genotype concordance printouts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1291 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 02:53:02 +00:00
hanna
6e4fd8db4a
Better formatting of available walkers, and only output them along with help. Cleanup JVMUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1290 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 22:23:28 +00:00
depristo
761d70faa1
Better printing of multiple rods -- now produces a comma-separated set of values
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1289 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 21:58:27 +00:00
depristo
8588f75eb6
Better printing with toSimpleString() -- now prints out chip-genotype string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1288 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 21:57:59 +00:00
hanna
1843684cd2
Cleanup: GATKEngine no longer needs to be lazy loaded, b/c the plugin directory no longer exists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1287 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:50:51 +00:00
hanna
b43925c01e
Switched to Reflections ( http://code.google.com/p/reflections/ ) project for
inspecting the source tree and loading walkers, rather than trying to roll
our own by hand.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1286 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:32:22 +00:00
kiran
436a196e2b
Bug fixes to support hapmap genotyping concordance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1285 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 16:20:10 +00:00
depristo
7e04313b4e
Bug fixes and improvements to CoverageHistogram. Now displays the frequency of the bin. Also correctly prints out the last element in the coverage histogram (<= vs. <)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1284 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 11:55:05 +00:00
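The "<= vs. <" fix is the usual off-by-one for a histogram indexed by coverage 0..max: a loop bounded by `c < maxCoverage` stops one bin early and never prints the last element. The sketch assumes the histogram is a `long[]` indexed by coverage and takes "frequency" to mean the bin's count divided by the total count. Both assumptions, and the class and method names, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CoverageHistogramPrinter {
    // histogram[c] = number of loci observed with coverage c, for c = 0..maxCoverage.
    static List<String> formatRows(long[] histogram) {
        long total = 0;
        for (long count : histogram) total += count;
        int maxCoverage = histogram.length - 1;
        List<String> rows = new ArrayList<>();
        // "<=" rather than "<" so the bin for maxCoverage is printed too.
        for (int c = 0; c <= maxCoverage; c++) {
            double frequency = total == 0 ? 0.0 : (double) histogram[c] / total;
            rows.add(c + "\t" + histogram[c] + "\t" + String.format(Locale.ROOT, "%.3f", frequency));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : formatRows(new long[]{2, 0, 3, 5})) System.out.println(row);
    }
}
```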
aaron
f13a1e8591
adding a couple of small changes to support contract with VariantEval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1283 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 03:49:15 +00:00
aaron
b4adb5133a
GLF rod as an AllelicVariant object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1282 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 00:55:52 +00:00