hanna
ee2f022c71
Make new TraverseByLociByReference the default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@532 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:50:11 +00:00
hanna
e50ae97fe1
Introduce new index-based fasta reader. Clean up MicroManager code, pushing necessary code back into TraversalEngine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@531 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:40:21 +00:00
depristo
40a2b3eeb3
Basic logistic regression support for calibrating qualities; mostly for Andrew to experiment with
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@529 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:09:50 +00:00
andrewk
061f4328b1
Covariate counter now outputs files used by R to do logistic regression.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@527 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 17:11:57 +00:00
jmaguire
4e4fd33584
First draft of actual pooled EM caller.
...
Produces sane looking output on region of 1kG pilot1:
CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084
Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@526 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:43:41 +00:00
jmaguire
dd408a2a9a
First draft of actual pooled EM caller.
...
Produces sane looking output on region of 1kG pilot1:
CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084
Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@525 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:42:15 +00:00
ebanks
13d4692d2e
1. Added a by-interval traversal.
...
2. Added a shell for the indel cleaner walker (it's currently being used to test the interval traversal).
3. Fixed small bug in downsampling (make sure to downsample the offsets too)
4. GenomeAnalysisTK.execute => anyone object to my change to "instanceof" instead of trying to catch a ClassCastException (yuck)?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@524 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 04:33:35 +00:00
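The trade-off raised in item 4 of the commit above can be illustrated with a small sketch. The walker types and traversal labels below are hypothetical stand-ins, not the actual GenomeAnalysisTK classes; the point is only that an explicit instanceof test states the dispatch rule directly instead of relying on a failed cast.

```java
// Hypothetical sketch: type-based dispatch with instanceof instead of
// casting first and catching ClassCastException.
final class InstanceofDispatchSketch {
    private InstanceofDispatchSketch() {}

    // Stand-in types for illustration only.
    interface Walker {}
    static final class LocusWalker implements Walker {}
    static final class ReadWalker implements Walker {}

    /** Returns a label for the traversal that suits the given walker. */
    static String chooseTraversal(Walker walker) {
        if (walker instanceof LocusWalker) {
            return "by-locus";
        }
        if (walker instanceof ReadWalker) {
            return "by-read";
        }
        // An unsupported type is reported explicitly; nothing is left to a
        // cast failure somewhere further down the call chain.
        throw new IllegalArgumentException(
                "Unsupported walker type: " + walker.getClass().getName());
    }
}
```

Calling InstanceofDispatchSketch.chooseTraversal(new InstanceofDispatchSketch.ReadWalker()) returns "by-read", while an unrecognized Walker implementation fails with an IllegalArgumentException that names the offending class.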
kiran
1984bb2d13
Made num_loci_total public because I'm lazy. I'll change it back later.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@523 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:57:23 +00:00
kiran
7ce11e152b
Simplified. Added option to perform four-base retest of a putative variant.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@522 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:56:15 +00:00
kiran
135d3eabeb
Now only distributes 80% of the residual probability to the secondary base, 10% each to the other two bases. Nicer labelling for stringified probability distribution output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@521 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:34:43 +00:00
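The 80/10/10 split described in the commit above can be sketched as follows. This is an illustration only: the class name, method signature, and the A, C, G, T index order are assumptions, and rejecting the case where the best and secondary base coincide is a simplification, not a statement about how the original code handles it.

```java
// Hypothetical sketch of distributing residual probability over four bases.
final class ResidualProbabilitySketch {
    private ResidualProbabilitySketch() {}

    /**
     * Builds a four-element probability vector (assumed index order A, C, G, T).
     * The best base keeps bestProb; of the residual (1 - bestProb), 80% goes to
     * the secondary base and 10% to each of the two remaining bases.
     */
    static double[] distribute(int bestIndex, double bestProb, int secondIndex) {
        if (bestIndex < 0 || bestIndex > 3 || secondIndex < 0 || secondIndex > 3) {
            throw new IllegalArgumentException("base indices must be in 0..3");
        }
        if (bestIndex == secondIndex) {
            throw new IllegalArgumentException("best and secondary base must differ in this sketch");
        }
        if (bestProb < 0.0 || bestProb > 1.0) {
            throw new IllegalArgumentException("bestProb must be in [0, 1]");
        }
        double residual = 1.0 - bestProb;
        double[] probs = new double[4];
        for (int i = 0; i < 4; i++) {
            if (i == bestIndex) {
                probs[i] = bestProb;
            } else if (i == secondIndex) {
                probs[i] = 0.8 * residual;
            } else {
                probs[i] = 0.1 * residual;
            }
        }
        return probs;
    }
}
```

With a best base A at probability 0.9 and secondary base C, distribute(0, 0.9, 1) yields approximately 0.9, 0.08, 0.01, 0.01, which sums to 1 up to floating-point rounding.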
kiran
3cda85f2e3
New implementation of binomial probability that accurately computes values down to around 1e-237.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@520 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:32:04 +00:00
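One common way to keep binomial probabilities accurate at very small magnitudes is to accumulate everything in log space and exponentiate once at the end. The sketch below illustrates that general technique; it is not the code from commit 3cda85f2e3, and the class name, method names, and argument order are hypothetical.

```java
// Hypothetical sketch: binomial probability computed in log space.
final class BinomialSketch {
    private BinomialSketch() {}

    /** log(n choose k), accumulated as a sum of logs so no factorial overflows. */
    static double logBinomialCoefficient(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("require 0 <= k <= n");
        }
        int m = Math.min(k, n - k); // C(n, k) == C(n, n - k); use the shorter product
        double sum = 0.0;
        for (int i = 1; i <= m; i++) {
            sum += Math.log(n - m + i) - Math.log(i);
        }
        return sum;
    }

    /** P(X = k) for X ~ Binomial(n, p). */
    static double binomialProbability(int k, int n, double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        // Handle the endpoints directly to avoid evaluating 0 * log(0).
        if (p == 0.0) {
            return k == 0 ? 1.0 : 0.0;
        }
        if (p == 1.0) {
            return k == n ? 1.0 : 0.0;
        }
        double logProb = logBinomialCoefficient(n, k)
                + k * Math.log(p)
                + (n - k) * Math.log1p(-p); // log(1 - p), stable for small p
        return Math.exp(logProb);
    }
}
```

Because the product of many small factors is never formed directly, the result only underflows when the final probability itself is below the smallest representable double.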
kiran
305584b69e
Test class for MathUtils with a test for binomialProbability().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@519 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:31:02 +00:00
aaron
bd4cacb832
Added code to make a read group and sample name for BAM files that don't annotate them on reads. The defaults for both are now the filename, but this may be shortened in the future.
...
The sample name for a read can be retrieved with the command:
read.getAttribute(SAMTag.RG.toString());
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@518 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 00:31:00 +00:00
hanna
45d962e491
I understood the contig index incorrectly when I initially wrote this code. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@517 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 22:31:43 +00:00
aaron
635bfd8604
Added a bit of a hack to get the header back to the walker by initialization time, which came before sharding in the last version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@516 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 21:07:11 +00:00
aaron
0208d201c7
Forgot this in the last commit...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@515 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:47:22 +00:00
aaron
3dc2afd7ab
Added the ability to get a merged header in a LociByReference traversal
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@514 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:34:52 +00:00
hanna
282f1d88b8
Make the operation 'read from the iterator and place on the queue' atomic with respect to hasNext(), next().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@513 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:16:26 +00:00
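A minimal way to get the atomicity described in the commit above is to guard the refill step and both accessors with a single lock. The class below is a hypothetical sketch of that idea, assuming a one-element read-ahead queue; it is not the code that was committed.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;

// Hypothetical sketch: a read-ahead iterator whose "pull from the source and
// enqueue" step cannot interleave with hasNext() or next() on other threads.
final class QueueingIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Queue<T> queue = new ArrayDeque<>(); // rejects null elements
    private final Object lock = new Object();

    QueueingIteratorSketch(Iterator<T> source) {
        this.source = source;
    }

    /** Moves one element from the source onto the queue. Caller must hold the lock. */
    private void fillIfEmpty() {
        if (queue.isEmpty() && source.hasNext()) {
            queue.add(source.next());
        }
    }

    @Override
    public boolean hasNext() {
        synchronized (lock) {
            fillIfEmpty();
            return !queue.isEmpty();
        }
    }

    @Override
    public T next() {
        synchronized (lock) {
            fillIfEmpty();
            T value = queue.poll();
            if (value == null) {
                throw new NoSuchElementException();
            }
            return value;
        }
    }
}
```

Each call is atomic on its own. Two threads that call hasNext() and then next() separately can still race for the last element, so concurrent callers should be prepared to catch NoSuchElementException.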
aaron
998763950c
Oops, contig index is a zero-based, not one-based, value
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@512 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 19:08:16 +00:00
aaron
8c13940c5a
A lot of changes to support by-read sharding and some from debugging of the by loci traversals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@511 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 19:03:14 +00:00
andrewk
32715a6c47
First check-in of walker that produces tables showing covariation of read cycle and dinucleotide with quality score, in a format usable for R analysis and for logistic regression.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@510 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 18:58:25 +00:00
aaron
0720d248ce
Adding the test case for by reads sharding of BAM data sources
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@509 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 18:01:22 +00:00
ebanks
cae54ec52d
Walker for creating intervals to be used in the indel cleaner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@508 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:58:19 +00:00
kiran
96db1477d4
I meant for default lod threshold to be 5.0, not 0.0.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@507 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:46:08 +00:00
kiran
ca66cccd2f
Privatized constructor to prevent instantiation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@506 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:45:39 +00:00
kiran
77e1e9e2f1
Added a static class to house useful math methods. All it has at the moment are methods for comparing doubles and floats, but I suggest that the bulk of our little math methods should be added here to avoid filling up Utils.java with so much random stuff.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@505 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:45:19 +00:00
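The kind of helper described in the two commits above (a static math utility class with a private constructor) might look like the sketch below. The class name, method name, and explicit epsilon parameter are assumptions made for illustration; the committed MathUtils may differ.

```java
// Hypothetical sketch of a static utility class for tolerant double comparison.
final class MathCompareSketch {
    // Private constructor: the class only holds static helpers and is never instantiated.
    private MathCompareSketch() {}

    /**
     * Compares two doubles with an absolute tolerance.
     * Returns 0 if |a - b| < epsilon, 1 if a is larger, -1 otherwise.
     * NaN inputs are not handled specially in this sketch.
     */
    static int compareDoubles(double a, double b, double epsilon) {
        if (Math.abs(a - b) < epsilon) {
            return 0;
        }
        return a > b ? 1 : -1;
    }
}
```

For example, compareDoubles(0.1 + 0.2, 0.3, 1e-9) returns 0 even though 0.1 + 0.2 == 0.3 is false under exact comparison.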
hanna
3d7575bbb8
Oops...omitted walker.initialize().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@504 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:35:28 +00:00
kiran
11e85f1969
Four-base mode now estimates the genotype using the one-base method and retests the site if the one-base method suggests the site is a het.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@503 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:23:24 +00:00
kiran
bd719f9c06
When checking that values are not infinite, also prints out the position so that I know which site was giving the error and I can just go there and debug it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@502 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:21:58 +00:00
kiran
efba30f1a1
Added a constructor in which the lod threshold can be set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@501 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:20:48 +00:00
jmaguire
8c1905c7d9
Simple walker to print all of the sample names present in a merged bam file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@500 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 12:26:56 +00:00
kiran
a3a1c9dae8
Suppressed emission of duplicate paths through a four-base pileup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@498 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 21:08:45 +00:00
jmaguire
6cef8bd76c
added k-best quality path enumeration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@497 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 20:26:51 +00:00
ebanks
d99d67d51c
Refactored to clean it up a bit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@495 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 19:18:46 +00:00
hanna
1bf4d040d8
Increase default shard size from 5 to 100000.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@494 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:29:44 +00:00
hanna
3af66a462e
Make PrintLocusContextWalker less verbose.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@493 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:28:02 +00:00
kiran
ffcd672c1c
Intermediate commit while working on getting four-base probs to work in the single sample genotyper. Has infrastructure for the new combinatorial approach and just choosing the best base more intelligently given a probability distribution over bases and the reference base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@492 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:06:50 +00:00
hanna
4cafb95be8
TraverseByLoci / TraverseByLociByReference suffered from the same sam-triggered off-by-one (?) bug as TraverseByReference; it was just less obvious here because these versions don't shard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@491 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 15:48:20 +00:00
kcibul
cb2f621d01
reverting accidental commit of change to shard size
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@490 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 00:33:28 +00:00
kcibul
b820130dce
* added ability to load multiple BAM files from command line
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@489 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 00:28:08 +00:00
kiran
5b8502745a
Added an epsilon (1e-4) to the tertiary and quaternary base hypotheses.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@488 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 00:01:37 +00:00
kiran
2ac240d78b
Removed an extraneous print statement.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@487 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 23:36:36 +00:00
kiran
0149c887ff
Fixed a bug wherein the residual probability was not being distributed properly when a file had secondary probs and the best and next-best base agreed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@486 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 23:36:09 +00:00
kiran
5abfc7d079
Added an argument ('extended' or 'ext') that outputs the four-base probs in a long format.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@485 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 22:27:26 +00:00
kiran
dac76f041b
Added some methods to retrieve the probability distributions of individual bases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@484 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 22:26:25 +00:00
kiran
5b2a7c9c23
Added some methods to complement a single simple base ([AaCcGgTt]) and reverse-complement a byte-array of bases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@483 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 22:25:33 +00:00
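The complement and reverse-complement helpers mentioned in the commit above could be sketched as follows. Names and signatures are hypothetical, and passing any character outside [AaCcGgTt] through unchanged is an assumption of this sketch, not documented behavior of the original methods.

```java
// Hypothetical sketch: complementing simple bases and reverse-complementing a byte array.
final class BaseComplementSketch {
    private BaseComplementSketch() {}

    /** Complements one of [AaCcGgTt], preserving case; other characters pass through. */
    static char simpleComplement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'a': return 't';
            case 'C': return 'G';
            case 'c': return 'g';
            case 'G': return 'C';
            case 'g': return 'c';
            case 'T': return 'A';
            case 't': return 'a';
            default:  return base; // assumption: leave non-ACGT symbols untouched
        }
    }

    /** Returns a new array holding the reverse complement; the input is not modified. */
    static byte[] reverseComplement(byte[] bases) {
        byte[] out = new byte[bases.length];
        for (int i = 0; i < bases.length; i++) {
            char original = (char) bases[bases.length - 1 - i];
            out[i] = (byte) simpleComplement(original);
        }
        return out;
    }
}
```

Applied to the ASCII bytes of "ACGTt", reverseComplement reads the input back to front and complements each base in turn, preserving case.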
asivache
521e202a10
updated interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@482 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:07:20 +00:00
asivache
55ca272919
reimplemented; now implements Genotype interface instead of AllelicVariant
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@481 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:06:42 +00:00
asivache
5f37ba8f26
now can be asked to log at INFO level all concordant or discordant sites, or both
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@480 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:03:44 +00:00
asivache
1f84b9647d
auxiliary data structure for mendelian concordance reporting; it's nice to have the latest version checked in in order for the code to compile...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@479 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:02:40 +00:00