kiran
758f8aa89b
Experimental refactoring.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@674 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:46:34 +00:00
andrewk
1518f8f9bf
Update training data creation in CovariateCounterWalker to output much smaller files by counting the number of occurrences of each data point combination rather than outputting a line for each data point (i.e. each base). Also fixed bug in LogisticRecalibrationWalker where a null SAMHeader was being pulled from a function that is now marked deprecated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@673 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:23:14 +00:00
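The counting idea in commit 1518f8f9bf (one row per distinct covariate combination plus a count, instead of one row per base) can be sketched roughly as below. This is only an illustration: the class name `CombinationCounter`, the chosen fields (cycle, dinucleotide, quality, mismatch flag) and the tab-separated key are assumptions, not the actual CovariateCounterWalker code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: aggregate identical data points into (combination, count) pairs.
final class CombinationCounter {
    // Sorted map so that any dump of the table is deterministic.
    private final Map<String, Long> counts = new TreeMap<>();

    // Build one key per distinct combination (field choice is an assumption).
    private static String key(int cycle, String dinuc, int qual, boolean mismatch) {
        return cycle + "\t" + dinuc + "\t" + qual + "\t" + (mismatch ? 1 : 0);
    }

    // Record one observed base; identical combinations share a single entry.
    public void add(int cycle, String dinuc, int qual, boolean mismatch) {
        counts.merge(key(cycle, dinuc, qual, mismatch), 1L, Long::sum);
    }

    // Number of times a given combination has been seen so far.
    public long count(int cycle, String dinuc, int qual, boolean mismatch) {
        return counts.getOrDefault(key(cycle, dinuc, qual, mismatch), 0L);
    }

    // Number of distinct combinations, i.e. the number of output rows.
    public int distinct() {
        return counts.size();
    }
}
```

The output size then grows with the number of distinct combinations rather than with the number of bases.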
ebanks
4c12df372c
Dumb, dumb bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@672 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:21:33 +00:00
ebanks
630066cc0a
1. Merge LocusWindows whose reads overlap.
2. Fix bug (we weren't clearing the "to emit" list)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@670 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 17:33:23 +00:00
jmaguire
c4d89997ca
put in a dummy sample_name so it'll compile
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@668 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:12:42 +00:00
jmaguire
c8d7223789
do pooled calling properly for 1kg
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@667 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:12:13 +00:00
jmaguire
313a6d0fb5
lots of changes to facilitate calling indels and 1kG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@666 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:11:42 +00:00
jmaguire
add7b6cf65
add sample_name to constructor, misc bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@665 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:10:17 +00:00
jmaguire
0267ccae7f
add code for computing indel genotype likelihoods
make reference lods negative
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@664 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:09:29 +00:00
hanna
ee9077fc69
LocusIterator iterated through LocusContexts, which was fine until now when we need something
that iterates through loci (GenomeLocs). Rename LocusIterator to LocusContextIterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@662 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:54:57 +00:00
hanna
0bca588629
Botched some boolean logic.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@658 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:53:52 +00:00
hanna
23e9e29964
Changed reads traversals from providing a LocusContext from which the reference sequence
could be extracted to a char[] containing the reference bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@657 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:45:11 +00:00
hanna
052819bed5
Switched dependencies of GenomeAnalysisTK to depend on GenomeAnalysisEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@656 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:33:00 +00:00
ebanks
009e71fcd9
We need to sort cleaned reads ourselves (instead of letting SAMFileWriter
do it) because the SAM headers are often screwed up and claim to be
"unsorted". While here, I broke off the module from the SortSamIterator
in case someone else wants to use it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@654 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 15:43:42 +00:00
ebanks
3aabc144c6
Added functionality to allow for a contract between LocusWindowTraversalEngine and LocusWindowWalker which allows the Walker to act upon reads outside of the provided intervals.
(Really, all we want to do is spit out all reads, but this allows the Walker to do other things with the reads if it wants)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@641 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 17:28:16 +00:00
hanna
226edbdef6
Hyphen-style XML output. Much sexier.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@635 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 01:04:40 +00:00
aaron
21536df308
Change the sample XML marshalling code over to simple XML, and take out the castor lines in the ivy.xml
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@633 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 00:08:25 +00:00
depristo
5a6892900e
fixing oddities in duplicates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@628 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:55:45 +00:00
depristo
4a26f35caa
new default syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@627 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:16:53 +00:00
ebanks
283a4d1b54
Fix some special-case cleaner issues.
We now do the same as brute force in all examples to date.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@626 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:16:35 +00:00
depristo
2204be43eb
System for traversing duplicate reads, along with a walker to compute quality scores among duplicates and a smarter method to combine quality scores across duplicates -- v1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@624 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:06:02 +00:00
hanna
752928df94
Switch to better mechanism for supplying a default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@615 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 01:22:01 +00:00
asivache
072808858e
added COUNT_CUTOFF arg: it is now possible to tell the code to try to realign all read piles over trains of nearby indels with at least one indel observed in COUNT_CUTOFF or more different alignments (set the arg to 1 to realign around all indels); also, some diagnostic printouts added to the output (time spent on loading the reference, time spent on scrolling through the input bam file, counts of discarded reads)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@611 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:59:33 +00:00
ebanks
5be75e0ae6
First version of indel cleaner walker that works on intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@607 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 20:20:48 +00:00
hanna
521aa40baa
Bring new command-line argument parsing system live.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@603 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:16:11 +00:00
hanna
b0cdba8bb3
Acting on Kiran's suggestion to make the doc tag in the @Argument annotation required.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@598 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:43:40 +00:00
depristo
8925df2e1e
More information from the duplicate combiner quality metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@590 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 21:51:01 +00:00
kcibul
2b6466ea00
coverage calculator based on Gabor's Pilot 3 Coverage Metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@589 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 14:18:16 +00:00
kiran
df88c4d6b0
Added some code to determine the on-genotype and off-genotype secondary base distributions (which, at the moment, is commented out).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@582 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:48:19 +00:00
kiran
e7534b292f
Optionally applies secondary base distribution priors to normal single-sample genotyper posteriors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@581 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:36:32 +00:00
kiran
58c80d8d87
For on and off-genotype primary bases, optionally compute the concordance of the secondary bases to their expected distributions. Each genotype has slightly different profiles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@580 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:33:48 +00:00
depristo
84dae06d5a
Initial version of ByDuplicates traversal, as well as a duplicate quality score estimator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@576 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:16:21 +00:00
andrewk
b630f2f2f1
More tables output by CovariateCounterWalker AND made CovariateCounterWalker and LogisticRecalibration aware of positive and negative strandedness of data which changes the regression output significantly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@568 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 01:22:50 +00:00
kiran
0a707a887b
Added ability to evaluate best + random base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@564 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 20:05:36 +00:00
kcibul
334f158e5a
added parameters for mapping quality and duplicate filters
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@563 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 18:05:34 +00:00
ebanks
7de5da7065
Start getting the cleaner working in Walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@561 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 14:59:53 +00:00
kcibul
f557da0a78
Calculate interval-based statistics for Hybrid Selection
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@558 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 04:01:24 +00:00
andrewk
58b2578c44
Several changes to CovariateCounter walker to print more tables (called vs. observed Q scores), bug fixes to LogisticRecalibrationWalker and LogisticRegressor, and print string functionality added to Pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@550 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 00:37:48 +00:00
ebanks
a0a581171b
print out the last interval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@549 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 20:43:06 +00:00
aaron
a343f3eab7
Fixed bug where we weren't setting the reads group correctly. Also added code to set the printMetrics field of the singleSampleGenotyper from the Pool caller; without that set, it was throwing a null pointer exception for me.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@548 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:17:20 +00:00
kiran
1daf8e0987
A utility to compare the results of the SingleSampleGenotyper in 1-base and 4-base mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@547 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:10:08 +00:00
kiran
444bc18183
Removed binomialProb() method. Set better values for qHom, qHet, and qHomNonRef and allowed those to be set from the command-line.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@546 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:09:02 +00:00
ebanks
0c76a70313
Renamed traversal by "interval" to "locusWindow"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@537 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 02:26:08 +00:00
depristo
40a2b3eeb3
Basic logistic regression support for calibrating qualities; mostly for Andrew to experiment with
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@529 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:09:50 +00:00
andrewk
061f4328b1
Covariate counter now outputs files used by R to do logistic regression.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@527 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 17:11:57 +00:00
jmaguire
4e4fd33584
First draft of actual pooled EM caller.
Produces sane looking output on region of 1kG pilot1:
CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084
Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@526 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:43:41 +00:00
jmaguire
dd408a2a9a
First draft of actual pooled EM caller.
Produces sane looking output on region of 1kG pilot1:
CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084
Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@525 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:42:15 +00:00
ebanks
13d4692d2e
1. Added a by-interval traversal.
2. Added a shell for the indel cleaner walker (it's currently being used to test the interval traversal).
3. Fixed small bug in downsampling (make sure to downsample the offsets too)
4. GenomeAnalysisTK.execute => anyone object to my change to "instanceof" instead of trying to catch a ClassCastException (yuck)?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@524 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 04:33:35 +00:00
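Item 4 of commit 13d4692d2e proposes an `instanceof` check in place of catching a ClassCastException. A minimal sketch of that pattern follows; the class and interface names here (`DispatchSketch`, `Walker`, `LocusWalker`, `ReadWalker`) are invented for illustration and do not reproduce GenomeAnalysisTK.execute.

```java
// Hypothetical sketch: choose a code path by type test rather than by a failed cast.
final class DispatchSketch {
    public interface Walker {}
    public static final class LocusWalker implements Walker {}
    public static final class ReadWalker implements Walker {}

    // Returns a label for the traversal that would be used for this walker.
    public static String dispatch(Walker walker) {
        if (walker instanceof LocusWalker) {
            return "locus";
        } else if (walker instanceof ReadWalker) {
            return "read";
        }
        // An unknown type is reported explicitly instead of surfacing as a ClassCastException.
        throw new IllegalArgumentException("Unsupported walker type: " + walker);
    }
}
```

The type test keeps the normal control flow free of exceptions and makes the unsupported case explicit.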
kiran
1984bb2d13
Made num_loci_total public because I'm lazy. I'll change it back later.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@523 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:57:23 +00:00
kiran
7ce11e152b
Simplified. Added option to perform four-base retest of a putative variant.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@522 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:56:15 +00:00
aaron
3dc2afd7ab
Added the ability to get a merged header in a LociByReference traversal
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@514 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:34:52 +00:00
andrewk
32715a6c47
First check-in of walker that produces tables showing covariation of read cycle and dinucleotide with quality score, in a format usable for R analysis and for doing logistic regression.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@510 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 18:58:25 +00:00
ebanks
cae54ec52d
Walker for creating intervals to be used in the indel cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@508 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:58:19 +00:00
kiran
96db1477d4
I meant for default lod threshold to be 5.0, not 0.0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@507 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:46:08 +00:00
kiran
11e85f1969
Four-base mode now estimates the genotype using the one-base method and retests the site if the one-base method suggests the site is a het.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@503 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:23:24 +00:00
kiran
bd719f9c06
When checking that values are not infinite, also prints out the position so that I know which site was giving the error and I can just go there and debug it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@502 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:21:58 +00:00
kiran
efba30f1a1
Added a constructor in which the lod threshold can be set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@501 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:20:48 +00:00
jmaguire
8c1905c7d9
Simple walker to print all of the sample names present in a merged bam file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@500 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 12:26:56 +00:00
kiran
a3a1c9dae8
Suppressed emission of duplicate paths through a four-base pileup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@498 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 21:08:45 +00:00
jmaguire
6cef8bd76c
added k-best quality path enumeration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@497 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 20:26:51 +00:00
ebanks
d99d67d51c
Refactored to clean it up a bit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@495 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 19:18:46 +00:00
kiran
ffcd672c1c
Intermediate commit while working on getting four-base probs to work in the single sample genotyper. Has infrastructure for the new combinatorial approach and just choosing the best base more intelligently given a probability distribution over bases and the reference base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@492 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:06:50 +00:00
asivache
5f37ba8f26
now can be asked to log at INFO level all concordant or discordant sites, or both
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@480 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:03:44 +00:00
asivache
1f84b9647d
auxiliary data structure for mendelian concordance reporting; it's nice to have the latest version checked in in order for the code to compile...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@479 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:02:40 +00:00
asivache
ece3e9969e
one trivial walker to filter reads; bam in -> filter -> bam out
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@478 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 20:39:29 +00:00
asivache
61e855200d
latest version...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@477 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 20:38:37 +00:00
kcibul
64b2fd866f
* extracted core quality-score based genotype likelihood code
* precompute expensive operations (log/pow) based on Picard experience
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@476 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 18:58:43 +00:00
jmaguire
11c520b283
completed my old draft of the old school single sample genotype walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@475 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 05:38:04 +00:00
depristo
b8233d92c8
Simple IO walker to test / crush file systems and evaluate I/O performance in general
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@474 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-20 14:07:14 +00:00
jmaguire
bf76eab955
whoops; fix a comment line.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@473 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-19 17:54:54 +00:00
jmaguire
bcba1ff424
Fix a minor rounding bug and putz around with fractional counts in the pooled caller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@472 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-19 17:52:24 +00:00
jmaguire
af6788fa3d
Misc:
1. Added logGamma function to utils
2. Required asserts to be enabled in the allele caller (run with java -ea)
3. put checks and asserts of NaN and Infinity in AlleleFrequencyEstimate
4. Added option FRACTIONAL_COUNTS to the pooled caller (not working right yet)
AlleleFrequencyWalker:
5. Made FORCE_1BASE_PROBS not static in AlleleFrequencyWalker (an argument should never be static! Jeez.)
6. changed quality_precision to be 1e-4 (Q40)
7. don't adjust by quality_precision unless the qual is actually zero.
8. added more asserts for NaN and Infinity
9. put in a correction for zero probs in P_D_q
10. changed pG to be hardy-weinberg in the presence of an allele frequency prior (duh)
11. rewrote binomialProb() to not overflow on deep coverage
12. rewrote nchoosek() to behave right on deep coverage
13. put in some binomialProb() tests in the main() routine (they come out right when compared with R)
Hunt for loci where 4bp should change things:
14. added FindNonrandomSecondBestBasePiles walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@471 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-19 15:35:07 +00:00
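Items 11 and 12 of commit af6788fa3d mention rewriting binomialProb() and nchoosek() so they do not overflow on deep coverage. One common way to get that behaviour is to work in log space, sketched below. This is an assumption about the approach, not the committed code: the commit adds a logGamma utility, whereas this sketch sums logarithms directly to stay self-contained, and the class name `LogBinomial` and the method signatures are hypothetical.

```java
// Hypothetical sketch: binomial probability computed in log space to avoid overflow.
final class LogBinomial {
    // Natural log of C(n, k), accumulated as a sum of logs so no huge factorial is formed.
    public static double logChoose(int n, int k) {
        if (k < 0 || k > n) {
            return Double.NEGATIVE_INFINITY; // C(n, k) is zero outside 0..n
        }
        k = Math.min(k, n - k); // symmetry keeps the loop short
        double sum = 0.0;
        for (int i = 1; i <= k; i++) {
            sum += Math.log(n - k + i) - Math.log(i);
        }
        return sum;
    }

    // P(X = k) for X ~ Binomial(n, p).
    public static double binomialProb(int k, int n, double p) {
        if (k < 0 || k > n) {
            return 0.0;
        }
        // Handle degenerate p explicitly so that 0 * log(0) never produces NaN.
        if (p <= 0.0) {
            return k == 0 ? 1.0 : 0.0;
        }
        if (p >= 1.0) {
            return k == n ? 1.0 : 0.0;
        }
        double logProb = logChoose(n, k) + k * Math.log(p) + (n - k) * Math.log1p(-p);
        return Math.exp(logProb);
    }
}
```

For example, `LogBinomial.binomialProb(5, 10, 0.5)` evaluates C(10, 5) / 2^10 = 252 / 1024, and a call such as `binomialProb(5000, 10000, 0.5)` stays finite because only logarithms are accumulated.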
ebanks
758db73b98
Fixed SLOWNESS issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@469 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 20:10:34 +00:00
asivache
2a937fa8d3
set SAM file header's sorting order to unsorted, hopefully it will help to speed things up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@468 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 19:32:24 +00:00
asivache
03ec3452f2
a first, simplest version of a walker that filters out reads based on user-specified criteria and writes remaining reads into a new bam file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@467 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 18:51:39 +00:00
asivache
55537c0d1e
change class name, now it compiles...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@451 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 16:51:00 +00:00
asivache
4f9bc7206f
some cleanup, also ensuring that all reads get written into output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@450 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 16:49:25 +00:00
asivache
e8a6cdb386
renamed standalone main
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@449 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:56:46 +00:00
asivache
832afd3d60
renamed standalone main
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@448 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:56:27 +00:00
asivache
85308f4ddc
resurrected indel tool's standalone main
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@447 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:55:52 +00:00
kcibul
6f56938d42
* added a bit more debugging output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@446 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:20:26 +00:00
asivache
240eb18564
fix a few related issues when not all the reads were written into the output files. now cleaned output still contains all reads either with modified alignments or untouched
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@444 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 03:56:47 +00:00
kcibul
7e05b43f40
* added some error checking for read groups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@442 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 03:22:49 +00:00
kcibul
3fda8613c3
* minor formatting changes
* support for "extended" output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@428 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 15:11:05 +00:00
kiran
7949e377e4
Intermediate commit. Refactored some simple base manipulation stuff into BaseUtils.java. Generalized some likelihood computation logic to make future possible EM-ing easier.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@424 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 04:18:07 +00:00
kiran
d0b8d311e6
Can now optionally print the read and the alignment region of the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@423 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 04:10:30 +00:00
kcibul
d4aaa1bef4
* fixed (with Matt's help) the argument parsing
* outputting UCSC wiggle format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@422 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 02:17:39 +00:00
depristo
24722a442e
Slight code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@421 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:21:36 +00:00
asivache
baae98c6d5
and don't allocate new 200M string every time please, just pass byte array!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@417 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 21:55:33 +00:00
asivache
9d56355abe
bug fixed when reference name was passed as a string instead of actual reference bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@416 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 21:46:27 +00:00
kiran
222c4e5865
Commented out some debugging lines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@415 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:15:41 +00:00
kiran
49d76014d1
Commented out a debugging line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@414 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:15:11 +00:00
kiran
b39e584787
Primary or secondary bases that got a quality score of literally zero led to unfortunate infinities. Added an epsilon (1e-5) to every prob.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@413 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:04:49 +00:00
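The infinities mentioned in commit b39e584787 arise because a Phred quality of 0 corresponds to an error probability of 1, so the probability of a correct call is 0 and its logarithm is negative infinity. A small sketch of the epsilon guard follows; exactly where the epsilon is applied in the real code is not stated in the log, so the placement here, the class name `QualityProb` and its methods are assumptions.

```java
// Hypothetical sketch: keep log-probabilities finite when a base has quality zero.
final class QualityProb {
    // Epsilon value taken from the commit message.
    public static final double EPSILON = 1e-5;

    // Standard Phred relation: Q = -10 * log10(P(error)).
    public static double errorProb(int qual) {
        return Math.pow(10.0, -qual / 10.0);
    }

    // log10 of the probability that the call is correct, with the epsilon added.
    public static double logProbCorrect(int qual) {
        return Math.log10(1.0 - errorProb(qual) + EPSILON);
    }
}
```

With the guard, `logProbCorrect(0)` is about -5 rather than negative infinity, at the cost of a small bias in every probability.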
jmaguire
d28e9f9b98
search over q's for finding argmax[q] p(D|q)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@412 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:15:45 +00:00
ebanks
647827b18c
Transitioned indel code to use GATK and Walkers
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@410 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:14:15 +00:00
jmaguire
961dbbd4ef
Now output bases and qhat and qstar into the GFF.
Quals coming soon (four-base)
QHAT : Most likely alt allele freq (unconstrained by number of chromosomes).
QSTAR : Most likely alt allele freq (constrained by number of chromosomes).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@402 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 15:23:00 +00:00
kiran
dafdff1974
All bases are now indexed as A:0, C:1, G:2, T:3.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@401 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:49:43 +00:00
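The indexing convention from commit dafdff1974 (A:0, C:1, G:2, T:3) could be captured in a small helper like the one below. The class name `BaseIndex`, the method names, and the choice to return -1 for other characters are illustrative assumptions, not the project's BaseUtils API.

```java
// Hypothetical sketch of the A:0, C:1, G:2, T:3 convention.
final class BaseIndex {
    private static final String BASES = "ACGT";

    // Map a base character to its index; -1 for anything else (e.g. 'N').
    public static int indexOf(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    // Inverse mapping; throws StringIndexOutOfBoundsException for indices outside 0..3.
    public static char baseOf(int index) {
        return BASES.charAt(index);
    }
}
```

Keeping both directions in one place means arrays of per-base values can be indexed the same way everywhere.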
asivache
bc43c0eefc
there are really cases when we cannot merge until we get just two piles; now we do not crash in those cases but print a warning and just show the resulting n piles even when n>2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@390 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:45:47 +00:00
kiran
f838a5e511
Changed some double comparisons of the form a == b to abs(a - b) <= precision. Now we shouldn't be passing or failing some if conditions due to floating-point precision.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@388 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 20:05:46 +00:00
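The change described in commit f838a5e511, replacing `a == b` with `abs(a - b) <= precision` for doubles, amounts to something like the following sketch. The class and method names are hypothetical, and the tolerance is left to the caller because the log does not say which value was used at this point.

```java
// Hypothetical sketch: tolerance-based equality for doubles.
final class DoubleCompare {
    // True when a and b differ by no more than the given absolute precision.
    public static boolean approxEquals(double a, double b, double precision) {
        return Math.abs(a - b) <= precision;
    }
}
```

For instance, `0.1 + 0.2 == 0.3` is false in Java, while `DoubleCompare.approxEquals(0.1 + 0.2, 0.3, 1e-9)` is true.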
asivache
d44c30154a
added MAX_READ_LENGTH - now we can ignore long reads (454?); a bad idea in general, but the performance hit is too hard to take, at least for preliminary testing runs...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@384 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 16:53:12 +00:00