hanna
ba9a0b5da8
Break out some of the weird inner classes out of the HierachicalMicroScheduler.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@566 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 21:07:07 +00:00
hanna
4c5f640eb7
Tweak the arguments passed to the command-line arguments parser so that it fails less often for invalid arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@560 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 14:36:27 +00:00
andrewk
58b2578c44
Several changes to CovariateCounter walker to print more tables (called vs. observed Q scores), bug fixes to LogisticRecalibrationWalker and LogisticRegressor, and print string functionality added to Pair.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@550 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 00:37:48 +00:00
kiran
b9c9dbb1d7
Added multinomialProbability method.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@545 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:03:50 +00:00
hanna
e50ae97fe1
Introduce new index-based fasta reader. Clean up MicroManager code, pushing necessary code back into TraversalEngine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@531 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:40:21 +00:00
depristo
40a2b3eeb3
Basic logistic regression support for calibrating qualities; mostly for Andrew to experiment with
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@529 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:09:50 +00:00
jmaguire
dd408a2a9a
First draft of actual pooled EM caller.
...
Produces sane looking output on region of 1kG pilot1:
CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084
Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@525 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:42:15 +00:00
kiran
135d3eabeb
Now only distributes 80% of the residual probability to the secondary base, 10% each to the other two bases. Nicer labelling for stringified probability distribution output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@521 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:34:43 +00:00
kiran
3cda85f2e3
New implementation of binomial probability that accurately computes values down to around 1e-237.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@520 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:32:04 +00:00
hanna
45d962e491
I understood the contig index incorrectly when I initially wrote this code. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@517 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 22:31:43 +00:00
aaron
998763950c
Oops, contig index is a zero not one based value
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@512 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 19:08:16 +00:00
aaron
8c13940c5a
A lot of changes to support by-read sharding and some from debugging of the by loci traversals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@511 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 19:03:14 +00:00
kiran
ca66cccd2f
Privatized constructor to prevent instantiation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@506 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:45:39 +00:00
kiran
77e1e9e2f1
Added a static class to house useful math methods. All this has at the moment are methods for comparing doubles and floats, but I suggest that the bulk of our little math methods should be added here to avoid filling up Utils.java with so much random stuff.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@505 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:45:19 +00:00
jmaguire
6cef8bd76c
added k-best quality path enumeration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@497 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 20:26:51 +00:00
kiran
5b8502745a
Added an epsilon (1e-4) to the tertiary and quaternary base hypotheses.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@488 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 00:01:37 +00:00
kiran
2ac240d78b
Removed an extraneous print statement.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@487 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 23:36:36 +00:00
kiran
0149c887ff
Fixed a bug wherein the residual probability was not being distributed properly when a file had secondary probs and the best and next-best base agreed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@486 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 23:36:09 +00:00
kiran
dac76f041b
Added some methods to retreive the probability distributions of individual bases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@484 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 22:26:25 +00:00
kiran
5b2a7c9c23
Added some methods to complement a single simple base ([AaCcGgTt]) and reverse-complement a byte-array of bases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@483 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 22:25:33 +00:00
jmaguire
af6788fa3d
Misc:
...
1. Added logGamma function to utils
2. Required asserts to be enabled in the allele caller (run with java -ea)
3. put checks and asserts of NaN and Infinity in AlleleFrequencyEstimate
4. Added option FRACTIONAL_COUNTS to the pooled caller (not working right yet)
AlleleFrequencyWalker:
5. Made FORCE_1BASE_PROBS not static in AlleleFrequencyWalker (an argument should never be static! Jeez.)
6. changed quality_precision to be 1e-4 (Q40)
7. don't adjust by quality_precision unless the qual is actually zero.
8. added more asserts for NaN and Infinity
9. put in a correction for zero probs in P_D_q
10. changed pG to be hardy-weinberg in the presence of an allele frequency prior (duh)
11. rewrote binomialProb() to not overflow on deep coverage
12. rewrote nchoosek() to behave right on deep coverage
13. put in some binomailProb() tests in the main() routine (they come out right when compared with R)
Hunt for loci where 4bp should change things:
14. added FindNonrandomSecondBestBasePiles walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@471 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-19 15:35:07 +00:00
asivache
df5aae5ed4
got read of a couple of warnings and added percentage(x,base) methods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@462 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 15:15:21 +00:00
hanna
01be8f09e3
Exception cleanup. All our non-runtime exceptions should extend from StingException, StingException needs to be lower in the tree to build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@457 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 22:17:25 +00:00
depristo
f47f640df6
Better debugging output and testing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@455 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 21:54:56 +00:00
hanna
56f6847456
Changed interface from contig,pos,length to more common contig,start,stop interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@441 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 00:04:41 +00:00
depristo
76d833e39b
Meaningful assert messages
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@435 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 19:12:28 +00:00
aaron
180ff13290
Added a bunch of changes to support the new MicroManager code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@431 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 18:29:38 +00:00
hanna
339261c4a9
Load the dictionary and sanity check it against the index.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@430 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 18:04:13 +00:00
hanna
26e84d7fd6
Added index iteration for ReferenceSequenceFile interface compatibility.
...
Added better error checking for querying past the end of a contig.
Lots more testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@429 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 17:17:11 +00:00
hanna
182626576f
Basic indexed fasta POC in place. Requires a more complete implementation of the ReferenceSequenceFile interface,
...
and much more testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@425 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 13:46:56 +00:00
depristo
24722a442e
Slight code cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@421 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:21:36 +00:00
depristo
72a3d84ed2
General purpose pileup code -- you can use these features to obtain detailed pileup data from reads and offsets. Useful for all pileup based walkers. Expanded support for rodSAMPileup to enable the new ValidatingPileupWalker, which takes a samtools pileup output and checks that GATK gives identical output as samtools on a per base and per qual pileup. It's going to be a very useful validation tool.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@418 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:13:10 +00:00
ebanks
b363eedd2c
Deal with screwy reads by changing logic to determine whether we are
...
past the last interval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@409 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:13:16 +00:00
hanna
0629f79049
Moved fasta support files into their own package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@408 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 18:13:23 +00:00
hanna
186c799ffc
Class to read an .fai file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@405 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:37:18 +00:00
kiran
40ea22eb17
Added some methods to return the cross-talk partner base of a given base or base index.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@400 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:49:12 +00:00
kiran
998fad76c6
Some utility methods for creating pileups of secondary bases and secondary quals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@397 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 13:57:54 +00:00
depristo
11377ef390
Added lots of assertions to check for problems. The current GenomeLoc needs to be cleaned up and refactored but at least it runs. We need unit tests ASAP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@392 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:53:08 +00:00
depristo
bb666ce392
Added mappingQualPileup function for use in the verbose mode of Pileup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@391 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:51:26 +00:00
jmaguire
f39092526d
Added function RandomSubset
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@379 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:14:53 +00:00
kiran
1fb16d54e0
For SAM files that have no alignments and when no reference is specified, contigInfo.getSequence() is null, causing an error when getSequenceName() is called on the resulting null pointer. Check for null instead and return that instead of barfing here.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@374 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:48:21 +00:00
kiran
5e96ab6161
Helpful functions for converting a base (char) to a base index (A:0, C:1, G:2, T:3, alphabetical and consistent with Illumina conventions to minimize confusion.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@373 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:46:23 +00:00
kcibul
ce72932a45
* refactored GenomeLoc to use contigIndex internally for performance and fixed several calling classes
...
* added basic unit test for GenomeLoc
* fixed bug when parsing genome locations like chr1:5000 the start position was being left as maxint rather than being set to the same as the stop position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@365 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:25:17 +00:00
depristo
17b3d5b554
New ROD accessing system, including a generalized interface for binding ROD on the command line that doesn't require you to chance GenomeAnalysisTK.java
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@355 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 22:04:59 +00:00
hanna
8a1207e4db
Bringing up scaffolding for integration of locus traversals by reference with Aaron's data source code.
...
Reverts to original TraverseByLociByReference behavior unless a special combination of command-line flags are used.
Lightly tested at best, and major flaws include:
- MicroManager is not doing MicroScheduling right now; it's driving the traversals.
- New database-ish data providers imply by their interface that they're stateless, but they're highly stateful.
- Using static objects to circumvent encapsulation.
- Code duplication is rampant.
- Plus more!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@346 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:28:17 +00:00
kiran
59b2e6a90f
Added some stuff for retreiving the base index and probability of a compressed base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@329 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 15:52:58 +00:00
depristo
b49f713336
Enabled multiple argument for GATK driver; first step towards generalized -rods <name> <type> <file> argument structure
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@325 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 01:52:13 +00:00
depristo
00722e19bc
The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@319 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:19:54 +00:00
asivache
9c4fc633aa
Make it symmetric: if there is no sequence dictionary, also send a message to the logger, just like we do when we find the dict
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@318 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 21:44:39 +00:00
ebanks
d1c5e986d5
Another check to deal with bad reads (BWA output throws bad exceptions)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@298 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 04:58:22 +00:00