aaron
887adcfc7f
Some minor fixes to the last check-in
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@387 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:24:51 +00:00
aaron
f2d0d73309
removed old shard strategy code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@386 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:13:45 +00:00
aaron
dd604799dc
Added some new code for shard support over reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@385 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:11:43 +00:00
asivache
d44c30154a
added MAX_READ_LENGTH - now we can ignore long reads (454?); a bad idea in general, but the performance hit is too hard to take, at least for preliminary testing runs...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@384 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 16:53:12 +00:00
hanna
e91a429c58
A class to print out as much context about the given locus site as is possible. Useful for testing traversal engines -- run old and new code across a given region and diff the output to make sure they have the same context.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@383 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:29:55 +00:00
jmaguire
6652f13a17
more verbose gff output!
...
EVEN MORE verbosity to come!
Tremble in anticipation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@382 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:21:23 +00:00
hanna
cf929a8275
Get rid of test case's dependence on transient methods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@381 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:16:42 +00:00
jmaguire
6e180ed44e
Unified caller is go.
...
AlleleFrequencyWalker and related classes work equally well for 2 or 200 chromosomes.
Single Sample Calling:
Allele Frequency Metrics (LOD >= 5)
-------------------------------------------------
Total loci : 171575
Total called with confidence : 168615 (98.27%)
Number of variants : 111 (0.07%) (1/1519)
Fraction of variant sites in dbSNP : 87.39%
-------------------------------------------------
Hapmap metrics are coming up all zero. Will fix.
Pooled Calling:
AAF r-squared after EM is 0.99.
AAF r-squared after EM for alleles < 20% (in pools of ~100-200 chromosomes) is 0.95 (0.75 before EM)
Still not using fractional genotype counts in EM. That should improve r-squared for low frequency alleles.
Chores still outstanding:
- make a real pooled caller walker (as opposed to my experiment framework).
- add fractional genotype counts to EM cycle.
- add pool metrics to the metrics class? *shrug* we don't really have truth outside of a contrived experiment...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@380 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:29:51 +00:00
jmaguire
f39092526d
Added function RandomSubset
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@379 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:14:53 +00:00
asivache
b4136b6d6e
a few tweaks to make it more robust: ignore reads with cigars containing anything but I,D,M; don't set up contig ordering manually, rely upon reference sequence and its dictionary; don't die if a record does not have NM tag, but fall back to direct counting instead; now requires reference as a cmdline arg
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@378 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 04:49:19 +00:00
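A minimal sketch of the "only I, D, M" cigar check described in the commit above. The class and method names are hypothetical, and the check works on the textual CIGAR string rather than on any particular SAM library's cigar object, so it is an illustration of the idea and not the committed code.

```java
import java.util.regex.Pattern;

// Hypothetical helper: accepts a CIGAR string only if every operation is M, I, or D.
final class CigarFilterSketch {
    // One or more <length><op> pairs, with op restricted to M, I, or D.
    private static final Pattern ONLY_MID = Pattern.compile("(\\d+[MID])+");

    static boolean hasOnlyMatchInsertDelete(String cigar) {
        // A null, empty, or "*" (unavailable) CIGAR fails the pattern and is rejected.
        return cigar != null && ONLY_MID.matcher(cigar).matches();
    }

    public static void main(String[] args) {
        for (String cigar : new String[] {"36M", "10M2I24M", "5S31M", "*"}) {
            System.out.println(cigar + " -> " + hasOnlyMatchInsertDelete(cigar));
        }
    }
}
```

Reads whose cigar contains soft clips (S), hard clips (H), skips (N), or padding (P) would be dropped by such a filter.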
kiran
756e6c61d8
Strictness args are presented as lowercase in the help, but only accepted if uppercase. Changed help to list the valid arguments in uppercase.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@376 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:50:19 +00:00
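The lowercase/uppercase mismatch in the commit above is the behavior one would see if the argument were parsed with Java's case-sensitive `Enum.valueOf`; that is an assumption, since the log does not say how the value is parsed. The enum and its constants below are hypothetical stand-ins. The sketch contrasts exact parsing with a case-insensitive alternative (the commit itself fixed the help text instead).

```java
import java.util.Locale;

// Hypothetical sketch: why a lowercase argument can be rejected by enum parsing.
final class StrictnessArgSketch {
    // Stand-in values; the real strictness levels are not listed in the log.
    enum Strictness { STRICT, LENIENT, SILENT }

    // Enum.valueOf is case-sensitive: "strict" throws IllegalArgumentException.
    static Strictness parseExact(String arg) {
        return Strictness.valueOf(arg);
    }

    // Alternative: normalize the argument before the lookup.
    static Strictness parseIgnoringCase(String arg) {
        return Strictness.valueOf(arg.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(parseIgnoringCase("lenient"));
        try {
            parseExact("lenient");
        } catch (IllegalArgumentException e) {
            System.out.println("exact parse rejected lowercase input");
        }
    }
}
```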
kiran
c51f51f255
Make sure we always write at least 1000 points per base in each cycle's scatterplot. Print the disagreement rate between Bustard and FourBaseRecaller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@375 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:49:41 +00:00
kiran
1fb16d54e0
For SAM files that have no alignments and when no reference is specified, contigInfo.getSequence() is null, causing an error when getSequenceName() is called on the resulting null pointer. Check for null instead and return that instead of barfing here.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@374 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:48:21 +00:00
kiran
5e96ab6161
Helpful functions for converting a base (char) to a base index (A:0, C:1, G:2, T:3), alphabetical and consistent with Illumina conventions to minimize confusion.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@373 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:46:23 +00:00
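A small sketch of the base-to-index mapping named in the commit above (A:0, C:1, G:2, T:3). The class and method names are hypothetical. Accepting lowercase input and returning -1 for anything other than A, C, G, or T are assumptions made for the example and are not stated in the log.

```java
// Hypothetical sketch of an alphabetical base <-> index mapping.
final class BaseIndexSketch {
    private static final String BASES = "ACGT";

    // Returns 0..3 for A, C, G, T (case-insensitive); -1 for any other character.
    static int baseToIndex(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1; // assumption: N and other symbols have no index
        }
    }

    // Inverse mapping; rejects indices outside 0..3.
    static char indexToBase(int index) {
        if (index < 0 || index >= BASES.length()) {
            throw new IllegalArgumentException("base index out of range: " + index);
        }
        return BASES.charAt(index);
    }

    public static void main(String[] args) {
        for (char base : new char[] {'A', 'C', 'G', 'T', 'N'}) {
            System.out.println(base + " -> " + baseToIndex(base));
        }
    }
}
```

Keeping a single mapping like this in one utility class avoids the situation where two pieces of code enumerate the bases in different orders.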
kiran
35fc002d5d
Debugging information is now written in such a way to make it easier to import into R.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@372 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:45:33 +00:00
kiran
6ee4fe5a20
Fixed a Bustard/Firecrest file synchronization bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@371 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:44:07 +00:00
kiran
817278be46
If a SAMRecord is on the negative strand, reverse complement the SQ tag.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@370 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:42:24 +00:00
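For illustration, reverse complementing for negative-strand records can be sketched as below. The log does not describe how the SQ tag is encoded, so this example operates on a plain string of bases; the class and method names are hypothetical, and leaving non-ACGT symbols such as N unchanged is an assumption.

```java
// Hypothetical sketch: reverse complement a base string when a record is on the negative strand.
final class ReverseComplementSketch {
    // Complements A<->T and C<->G, preserving case; other symbols (e.g. N) pass through.
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return base;
        }
    }

    // Walks the input from the last base to the first, complementing each base.
    static String reverseComplement(String bases) {
        StringBuilder out = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            out.append(complement(bases.charAt(i)));
        }
        return out.toString();
    }

    // Only negative-strand records are flipped; forward-strand bases are returned as-is.
    static String orientForStrand(String bases, boolean negativeStrand) {
        return negativeStrand ? reverseComplement(bases) : bases;
    }

    public static void main(String[] args) {
        System.out.println(orientForStrand("AACGT", true));
        System.out.println(orientForStrand("AACGT", false));
    }
}
```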
kiran
1d5a22cacf
Extracts a Fastq file and the SQ tags to a separate file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@369 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:41:44 +00:00
kiran
e410c005c0
A debugging tool to ensure the SQ tag in a four-prob SAM file matches the SAMRecord strand orientation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@368 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:40:42 +00:00
hanna
9c37400c4f
Added basic performance testing so I can make sure concurrent access doesn't slow down overall fasta access.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@367 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 18:05:56 +00:00
kcibul
c7777d46d6
* re-enabled setting of sequence dictionary information on GenomeLoc
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@366 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:44:14 +00:00
kcibul
ce72932a45
* refactored GenomeLoc to use contigIndex internally for performance and fixed several calling classes
...
* added basic unit test for GenomeLoc
* fixed bug when parsing genome locations like chr1:5000: the start position was being left as maxint rather than being set to the same value as the stop position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@365 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:25:17 +00:00
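The single-position parsing fix in the commit above can be illustrated with a sketch like the following. This is not the GenomeLoc code: the class, its fields, and the accepted formats (`contig:pos` and `contig:start-stop`) are assumptions for the example. The point it shows is that a location without an explicit range gets start and stop set to the same position.

```java
// Hypothetical sketch of parsing "chr1:5000" and "chr1:5000-6000" style locations.
final class LocusParseSketch {
    final String contig;
    final long start;
    final long stop;

    private LocusParseSketch(String contig, long start, long stop) {
        this.contig = contig;
        this.start = start;
        this.stop = stop;
    }

    static LocusParseSketch parse(String text) {
        int colon = text.lastIndexOf(':');
        if (colon <= 0 || colon == text.length() - 1) {
            throw new IllegalArgumentException("expected contig:pos or contig:start-stop, got: " + text);
        }
        String contig = text.substring(0, colon);
        String range = text.substring(colon + 1);
        int dash = range.indexOf('-');
        if (dash < 0) {
            // Single position: start and stop are the same coordinate.
            long pos = Long.parseLong(range);
            return new LocusParseSketch(contig, pos, pos);
        }
        long start = Long.parseLong(range.substring(0, dash));
        long stop = Long.parseLong(range.substring(dash + 1));
        return new LocusParseSketch(contig, start, stop);
    }

    public static void main(String[] args) {
        LocusParseSketch loc = parse("chr1:5000");
        System.out.println(loc.contig + " " + loc.start + " " + loc.stop);
    }
}
```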
hanna
49fd951d8c
Initial test suite for FastaSequenceFile2, so I can add parallelism support with abandon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@364 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-11 21:10:42 +00:00
hanna
608a66e6ab
TbyLocibyRef previously didn't seem to support traversals with no interval specified. Put in a temporary fix until the threaded approach is in place.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@363 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 22:14:06 +00:00
hanna
c2669021b8
Cleanup, and support either by-interval traversals or full traversals in data source-backed code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@362 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 22:09:01 +00:00
hanna
2322bb7d86
Workaround: use a single ReferenceIterator for an entire micromanaged traversal. We'll have to
...
do something about ReferenceIterator thread safety later.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@361 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 20:50:28 +00:00
hanna
95753e1b34
Should've been calling queryOverlapping in locus mode.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@360 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 20:22:04 +00:00
kiran
2b59110dca
CombineSamAndFourProbs is better.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@358 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:19:53 +00:00
kiran
56aa98ad30
Ignore null values.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@357 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:18:20 +00:00
kiran
2ef2c9e121
Fixed an issue wherein the SQ field was only being pulled from the first read of the pileup, no matter what. Fixed an issue wherein Andrew enumerates his bases as A:0, C:1, T:2, G:3, and Kiran's QualityUtils methods enumerate bases as A:0, C:1, G:2, T:3 (we should standardize this). Fixed an issue wherein the remaining probability was being divided by 3 rather than 2 when four-base probs are enabled.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@356 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:17:53 +00:00
depristo
17b3d5b554
New ROD accessing system, including a generalized interface for binding ROD on the command line that doesn't require you to change GenomeAnalysisTK.java
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@355 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 22:04:59 +00:00
kiran
f5cc2d8b0b
Commented out import of IlluminaParser.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@354 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:30:29 +00:00
hanna
0d825ccfc1
Oops. Fixed duplicate reference to the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@353 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:27:57 +00:00
aaron
9afa101465
Add interval support to the
...
.__ __ __
_____| |__ _____ _/ |__/ |_ ___________
/ ___/ | \\__ \\ __\ __\/ __ \_ __ \
\___ \| Y \/ __ \| | | | \ ___/| | \/
/____ >___| (____ /__| |__| \___ >__|
\/ \/ \/ \/
classes!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@352 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:23:43 +00:00
kiran
c5220c0822
Four-base probs are now decoded with the relevant method in QualityUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@351 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:52:17 +00:00
kiran
9bc763a835
A better (aka 'working') tool for combining four-base probs with an aligned sam file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@350 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:51:37 +00:00
kiran
b7a2e82b46
Can optionally process raw or corrected intensities.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@349 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:50:11 +00:00
kiran
6cdad10dd1
Make output type identical to the bustard parser so the values can be easily swapped for one another.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@348 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:49:34 +00:00
kiran
d0ce56e018
Remember to take the strand flag into account when calculating error rate per cycle as a surrogate for instrument performance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@347 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:48:45 +00:00
hanna
8a1207e4db
Bringing up scaffolding for integration of locus traversals by reference with Aaron's data source code.
...
Reverts to original TraverseByLociByReference behavior unless a special combination of command-line flags is used.
Lightly tested at best, and major flaws include:
- MicroManager is not doing MicroScheduling right now; it's driving the traversals.
- New database-ish data providers imply by their interface that they're stateless, but they're highly stateful.
- Using static objects to circumvent encapsulation.
- Code duplication is rampant.
- Plus more!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@346 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:28:17 +00:00
aaron
8e2f5471a1
Some cleanup to the data source, and another JUnit test case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@344 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 14:58:05 +00:00
aaron
d56193b6df
Cleanup of a couple of output statements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@343 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 14:09:07 +00:00
kcibul
c556a97f17
Skeleton of Somatic Coverage tool
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@342 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 02:34:03 +00:00
aaron
12752cf893
Added a bunch of fixes: MSRI wasn't working, sharding had broken edge cases, and SAMBAM DS needed to close the file handles.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@341 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 00:20:15 +00:00
kiran
089bf30cf4
Send things to the out file via the logger.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@339 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 21:49:03 +00:00
kiran
6db9a00a0b
SAMFileWriter doesn't appear to flush the buffer when its destructor is called. You have to call the close() method. Also, choose a random base for Ns in the forward and reverse strands so that samtools doesn't pitch a fit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@338 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 21:48:24 +00:00
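The close() issue in the commit above applies to buffered writers in general: buffered data is only guaranteed to reach the file once the writer is flushed or closed. The sketch below uses only standard `java.nio` classes and not SAMFileWriter, whose API is not shown in this log. The class and method names are hypothetical, and try-with-resources is a later Java feature than this 2009 code would have used.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: always close a buffered writer so its contents are flushed to disk.
final class CloseFlushSketch {
    // Writes text to a temporary file, closes the writer, and returns the file size in bytes.
    static long writeThenClose(String text) {
        try {
            Path tmp = Files.createTempFile("close-sketch", ".txt");
            try {
                // try-with-resources calls close(), which flushes the buffer.
                try (BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                    writer.write(text);
                }
                return Files.size(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes on disk after close: " + writeThenClose("hello"));
    }
}
```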
kiran
eb2f0ebd62
If the first base of a read is 'N', and the alignment cigar says every base matches, samtools calls shenanigans. Now I just output an A, but the real way to do this is to modify the cigar string accordingly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@337 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 19:58:18 +00:00
kiran
0e7d962eca
Oops. Slight twiddle of the math here so that I'm not asking if bestBase == nextBestBase.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@336 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 19:56:54 +00:00
aaron
d4ab95c098
Added a constructor, took out a copy constructor, and changed some SAMBAM code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@335 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 19:53:20 +00:00
kcibul
0b81a76420
added support for Picard IntervalList files to --interval_file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@334 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 16:49:43 +00:00