depristo
00722e19bc
The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@319 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:19:54 +00:00
asivache
9c4fc633aa
Make it symmetric: if there is no sequence dictionary, also send a message to the logger, just like we do when we find the dict
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@318 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 21:44:39 +00:00
asivache
b64e4d1a04
seekForwardOffset changed (improved?): first, compareContigs does *not*, in general, return -1,0 or 1 if no dictionary is available; second, be more flexible in trying to jump to the right contig (current implementation of FastaFile2 will still through an exception if there's no dictionary, but iterator itself behaves transparently)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@317 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 21:42:33 +00:00
aaron
2663ac3e4a
documentation fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@316 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 21:39:50 +00:00
aaron
8a357a88a2
right...exponential should be exponential, so I might want to increment the exponent
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@315 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 20:12:05 +00:00
aaron
6ce9e0f941
delete the old strategy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@314 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 19:40:03 +00:00
aaron
08fddd43af
-Replaced adaptive and linear strategies with an adaptive linear strategy
...
-Added the exponential growth strategy
-Added factory code that allows you to transitition between strategies, so if you want to move from linear to exp at a point, and then back when you've hit a runtime threshold, it will take care of it for you.
-Changed the code to return a Shard instead of a GenomeLoc
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@313 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 19:37:38 +00:00
aaron
6369d23b43
renamed; these files are more strategy than actual shards
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@312 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 16:50:56 +00:00
asivache
e95f427965
Added isReference() to AllelicVariant and updated rodDbSNP accordingly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@311 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 14:49:20 +00:00
kiran
99579a1ef8
Math correction.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@310 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 02:18:13 +00:00
kiran
9be978e006
Intermediate commit (debugging info).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@309 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 01:20:15 +00:00
aaron
b42d8df646
the new shatter method, independent of the underlying data. The only thing needed to create a Shard is the reference seq, which may be a problem in reference less traversals, so the builder class is there so we can make different construction schemes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@308 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 00:32:57 +00:00
aaron
0baa8c0f76
We need a base exception so we can differentiate between exceptions we've generated and those external to our code. All our exceptions should extend this exception. I'll migrate the ones I can find later on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@307 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 00:13:45 +00:00
aaron
150bca30aa
typO in the documentation...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@306 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 23:05:59 +00:00
aaron
4aa9c0d591
Matt make a good point that the Reference Iterator we were using wasn't bounded; The BoundedReferenceIterator takes a GenomeLoc to bound the iterations by
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@305 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 23:03:56 +00:00
kiran
5a5c6d1276
Added some debugging stuff (writes model parameters to one file per cycle).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@304 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 22:00:58 +00:00
aaron
0fc8a90553
removing some files from the old approach to dataSource
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@303 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 21:57:34 +00:00
aaron
5feb7ee627
temperary fix, relying on a old reference order data constructor
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@302 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 21:38:41 +00:00
aaron
af5a443e5a
add Synchronized to the has_next and next methods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@301 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 21:17:11 +00:00
aaron
97d14abe85
Interface check-in for Matt
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@300 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 21:14:19 +00:00
ebanks
d1c5e986d5
Another check to deal with bad reads (BWA output throws bad exceptions)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@298 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 04:58:22 +00:00
ebanks
3f75fc4e83
Unfortunately, because BWA occasionally outputs crazy reads, we need
...
to make sure not to have an ArrayIndexOutOfBoundsException thrown.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@297 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 03:51:35 +00:00
kiran
f12d40dde8
Simplified SAMRecord construction and emission.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@296 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-05 04:48:31 +00:00
asivache
0d25e71953
a declaration is made generic
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@295 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-04 21:55:02 +00:00
asivache
551ce9130f
added isBiallelic() to the AllelicVariant interface and to rodDbSNP implementation. We probably don't really know how to deal with non-biallelic sites just as yet...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@294 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-04 21:31:16 +00:00
ebanks
2e89d5e46f
That was an annoying bug to find. Mark, I want a beer.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@293 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 20:05:24 +00:00
depristo
4eac3193f7
Added RefMetaDataTracker system as a replacement for the List<RefenenceOrderedData> going into walkers. This system allows you to more easily get a tracker for processing using the lookup(name, default) system. See Pileup for an example.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@292 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:54:54 +00:00
depristo
c1abcfb014
Fixed problem where we were considering reads out of order because their stop positions where out of order, but with equal starts. This involved a change in the ordering feature of GenomeLoc, which now no longer sorts by both start and stop. So as long as the start positions are equal, things are considered "in order". Perhaps this isn't a good idea to change...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@291 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:53:33 +00:00
kiran
ef06924f73
JavaDocs!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@290 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:19:17 +00:00
ebanks
42eb356782
1. modifed by read traversals with indexes to be more general
...
2. GenomeLocs for reads should have ends spanning the read
(moved it to GenomeLoc from Utils)
3. Got rid of those stupid unmappable characters from comments in various files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@289 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 18:24:08 +00:00
andrewk
86fc18e9fc
Fixed merge bug
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@288 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 17:41:58 +00:00
andrewk
bef475778f
- Updated --hapmap switch to --hapmap-chip to reflect the data being chip data for an individual rather than population allele frequency data in Hapmap
...
- Corrected some bugs to get metrics logging working
- Added a switch --force_1base_probs to ignore 4-base probalities if they exist
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@287 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 17:32:31 +00:00
depristo
edc44807af
rod's now have names. Use getName() to access it. Next step is better interface to accessing rods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@286 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 16:41:33 +00:00
kiran
5019971290
Now outputs four-base SAM record (read name prefixed with KIR) and bustard SAM record (prefixed with BUS) for easy debugging.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@285 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 15:48:51 +00:00
kiran
15151ac125
Corrected the use of the prior.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@284 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 15:47:47 +00:00
kiran
b854c24575
Oops. I gave this method the wrong name first time around.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@283 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 15:46:26 +00:00
kcibul
9bbce32064
Basic dbSNP and HapMap frequency aware SNP caller... still in progress
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@282 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 14:24:09 +00:00
depristo
f031d882c6
ByReference traversals!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@281 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 13:23:18 +00:00
andrewk
e3ac0cb500
- A lot of code cleaned up; separated metrics code from AlleleFrequencyMetricsWalker into AlleleMetrics and eliminated the former class. AFMW (aside from being a name so long that it warrants an acronym) can now be implemented by passing an option to AlleleFreqeuncyWalker that logs metrics to a file.
...
- AlleleMetrics and AlleleMetricrsWalker are now ready to take a list of clasess that implement the AllelicVariant interface
- Switched a genome location in AlleleFrequencyEstimate from String to GenomeLoc which makes way more sense.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@280 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 02:09:10 +00:00
asivache
c6ab60ee04
change variable type to Boolean from boolean to make cmdline parser happy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@279 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:35:30 +00:00
asivache
16aa979e34
make -A a true flag not an argument that asks for 'true/false' value!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@278 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:23:46 +00:00
kiran
7d889c0661
Refactored into oblivon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@276 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:12:15 +00:00
kiran
dffc879240
Should now be appropriately using Bustard data to call bases (there are some mathematical subtleties that arise when no longer using ICs as initialization data. Also writes some more relevant fields in the SAM records. WAAAAAY simpler than old version. Like, super way.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@275 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:10:13 +00:00
kiran
59334b0270
A convenience class for manipulation base probability distributions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@274 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:08:31 +00:00
kiran
399d9b8c1e
A class that represents the model parameters for all of the Gaussian models for all cycles.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@273 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:08:10 +00:00
kiran
f0f94b6c72
A class that represents the model parameters for all of the Gaussian models at a given cycle. Handles the accumulation of parameter initialization data and provides for efficient computation of base probability distribution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@272 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:07:47 +00:00
kiran
a8a6c63a32
A class with some static methods that aid the manipulation of quality scores and probabilities (including a method to compress a base and quality score into a byte for SAM output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@271 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:06:15 +00:00
jmaguire
b7a67da775
Expose the underlying SAM reader to the walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@270 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 21:38:00 +00:00
jmaguire
8ce4dabd7c
Print coverage per reference base for each sample in a merged BAM file.
...
This is a good example for how to untangle a merged BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@269 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 21:35:31 +00:00
asivache
5d9b068b8b
generic declarations added here and there to eliminate a few annoying warnings; no consequential changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@268 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 20:53:01 +00:00