aaron
|
2e4949c4d6
|
Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-09-30 20:37:59 +00:00 |
hanna
|
433ad1f060
|
Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1196 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-07-08 18:49:08 +00:00 |
kiran
|
e3cdf7ef4b
|
A single class that can be handed reads for training and basecalling. When in training mode, we accumulate no more than 10000 reads and always replace the lowest-quality reads with superior quality reads. Thus, the training set always contains 10000 of the best reads available. After training is complete, the class can be interrogated to return the SQ tag for a given RawRead object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1125 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-30 16:03:15 +00:00 |
kiran
|
ee2af3b423
|
I committed this too soon... reverting...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1106 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-25 20:49:12 +00:00 |
kiran
|
23680a9a16
|
Replaced an expensive sort with an inexpensive direct computation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1104 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-25 20:25:12 +00:00 |
kiran
|
7b5d8d7604
|
Changed the intensities array order from cycle,channel to channel,cycle. This, I'm told, is a far more efficient allocation strategy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1084 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-24 15:41:06 +00:00 |
kiran
|
03fe166994
|
Wrote a public static version of loadFirstNReasonableReadsTrainingSet() so Alec can call it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1046 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-18 20:18:17 +00:00 |
kiran
|
e7f222108d
|
More accessors. Can compute the sum of the quality scores in the read (useful for sorting) and can return a subset of itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@948 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 01:02:48 +00:00 |
kiran
|
6506504a60
|
Updates after seeing a certain number of reads, not a certain number of bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@947 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 01:01:36 +00:00 |
kiran
|
65d0675a4e
|
Some changes regarding what to do when a cycle is completely busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@946 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 01:01:13 +00:00 |
kiran
|
0bd78d72d7
|
Some changes regarding what to do when a cycle is completely busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@945 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-06-09 01:00:33 +00:00 |
hanna
|
5e8c08ee63
|
Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-28 20:13:01 +00:00 |
kiran
|
cd80e3f372
|
Replaced dumb training function with a version that creates a training set slightly more sensibly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@806 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-22 19:34:33 +00:00 |
kiran
|
02c0afdb85
|
Added the ability to specify the sorted, unaligned bam and/or the sorted, aligned bam such that broken computations can be restarted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@805 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-22 19:33:34 +00:00 |
kiran
|
287bb52e81
|
Refreshes the mount points that we'll be using (so that the program will play nicely with LSF).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@783 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-21 20:36:12 +00:00 |
kiran
|
5f67914b08
|
Added loads of documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@777 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-21 19:40:47 +00:00 |
kiran
|
747521c849
|
Fixed the simplest of typos.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@761 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-20 16:00:30 +00:00 |
kiran
|
e48078b476
|
Updated to reflect change to BasecallingReadModel constructor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@760 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-20 15:43:26 +00:00 |
kiran
|
505f588768
|
Forgot to say that the mate is unmapped too. This is necessary to prevent SAM-JDK from yelling at me about an invalid SAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@759 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-20 15:38:51 +00:00 |
kiran
|
6c5fbb988b
|
Now basecalls an entire read (both ends of the pair, barcode... everything) at once. Afterwards, RawRead and FourProbRead can be asked to return a specified subset (corresponding to the ranges specified for each end of the read).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@754 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-20 00:09:20 +00:00 |
kiran
|
e293d65ede
|
Refactored to allow the user to specify the range of cycles they wish to call. Simply specify a single range (e.g. '0-75') or two ranges ('0-75,76-151'). This allows single and paired-end read processing to coexist happily. Also implements annotation of an aligned bam file (which should hopefully fit in under two gigs now, but I'm waiting on a bug fix or a clarification from the Picard team).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@753 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-20 00:07:24 +00:00 |
kiran
|
08c9f4d86b
|
Renamed to BasecallingTrainer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@752 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-20 00:03:46 +00:00 |
kiran
|
7c615c8fb0
|
Some changes to the system for annotating a pre-aligned bam file. Doesn't fit within 2 gigs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@746 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-18 17:42:08 +00:00 |
kiran
|
28bf7ec8ad
|
Aesthetic cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@735 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-17 04:09:23 +00:00 |
kiran
|
a0464633fd
|
Whoops. Changed denominator from reads to bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@734 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-17 03:42:25 +00:00 |
kiran
|
5d60efc498
|
Factored out some simple stats accumulation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@733 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-17 03:37:57 +00:00 |
kiran
|
6f1559bd77
|
Cleaned up a bit. Added some documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@728 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-15 21:22:24 +00:00 |
kiran
|
dae77bf14a
|
Fixed a typo in a comment.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@723 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-15 20:07:31 +00:00 |
kiran
|
bfc40f54f0
|
Nicer output when training off of perfect reads. Not that that works yet...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@722 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-15 20:07:08 +00:00 |
kiran
|
36db44620b
|
Improved output. Can optionally limit the number of reads actually called.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@720 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-15 00:07:57 +00:00 |
kiran
|
5858f20902
|
Documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@712 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 18:58:43 +00:00 |
kiran
|
3761c0900b
|
Added output of the percentage of bases that are consistent between the Bustard and Four-prob calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@710 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 18:01:41 +00:00 |
kiran
|
959cf09d4b
|
Removed some debugging print statements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@707 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 17:12:42 +00:00 |
kiran
|
2f42a643a8
|
A new, much simpler (and now, complete) driver program for four-base probs. Serves as a model for anyone who wants to write their own driver program that trains and calls with data from a different source than the raw Illumina data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@706 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 16:58:22 +00:00 |
kiran
|
5824dea0c1
|
Trains and calls a read at a time rather than a base at a time (which, given its name, it should have done in the first place).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@705 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 16:57:00 +00:00 |
kiran
|
e4770885fd
|
The four-probs for all bases in a single read. Some utility functions for generating the primary and secondary base strings, as well as generating the SQ tag byte array in a manner that's consistent with the Bustard base calls (meaning the primary Bustard call and the secondary Four-Prob call are not permitted to be the same).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@704 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 16:55:49 +00:00 |
kiran
|
fdd123fe16
|
A parser for the raw Illumina data. Allows one to arbitrarily jump from one tile to another.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@703 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 16:53:07 +00:00 |
kiran
|
6d98234555
|
Holds raw intensities, sequence, and quality scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@701 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 16:52:03 +00:00 |
kiran
|
241de0b235
|
A class that implements multiple training strategies and presents the training data in a common form.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@700 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-14 16:51:29 +00:00 |
depristo
|
5b47c5ab6c
|
fixing kiran's busted build
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@686 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-12 21:29:04 +00:00 |
kiran
|
4f2c8bf0a3
|
Fixed an import statement that broke when all the files were moved to this directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@685 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-12 20:43:16 +00:00 |
kiran
|
cedc4c9ccb
|
Refactored into oblivion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@684 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-12 20:33:07 +00:00 |
kiran
|
688358190c
|
Moved secondary base stuff out of playground for the purpose of making it a core utility. Modified package names and imports such that things would build properly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@680 348d0f76-0448-11de-a6fe-93d51630548a
|
2009-05-12 20:24:18 +00:00 |