hanna
2953c9f069
Efficiency improvement requested by the Picard team in IndexedFastaSequenceFile: improve the memory efficiency
(and loading time) of long reference sequences by better controlling the input buffer size.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3665 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 07:22:07 +00:00
hanna
db1383d0b2
Rev the latest version of Picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3575 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 23:55:07 +00:00
bthomas
99b684ea89
Adding new support for reference data. ReferenceDataSource is a new class that manages reference data, and allows IndexedFastaSequenceFile to be a simple reader. This checkin also includes FastaSequenceIndexBuilder, which reads a fasta file and creates an index, like samtools faidx. Right now this is not enabled, because we are still working out thread safety. So the only new UI change is that GATK can be run without a fai file. Soon, we will enable 1) GATK to be run without a dict file too, and 2) both dict and fai files will be saved on disk for future program executions. For more info, see ReferenceDataSource.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3527 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:10:23 +00:00
hanna
9e2f831206
A bit of cleanup in preparation for Picard patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2286 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 16:09:04 +00:00
hanna
85a4fbc256
Bumping version of Picard for firehose compatibility.
Integration tests were validated against svn rev 1861, before the wonder
twins committed their changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1864 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 19:38:56 +00:00
hanna
32d55eb2ff
Fix issue Eric was seeing with java.lang.Error in unmap0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1804 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 17:46:56 +00:00
hanna
f4b6afb42c
JVM issue id 5092131 ( http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5092131 )
was causing OOM issues with the new mmapping fasta file reader during large jobs.
Temporarily reverting the reader until a workaround can be found.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1801 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 04:45:46 +00:00
hanna
fcb6a992c8
Switched IndexedFastaSequenceFile over to use memory mapping to load data rather than
the loop-with-small block size. Performance improvements in loading refs are extreme;
segments can be loaded in <1ms. chr1 in its entirety can be loaded in 1.5sec (down
from 30sec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1781 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 00:07:15 +00:00
aaron
2e4949c4d6
Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:37:59 +00:00
hanna
60a86fb34a
Better handling of fasta files with non-standard extensions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1206 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 18:18:48 +00:00
hanna
fc7320133c
Cleaned up error when fasta index is missing. Code still throws an exception, but the message is more direct (no more 'error while micromanaging') and tells the user to run 'samtools faidx' to fix the issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@867 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 15:34:38 +00:00
hanna
5e8c08ee63
Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
hanna
d35e20ce21
Better error checking for missing .dict file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@741 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 21:57:12 +00:00
hanna
e50ae97fe1
Introduce new index-based fasta reader. Clean up MicroManager code, pushing necessary code back into TraversalEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@531 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:40:21 +00:00
hanna
45d962e491
I understood the contig index incorrectly when I initially wrote this code. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@517 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 22:31:43 +00:00
hanna
56f6847456
Changed interface from contig,pos,length to more common contig,start,stop interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@441 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 00:04:41 +00:00
hanna
339261c4a9
Load the dictionary and sanity check it against the index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@430 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 18:04:13 +00:00
hanna
26e84d7fd6
Added index iteration for ReferenceSequenceFile interface compatibility.
Added better error checking for querying past the end of a contig.
Lots more testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@429 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 17:17:11 +00:00
hanna
182626576f
Basic indexed fasta POC in place. Requires a more complete implementation of the ReferenceSequenceFile interface,
and much more testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@425 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 13:46:56 +00:00