Commit Graph

19 Commits (2953c9f06946e56c1db8d1976a044569fbe3d122)

Author SHA1 Message Date
hanna 2953c9f069 Efficiency improvement requested by the Picard team in IndexedFastaSequenceFile: improve the memory efficiency
(and loading time) of long reference sequences by better controlling the input buffer size.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3665 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 07:22:07 +00:00
hanna db1383d0b2 Rev the latest version of Picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3575 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 23:55:07 +00:00
bthomas 99b684ea89 Adding new support for reference data. ReferenceDataSource is a new class that manages reference data, and allows IndexedFastaSequenceFile to be a simple reader. This checkin also includes FastaSequenceIndexBuilder, which reads a fasta file and creates an index, like samtools faidx. Right now this is not enabled, because we are still working out thread safety. So the only new UI change is that GATK can be run without a fai file. Soon, we will enable 1) GATK to be run without a dict file too, and 2) both dict and fai files will be saved on disk for future program executions. For more info, see ReferenceDataSource.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3527 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:10:23 +00:00
hanna 9e2f831206 A bit of cleanup in preparation for Picard patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2286 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 16:09:04 +00:00
hanna 85a4fbc256 Bumping version of Picard for firehose compatibility.
Integration tests were validated against svn rev 1861, before the wonder
twins committed their changes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1864 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 19:38:56 +00:00
hanna 32d55eb2ff Fix issue Eric was seeing with java.lang.Error in unmap0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1804 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 17:46:56 +00:00
hanna f4b6afb42c JVM issue id 5092131 (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5092131)
was causing OOM issues with the new mmapping fasta file reader during large jobs.
Temporarily reverting the reader until a workaround can be found.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1801 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 04:45:46 +00:00
hanna fcb6a992c8 Switched IndexedFastaSequenceFile over to use memory mapping to load data rather than
the loop-with-small block size.  Performance improvements in loading refs are extreme;
segments can be loaded in <1ms.  chr1 in its entirety can be loaded in 1.5sec (down
from 30sec).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1781 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 00:07:15 +00:00
aaron 2e4949c4d6 Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:37:59 +00:00
hanna 60a86fb34a Better handling of fasta files with non-standard extensions.x
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1206 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 18:18:48 +00:00
hanna fc7320133c Cleaned up error when fasta index is missing. Code still throws an exception, but the message is more direct (no more 'error while micromanaging') and tells the user to run 'samtools faidx' to fix the issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@867 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 15:34:38 +00:00
hanna 5e8c08ee63 Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
hanna d35e20ce21 Better error checking for missing .dict file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@741 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 21:57:12 +00:00
hanna e50ae97fe1 Introduce new index-based fasta reader. Clean up MicroManager code, pushing necessary code back into TraversalEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@531 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:40:21 +00:00
hanna 45d962e491 I understood the contig index incorrectly when I initially wrote this code. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@517 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 22:31:43 +00:00
hanna 56f6847456 Changed interface from contig,pos,length to more common contig,start,stop interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@441 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 00:04:41 +00:00
hanna 339261c4a9 Load the dictionary and sanity check it against the index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@430 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 18:04:13 +00:00
hanna 26e84d7fd6 Added index iteration for ReferenceSequenceFile interface compatibility.
Added better error checking for querying past the end of a contig.
Lots more testing.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@429 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 17:17:11 +00:00
hanna 182626576f Basic indexed fasta POC in place. Requires a more complete implementation of the ReferenceSequenceFile interface,
and much more testing.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@425 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 13:46:56 +00:00