88 Commits (4f51a02dea94bc669796f3ae9e6ff6d4581a3e5b)

Author SHA1 Message Date
kshakir 4f51a02dea Changed logging level to default at INFO instead of WARN.
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
aaron b3fd145161 fix for a bug deep in the tribble indexing: if you had a single record in the first contig, the second contig's index blocks would point to the wrong file seek location, and you'd see no
features in that contig. Thanks to Mark for finding this.  I'm not rev'ing the index version (which would cause all indexes to be rebuilt), since this seems like a pretty rare edge case.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3865 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 18:39:55 +00:00
aaron 9579aace1f updates to code dependent on Tribble, as well as the following Tribble changes:
- makes writing to disk optional for indexes using the indexCreator classes (allow the user to specify the index file, if null don't write it)
- removed some system.out debugging code
- fixed version checking in interval tree 
- made indexes store and return a LinkedHashSet for sequence names (to ensure they've preserved the ordering in the file)
- index creators now read the file before creating the index
- changed the Index.write() method to take a LEDataStream instead of a file
- removed the sequence dictionary code on the header
- added utils for getting LEDataStreams
- added a base Tribble exception




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3857 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 01:56:10 +00:00
aaron 1cba81c16f updates to tribble with fixes for some bugs I've found in some new indexing code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3842 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 22:08:04 +00:00
aaron af6b5f000e updating the Tribble library; added writing of indexes to the index interface for working with the tree index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3836 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:02:08 +00:00
aaron 250ab70fed update the Tribble library too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3827 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 05:00:37 +00:00
hanna b6af17b82d Rev Picard with new IndexedFastaSequenceFile patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3700 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 03:05:43 +00:00
aaron dff4c06763 Rev'ing Tribble with a special version that has excluded VCF 3.3
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3640 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:20:51 +00:00
hanna 4840ef6d3e Another rev of picard for /dev/null writing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3627 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 19:22:47 +00:00
hanna c32f9d78ae Rev picard again, this time for error writing to /dev/null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3626 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 04:08:26 +00:00
aaron 54ae0b8e4e some updates to tribble for the svn commit that will follow
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3621 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:20:07 +00:00
hanna 26d51bbe14 Another round of optimizations from Alec. Switching the header merger to
an IdentityHashMap provides another 10x+ performance boost over his previous
optimization for us.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3616 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 14:54:58 +00:00
hanna 003dd4de3e Rev Picard with performance enhancements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3615 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 22:54:23 +00:00
aaron 5b87a00a5f updating with associated Tribble changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3605 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 07:54:15 +00:00
depristo 57a13805da GATK now uses an optimized indexing scheme in Tribble. 5x or more performance gain on files with many genotypes. Updated integration test that was failing and was clearly wrong. DB=; isn't a valid annotation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3596 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-19 21:36:41 +00:00
aaron 32f6781ac7 updating tribble with the VCF header changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3583 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 08:20:44 +00:00
hanna db1383d0b2 Rev the latest version of Picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3575 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 23:55:07 +00:00
ebanks 8c28be5933 Fixing a VCF bug for Sendu: we weren't emitting flags (booleans) correctly in VCF3.3 (rev'ed tribble for this).
Updated dbsnp/hapmap membership info fields to be flags now instead of ints.
While I was there, I added the change in the Annotator for Jan to force reads to be from a specific sample.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3536 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 16:42:06 +00:00
aaron e27951ab39 re-updating the VCF code to handle spaces in sample names
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3528 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:18:34 +00:00
aaron ad98512f6c adding changes so that we look at the headers already loaded by the engine for samples and other VCF utils, and not create readers for each file to get them (this caused Tribble to regenerate indices if the index file can't be written to disk).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3518 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:21:12 +00:00
aaron 6febd0291d rev tribble to include some dbsnp clean-up and fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3510 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 03:08:31 +00:00
aaron 6d5556939d updating Tribble with a couple of important Tabix fixes, and updating the variant eval integration tests to run each test with both plain vcf and gzipped tabix (added the tabix version to the validation directory), using the same md5sum.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3509 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 01:47:04 +00:00
hanna c1ecf75dd5 Update to the latest rev of the picard sharding patch. Includes updates reflecting
the imminent move of IlluminaUtil into picard public.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3493 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 20:33:21 +00:00
aaron 0b03e28b60 updating the tribble library to include the reference dictionary reading / writing. We now check the dictionaries of any tracks that have them against the reference (all new tribble tracks and out-of-date tracks will have this). Also renamed some classes to be more reflective of their function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3485 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 06:34:26 +00:00
ebanks ffeb3fd80d Thanks to Guillermo, I found a bug in the Unified Genotyper output: GL was posteriors instead of likelihoods. Not a huge deal because the
priors were flat, but fixed nonetheless.
Also, needed to update Tribble.
Minor updates to the Beagle input maker.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3461 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 19:28:26 +00:00
aaron 98350da177 rev tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3456 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 14:52:39 +00:00
aaron f7c9f131ea revisioning tribble to version 85, which includes tabix and bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3428 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 04:28:32 +00:00
kshakir e9ee55d7dd A cleaned up functioning early, early access version of Queue for others to play with and provide feedback about next steps.
Current version only has syntactic sugar for accessing the graph via rules, e.g. "bam" -> "bam.bai", "samtools index ${bam}", and DOES NOT have sugar for constructing your own graph.
Usage info on the internal wiki at https://iwww.broadinstitute.org/gsa/wiki/index.php/Queue


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3420 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-23 20:21:09 +00:00
aaron 7cfb9ff3dc updates for Tribble 82, fixes for Ryan's case where multiple processes would attempt to read/write to the same index, and a couple other Tribble-centric bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3382 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 19:34:45 +00:00
aaron 2c55ac1374 fixes for parallel processing problems with Tribble, a small bug in the resource pool, and some more documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3349 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 06:13:26 +00:00
hanna 76efa757f0 Switched over to reviewed version of Picard patch. In process, did some optimization to the IntervalSharder
which improved startup time 5-10x when dynamically merging many BAMs.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3331 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-08 14:12:22 +00:00
aaron 7d2df3f511 example windowed ROD walker for Kristian, and updates to Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3325 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 17:12:50 +00:00
aaron 78409dca0d turned off the progress output from tribble when making an index, and fixing a case where the index file isn't writable so we instead make the index in memory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3312 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 16:36:58 +00:00
aaron d91b27aca1 updating Tribble with VCF changes from Eric
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3310 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 04:03:21 +00:00
aaron 7467ec2fd6 updating the reflections library; Matt found a problem where the reflections library doesn't sort out non-java objects from the classpath (affects only OS X so far). I'll push back the changes to
the reflections library people.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3307 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 02:08:41 +00:00
aaron 97dd04cbf0 updating Tribble ahead of the big VCF commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3297 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 05:17:54 +00:00
aaron 447081583a rev tribble with updated version
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3287 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-03 04:07:28 +00:00
aaron b648e89096 updating Tribble with a bunch of bug and performance fixes found while performance testing GeliText in the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3267 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-28 18:45:10 +00:00
aaron 64c5f287c5 fixes for edge-cases when using reflections to find classes outside of the main jar. Will push as a patch to reflections
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3264 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-27 17:46:46 +00:00
aaron c647153b10 Adding Jama for Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3262 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-27 14:30:36 +00:00
aaron f6468f9143 a fix for a bug we've worked around in the reflections package: previously it didn't find classes that weren't in the main jar. Fixed in this version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3261 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-27 04:49:49 +00:00
hanna 4bb8984f80 Updating picard -- switching to Alec's more robust fix for gzip decompression issue
and updating serialization components.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3154 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-12 19:56:45 +00:00
hanna b60197ae10 Another round of cleanup and simplification in Picard -- Picard's unit tests
are now passing for my branch.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3100 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 01:02:59 +00:00
hanna 400684542c Revisions to take into account finalization of Picard patch: naming changes, better definition
of public interfaces.  This won't be the last Picard patch, but it should be the last big one.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3096 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 19:28:14 +00:00
hanna 85037ab13f Fix for Kiran's sharding issue (Invalid GZIP header). General cleanup of
Picard patch, including move of some of the Picard private classes we use to Picard public.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3087 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 03:21:27 +00:00
hanna 46c14ec63f New, much less memory intensive implementation of BAM file sharding. Streams indices together with the expectation
that bins will be present in the bin sparse array, which avoids the problem of having to hold the sparse bin array
stored in every BAM file index in memory at the same time.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3075 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 17:41:22 +00:00
hanna 1f451e17e5 Changing preloaded index to only "preload" reference sequences on demand.
Results in drastic lowering of startup cost when multiple BAM files are 
merged.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3066 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 22:02:28 +00:00
hanna 884a577013 Phase 2 of Picard patch refactoring: kill off SAMFileReader2/BAMFileReader2, merging the changes back into the base classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3065 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 16:48:11 +00:00
hanna 169d0c6e8f Up the svn revision number in an attempt to force an update, again due to an
artifact of the way we build picard-private-parts.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3051 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 22:39:56 +00:00
hanna c0eb5c27ea Lower memory support for merged sharding. Merged sharding is still not available.
WARNING: If you update frequently, you might have to rm -rf ~/.ant/cache -- this is an unfortunate side effect of the way we
	 distribute picard-private.jar.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3050 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 22:03:47 +00:00