gatk-3.8

Commit Graph

Author	SHA1	Message	Date
David Roazen	605a5ac2e3	GATK engine: add ability to do on-the-fly BAM file sample renaming at runtime -User must provide a mapping file via new --sample_rename_mapping_file argument. Mapping file must contain a mapping from absolute bam file path to new sample name (format is described in the docs for the argument). -Requires that each bam file listed in the mapping file contain only one sample in its header (it may contain multiple read groups for that sample, however). The engine enforces this, and throws a UserException if on-the-fly renaming is requested for a multi-sample bam. -Not all bam files for a traversal need to be listed in the mapping file. -On-the-fly renaming is done as the VERY first step after creating the SAMFileReaders in SAMDataSource (before the headers are even merged), to prevent possible consistency issues. -Renaming is done ONCE at traversal start for each SAMReaders resource creation in the SAMResourcePool; this effectively means once per -nt thread -Comprehensive unit/integration tests Known issues: -If you specify the absolute path to a bam in the mapping file, and then provide a path to that same bam to -I using SYMLINKS, the renaming won't work. The absolute paths will look different to the engine due to the symlink being present in one path and not in the other path. GSA-974 #resolve	2013-07-18 15:48:42 -04:00
David Roazen	c15751e41e	SAMReaderID: fix bug with hash code and equals() method -Two SAMReaderIDs that pointed at the same underlying bam file through a relative vs. an absolute path were not being treated as equal, and had different hash codes. This was causing problems in the engine, since SAMReaderIDs are often used as the keys of HashMaps. -Fix: explicitly use the absolute path to the encapsulated bam file in hashCode() and equals() -Added tests to ensure this doesn't break again	2013-07-15 13:57:00 -04:00
Eric Banks	3759d9dd67	Added the functionality to impose a relative ordering on ReadTransformers in the GATK engine. * ReadTransformers can say they must be first, must be last, or don't care. * By default, none of the existing ones care about ordering except BQSR (must be first). * This addresses a bug reported on the forum where BAQ is incorrectly applied before BQSR. * The engine now orders the read transformers up front before applying iterators. * The engine checks for enabled RTs that are not compatible (e.g. both must be first) and blows up (gracefully). * Added unit tests.	2013-03-06 12:38:59 -05:00
Mark DePristo	be45edeff2	ActivityProfile and ActiveRegions respect engine interval boundaries -- Active regions are created as normal, but they are split and trimmed to the engine intervals when added to the traversal, if there are intervals present. -- UnitTests for ActiveRegion.splitAndTrimToIntervals -- GenomeLocSortedSet.getOverlapping uses binary search to find overlapping intervals efficiently, in ~ log N time -- UnitTesting overlap function in GenomeLocSortedSet -- Discovered fundamental implementation bug in that adding genome locs out of order (elements on 20 then on 19) produces an invalid GenomeLocSortedSet. Created a JIRA to address this: https://jira.broadinstitute.org/browse/GSA-775 -- Constructor that takes a collection of genome locs now sorts its input and merges overlapping intervals -- Added docs for the constructors in GLSS -- Update HaplotypeCaller MD5s, which change because ActiveRegions are now restricted to the engine intervals, which slightly changes the regions in the tests and so the reads in the regions, and thus the md5s -- GenomeAnalysisEngineUnitTest needs to provide non-null genome loc parser	2013-02-18 10:40:25 -05:00
Mauricio Carneiro	2a4ccfe6fd	Updated all JAVA file licenses accordingly GSATDG-5	2013-01-10 17:06:41 -05:00
Eric Banks	35d9bd377c	Moved (nearly) all Walkers from public to protected and removed GATKLite utils	2013-01-07 14:42:40 -05:00
Mauricio Carneiro	116885a450	Removed the "Walker" suffix from all walkers that had it. * Did not touch archived walkers... those can be named whatever. * Kept abstract classes that end in Walker untouched (e.g. LocusWalker, ReadWalker, ...) * Renamed a few inner classes due to conflict when stripping off Walker from their outer classes: ContigStats, FlagStats and FastaStats.	2012-07-20 17:27:11 -04:00
Khalid Shakir	746a5e95f3	Refactored parsing of Rod/IntervalBinding. Queue S/G now uses all interval arguments passed to CommandLineGATK QFunctions including support for BED/tribble types, XL, ISR, and padding. Updated HSP to use new padding arguments instead of flank intervals file, plus latest QC evals. IntervalUtils return unmodifiable lists so that utilities don't mutate the collections. Added a JavaCommandLineFunction.javaGCThreads option to test reducing java's automatic GC thread allocation based on num cpus. Added comma to list of characters to convert to underscores in GridEngine job names so that GE JSV doesn't choke on the -N values. JobRunInfo handles the null done times when jobs crash with strange errors.	2012-06-27 01:15:22 -04:00
Eric Banks	0ca7428e76	Allow processing of empty intervals, but warn user when this case is encountered.	2011-10-28 12:12:14 -04:00
Eric Banks	ccfd853b34	Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed.	2011-10-27 20:43:50 -04:00
Eric Banks	3273c20c98	Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes.	2011-10-26 15:29:18 -04:00
David Roazen	f18fffd625	Fixing broken paths to the testdata directory throughout the codebase.	2011-06-29 17:36:47 -04:00
David Roazen	3c9497788e	Reorganized the codebase beneath top-level public and private directories, removing the playground and oneoffprojects directories in the process. Updated build.xml accordingly.	2011-06-28 06:55:19 -04:00
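
Commit c15751e41e above describes making equals() and hashCode() depend on the absolute path of the wrapped bam file, so that relative and absolute spellings of the same file collide as HashMap keys. A minimal sketch of that idea follows. BamFileId is a hypothetical stand-in written for illustration, not the actual SAMReaderID class, and the use of File.getAbsoluteFile() is an assumption about how the absolute path might be obtained.

```java
import java.io.File;

// Hypothetical stand-in for an ID object that wraps a bam file.
// This is an illustration only, not the real SAMReaderID.
final class BamFileId {
    private final File absoluteFile;

    BamFileId(File file) {
        // Normalize once at construction so that relative and absolute
        // spellings of the same path compare equal afterwards.
        this.absoluteFile = file.getAbsoluteFile();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof BamFileId)) {
            return false;
        }
        return absoluteFile.equals(((BamFileId) other).absoluteFile);
    }

    @Override
    public int hashCode() {
        // Must be derived from the same field that equals() compares.
        return absoluteFile.hashCode();
    }
}
```

File.getAbsoluteFile() resolves a relative path against the working directory but does not resolve symlinks, so two paths that reach the same file through a symlink would still compare as different in this sketch. That matches the kind of limitation listed under "Known issues" in commit 605a5ac2e3.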

13 Commits (cdfd07f9eb4e2ca18b2b6b10d00797cd3a156ebd)
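
Commit 3759d9dd67 above describes ordering read transformers by a "must be first / must be last / don't care" constraint and failing when two enabled transformers make incompatible demands. The sketch below illustrates one way such an ordering could work. TransformerOrdering, Constraint, and Transformer are hypothetical names invented for this example and are not taken from the GATK source; the choice of IllegalArgumentException is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of constraint-based ordering; not GATK code.
final class TransformerOrdering {
    // Declaration order doubles as sort order: first, middle, last.
    enum Constraint { MUST_BE_FIRST, DO_NOT_CARE, MUST_BE_LAST }

    static final class Transformer {
        final String name;
        final Constraint constraint;

        Transformer(String name, Constraint constraint) {
            this.name = name;
            this.constraint = constraint;
        }
    }

    // Returns a new list ordered by constraint, or throws if two enabled
    // transformers both demand the first slot or both demand the last slot.
    static List<Transformer> order(List<Transformer> enabled) {
        long first = enabled.stream()
                .filter(t -> t.constraint == Constraint.MUST_BE_FIRST).count();
        long last = enabled.stream()
                .filter(t -> t.constraint == Constraint.MUST_BE_LAST).count();
        if (first > 1 || last > 1) {
            throw new IllegalArgumentException(
                    "Incompatible read transformers: more than one must be first or last");
        }
        List<Transformer> sorted = new ArrayList<>(enabled);
        // List.sort is stable, so "don't care" transformers keep their input order.
        sorted.sort(Comparator.comparing((Transformer t) -> t.constraint));
        return sorted;
    }
}
```

With this scheme, a transformer marked MUST_BE_FIRST (as the commit says BQSR is) always precedes a DO_NOT_CARE transformer such as BAQ, regardless of the order in which they were enabled.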