Commit Graph

27 Commits (f5e52b72eaa8df6af88312594ac81ef4acfe87d5)

Author SHA1 Message Date
David Roazen c6581e4953 Update MD5s to reflect version number change in the BAM header
I've confirmed via a script that all of these differences only
involve the version number bump in the BAM headers and nothing
else:

< @HD   VN:1.0  GO:none SO:coordinate
---
> @HD   VN:1.4  GO:none SO:coordinate
2013-02-01 13:51:31 -05:00
Khalid Shakir 4ffb43079f Re-committing the following changes from Dec 18:
Refactored interval specific arguments out of GATKArgumentCollection into InvtervalArgumentCollection such that it can be used in other CommandLinePrograms.
Updated SelectHeaders to print out full interval arguments.
Added RemoteFile.createUrl(Date expiration) to enable creation of presigned URLs for download over http: or file:.
2013-01-16 12:43:15 -05:00
Mauricio Carneiro 2a4ccfe6fd Updated all JAVA file licenses accordingly
GSATDG-5
2013-01-10 17:06:41 -05:00
Khalid Shakir 746a5e95f3 Refactored parsing of Rod/IntervalBinding. Queue S/G now uses all interval arguments passed to CommandLineGATK QFunctions including support for BED/tribble types, XL, ISR, and padding.
Updated HSP to use new padding arguments instead of flank intervals file, plus latest QC evals.
IntervalUtils return unmodifiable lists so that utilities don't mutate the collections.
Added a JavaCommandLineFunction.javaGCThreads option to test reducing java's automatic GC thread allocation based on num cpus.
Added comma to list of characters to convert to underscores in GridEngine job names so that GE JSV doesn't choke on the -N values.
JobRunInfo handles the null done times when jobs crash with strange errors.
2012-06-27 01:15:22 -04:00
Mark DePristo 567dba0f76 Cleanup of VCF header lines and constants, BCF2 bugfixes
-- Created public static UnifiedGenotyper.getHeaderInfo that loads UG standard header lines, and use this in tools like PoolCaller
-- Created VCFStandardHeaderLines class that keeps standard header lines in the GATK in a single place.  Provides convenient methods to add these to a header, as well as functionality to repair standard lines in incoming VCF headers
-- VCF parsers now automatically repair standard VCF header lines when reading the header
-- Updating integration tests to reflect header changes
-- Created private and public testdata directories (public/testdata and private/testdata).  Updated tests to use test
-- SelectHeaders now always updates the header to include the contig lines
-- SelectVariants add UG header lines when in regenotype mode
-- Renamed PHRED_GENOTYPE_LIKELIHOODS_KEY to GENOTYPE_PL_KEY
-- Bugfix in BCF2 to handle lists of null elements (can happen in genotype field values from VCFs)
-- Throw error when VCF has unbounded non-flag values that don't have = value bindings
-- By default we no longer allow writing of BCF2 files without contig lines in the header
2012-06-21 15:16:31 -04:00
Eric Banks 62cee2fb5b Feature request from Tim that could be useful to all: there's now an --interval_padding argument that specifies how many basepairs to add to each of the intervals provided with -L (on both ends). This is particularly useful when trying to run over the exome plus flanks and don't want to have to pre-compute the flanks (just use e.g. --interval_padding 50). Added integration test to cover this feature. 2012-06-18 21:36:27 -04:00
Mark DePristo 6ca71fe3b4 GATK tests use public/testdata not /humgen/ as much as possible 2012-05-24 10:58:58 -04:00
Eric Banks 337ff7887a When constructing VariantContexts from symbolic alleles, check for the END tag in the INFO field; if present, set the stop position of the VC accordingly. Added integration test to ensure that this is working properly for use with -L intervals. 2012-04-04 10:57:05 -04:00
Mauricio Carneiro e1d69e4060 make the size of a GenomeLoc int instead of long
it will never be bigger than an int and it's actually useful to be an int so we can use it as parameters to array/list/hash size creation.
2012-02-03 17:12:42 -05:00
Matt Hanna cd43f016ce Fixed NPE in getNextOverlappingBAMScheduleEntry() when mixed mapped/unmapped interval lists are used. Added integrationtest to verify behavior. 2012-01-12 13:29:11 -05:00
David Roazen 1296dd41be Removing the legacy -L "interval1;interval2" syntax
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.

A UserException is now thrown if someone tries to use this syntax.
2011-11-21 13:18:53 -05:00
Khalid Shakir c50274e02e During flanking interval creation merging overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
Moved flanking interval utilies to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
Eric Banks 02d5e3025e Added integration test for intervals from bed file 2011-11-09 15:34:19 -05:00
Mark DePristo 392e0aeace Moved unit tests into master IntervalUtilsUnitTest 2011-11-02 10:52:00 -04:00
Mark DePristo c2b97030a4 IntervalUtils for completely balanced locus-based scatter/gather
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
Eric Banks 0ca7428e76 Allow processing of empty intervals, but warn user when this case is encountered. 2011-10-28 12:12:14 -04:00
Eric Banks 8b1a62da27 Adding unit test to cover overlapping intervals from the same source with the intersection rule. 2011-10-28 09:59:43 -04:00
Eric Banks 6ba08a103d Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory. 2011-10-28 09:23:25 -04:00
Eric Banks ccfd853b34 Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed. 2011-10-27 20:43:50 -04:00
Eric Banks 8c4dbce6d8 Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing. 2011-10-27 13:58:19 -04:00
Eric Banks 3273c20c98 Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes. 2011-10-26 15:29:18 -04:00
Mark DePristo 9bd3ba4c7e Missed one MD5 2011-10-04 16:04:52 -07:00
Mark DePristo ffdfdcde3f Updating MD5s
-- Interval test now uses RG containing BAM
-- DoC sample name ordering has changed.
2011-10-04 15:54:45 -07:00
Mark DePristo 4ad330008d Final intervals cleanup
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to development new algorithm)
-- Unit tests for the efficiency of interval partitioning
2011-09-19 10:19:10 -04:00
Mark DePristo 72536e5d6d Done 2011-09-09 15:44:47 -04:00
Mark DePristo 06cb20f2a5 Intermediate commit cleaning up scatter intervals
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00