gatk-3.8

Commit Graph

Author	SHA1	Message	Date
hanna	1d50fc7087	Misc bug fixes: fix tracking of nInsertions with sample-split pileup constructor. Fix performance issue building up pileups from pileups of individual sample data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3598 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-20 20:32:27 +00:00
hanna	f18ac069e2	A refactoring / unification of ReadBackedPileup and ReadBackedExtendedEventPileup. Provides a cleaner interface with extended events inheriting all of the basic RBP functionality. Implementation is still slightly messy, but should allow users to provide separate implementations of methods for sample split pileups and unsplit pileups for efficiency's sake. Methods not covered by unit/integration tests have not been sufficiently tested yet. Unit tests will follow this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3597 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-20 04:42:26 +00:00
hanna	52477bd9e6	Add some missing methods to the pileup architecture. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3588 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-18 15:03:08 +00:00
aaron	b978d5946b	adding changes for VCF 4, mostly in the way we handle VCF headers. The header fields are now aware of the differences between different VCF formats. There was also a bunch of clean-up of out-of-spec VCF used in the tests (mismatched VCF file format fields, etc), and updates to the associated integration tests. Also some logging statements for BTI. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3584 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-18 08:23:23 +00:00
hanna	612c3fdd9d	First pass at eliminating the old sharding system. Classes required for the original sharding system are gone where I could identify them, but hierarchies that split to support two sharding systems have not yet been taken apart. @Eric: ~4k lines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3580 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-17 20:17:31 +00:00
hanna	db1383d0b2	Rev the latest version of Picard. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3575 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-16 23:55:07 +00:00
ebanks	f003703912	Allow specification of particular rods for pulling out sample names. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3570 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-16 19:37:09 +00:00
asivache	671ac00748	A simple utility class that implements a merging Iterator<GenomeLoc> built over an interval or bed file (this is NOT a rod, but rather a direct line-by-line file reader that converts strings to genome locs on the fly and merges overlapping intervals) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3546 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 15:54:37 +00:00
asivache	7b7d3341f0	trivial refactoring: isFile renamed to isIntervalFile and made public git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3541 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-14 14:02:23 +00:00
hanna	c3b68cc58d	Rethinking DownsamplingLocusIteratorByState with a flattened read structure. Samples are kept independent while processing, and only merged back in a priority queue if necessary in a special variant of the ReadBackedPileup. This code is not live yet except in the case of naive deduping. Downsampling by sample temporarily disabled, and the ReadBackedPileup variant is sketchy and not well integrated with StratifiedAlignmentContext or the walkers. Cleanup to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3540 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-13 01:47:02 +00:00
asivache	e6d8faf293	making 'parseLocation' public static - as simple as the logic is, it's better kept in one place and I need it! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3537 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-11 18:19:59 +00:00
weisburd	3b375cb237	Sped up parseGenomeLoc(..) by replacing regexp with String.indexOf(..) - attempt 2 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3529 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-10 20:54:36 +00:00
bthomas	99b684ea89	Adding new support for reference data. ReferenceDataSource is a new class that manages reference data, and allows IndexedFastaSequenceFile to be a simple reader. This checkin also includes FastaSequenceIndexBuilder, which reads a fasta file and creates an index, like samtools faidx. Right now this is not enabled, because we are still working out thread safety. So the only new UI change is that GATK can be run without a fai file. Soon, we will enable 1) GATK to be run without a dict file too, and 2) both dict and fai files will be saved on disk for future program executions. For more info, see ReferenceDataSource.java git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3527 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-10 20:10:23 +00:00
aaron	6941c81bfa	reverting revision 3522 to the old code until we fix the tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3524 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 19:25:02 +00:00
weisburd	adc4c4e577	Sped up parseGenomeLoc(..) by replacing regexp with String.indexOf(..) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3522 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 18:11:43 +00:00
aaron	ad98512f6c	adding changes so that we look at the headers already loaded by the engine for samples and other VCF utils, and not create readers for each file to get them (this caused Tribble to regenerate indices if the index file can't be written to disk). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3518 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 17:21:12 +00:00
ebanks	9b2fcc4711	Refactoring of the annotation system: 1. VA is now a ROD walker so it no longer requires reads (needs a little more testing) 2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs) 3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong. Fixed the headers too. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-09 17:05:51 +00:00
depristo	e2b41082af	GATK now does automatic adaptor filtering in locus iterators (but not expt. downsampling iterator). General support for LocusIteratorFilters just like read filters but only applying at particular bases. Updated tools with new MD5 sums due to adaptor bases in their integrationtest data. Note that as a side effect here reads close to each other with odd orientations are also filtered out. Updated minor argument to VariantRecalibrator to change the qStep value on the command line git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3481 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-02 22:26:32 +00:00
ebanks	4a555827aa	Removing more toUpperCase sanity checks git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3471 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-02 14:38:39 +00:00
depristo	2b02324587	Support for detecting and automatically excluding reads reading into the adaptor sequence and, if desired, also only showing the first pair when two reads overlap in the fragment. Not enabled, an intermediate check in before updating and verifying the impact on locus walkers everywhere. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3465 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-30 18:00:12 +00:00
aaron	871cf0f4f6	Call out ROD types by their record type, instead of the codec type (which was clumsy). So instead of: @Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFCodec.class)) you'd say: @Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFRecord.class)) Which is more in-line with what was done before. All instances in the existing codebase should be switched over. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3457 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-28 14:52:44 +00:00
depristo	f2e7582cfc	Reorganization of SW code for clarity. Total failure at raw optimization. Discovered that ~50% of reads being cleaned were perfect reference matches. New code comes with flag to look at NM field and not clean perfect matches. Can be turned off with command line option (needed for 1KG bams with bad NM fields). Going to rerun cleaning jobs due to accidental rebuilding of stable codebase and loss of 2 days of runtime. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3452 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-27 23:16:00 +00:00
aaron	cded9ec985	adding a command line option, -etd (enable threaded debugging), that uses a custom thread pool class to catch exceptions thrown inside of a thread. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3450 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-27 21:57:56 +00:00
depristo	dfc36c1e95	Restructuring of the mandatory read filters for traversals. Now everything uses ReadFilters, even for the required filters like being mapped for LocusWalkers. Statistics now tracked for each read filter used during the traversal and info emitted in INFO at the end. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3445 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-26 22:12:25 +00:00
depristo	5928047d8b	Optimization of reference window calculation to use bytes not chars and no uppercasing since reference and read bases are always uppercase now. Should remove some ~5% of runtime of UG. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3438 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-26 14:10:26 +00:00
ebanks	ae6c014884	Fixed UG parallelization bug. Better integration test to catch this in the future. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3432 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-25 21:03:45 +00:00
ebanks	772f558ae0	Massive change to the indel realigner code. We now properly deal with soft-clipped reads. Also, improved left-alignment code. Small change for Ryan to get hard-clipped reads working for the recalibrator. PLEASE DO NOT RELEASE THIS WEEK. I still have some more testing to do and need Mark to run WG jobs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3430 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-25 20:04:33 +00:00
depristo	a10fca0d5c	Genotyper now is using bytes not chars. Passes all tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3406 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 21:02:44 +00:00
aaron	b543dd4ac4	more aggressive checks for the locking, and some more documentation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3404 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 16:16:36 +00:00
depristo	727822adb4	BaseUtils has more clear distinction between byte and char routines. All char routines are @Deprecated now. Please use bytes. Better organization of reverse(), now in Utils not BaseUtils. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3400 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 14:05:13 +00:00
depristo	6ce3835622	Removing unused methods in QualityUtils; ReferenceContext now converting all bases to upper case, but can be disabled with static boolean git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3399 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 12:38:06 +00:00
depristo	5abac5c057	A few more char -> byte cleanups git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3398 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-20 00:02:06 +00:00
depristo	8a725b6c93	Restructuring of ReferenceContext and ReadWalkers to accept a ReferenceContext. Now ReferenceContext is byte[] backed not char[]. Please no more chars for the reference. All of the tests pass now. Coming check-ins are going to clean up the char / byte problems in the GATK git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3397 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 23:27:55 +00:00
hanna	017ab6b690	Experimental versions of downsampler and Ryan's deduper are now available either as walker attributes or from the command-line. Not ready yet! Downsampling/deduping works in a general sense, but this approach has not been completely optimized or validated. Use with caution. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3392 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 05:40:05 +00:00
weisburd	2f3933148d	Added fast split(str, delimiter) method git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3384 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-19 03:37:26 +00:00
aaron	7cfb9ff3dc	updates for Tribble 82, fixes for Ryans case where multiple processes would attempt to read/write to the same index, and a couple other Tribble-centric bug fixes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3382 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-18 19:34:45 +00:00
hanna	0791beab8f	Checking in downsampling iterator alongside LocusIteratorByState, and removing the reference implementation. Also implemented a heap size monitor that can be used to programmatically report the current heap size. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3367 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-17 21:00:44 +00:00
aaron	2c55ac1374	fixes for parallel processing problems with Tribble, a small bug in the resource pool, and some more documentation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3349 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-12 06:13:26 +00:00
hanna	76efa757f0	Switched over to reviewed version of Picard patch. In process, did some optimization to the IntervalSharder which improved startup time 5-10x when dynamically merging many BAMs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3331 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-08 14:12:22 +00:00
depristo	504103bd15	Misc. additions to correct utilities git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3329 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-07 21:34:18 +00:00
aaron	06ea65e60b	again for JIRA GSA-320 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3319 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-07 03:47:58 +00:00
aaron	ac9b32db88	a bug fix for Kiran; putting JIRA in for better type determination system for the new Tribble tracks so this doesn't happen again. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3318 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-07 03:31:43 +00:00
hanna	4e0019b04f	Repair code that sorts and merges intervals. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3317 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-06 22:37:25 +00:00
ebanks	0e58fb7cc0	Moved over to be a walker inside the GATK git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3313 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-06 18:28:03 +00:00
aaron	78409dca0d	turned off the progress output from tribble when making an index, and fixing a case where the index file isn't writable so we instead make the index in memory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3312 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-06 16:36:58 +00:00
ebanks	bacc507a48	Don't worry about sorting anymore in the liftover tool. That will come later. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3311 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-06 15:00:30 +00:00
ebanks	2975e3a4e8	picard Intervals don't sort right - switching to GenomeLocs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3308 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-06 03:50:28 +00:00
ebanks	1a99fb9318	First pass at liftover tool. Passing buck over to Aaron... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3306 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-05 20:38:19 +00:00
aaron	a0d71540df	speed-up for VCF, adding code to the VCF reader to automagically make an index if one doesn't already exist, and a change to the VCF writer unit test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3305 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-05 20:19:42 +00:00
aaron	6bbcc47b5d	removing some out-of-date RODs and some unused genotype writer formats git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3304 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-05 19:07:13 +00:00
aaron	a68f3b2e9c	VCF moved over to tribble. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3302 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-05 17:28:48 +00:00
ebanks	64640d6b17	Complete the switch statement to deal with all possible cigar operators for Kris. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3299 348d0f76-0448-11de-a6fe-93d51630548a	2010-05-05 13:41:05 +00:00
weisburd	8b2ce128b5	Optimized the join(..) method. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3280 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-30 15:55:07 +00:00
aaron	64c5f287c5	fixes for edge-cases when using reflections to find classes outside of the main jar. Will push as a patch to reflections git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3264 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-27 17:46:46 +00:00
aaron	c647153b10	Adding Jama for Ryan. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3262 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-27 14:30:36 +00:00
aaron	f6468f9143	a fix for a bug we've worked around in the reflections package: previously it didn't find classes that weren't in the main jar. Fixed in this version. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3261 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-27 04:49:49 +00:00
ebanks	42bcca1010	Pulling out the left-alignment code for indels so that other walkers can use it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3251 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-23 16:12:34 +00:00
aaron	536f22f3bd	adding VC adaptor for GELI, along with unit tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3243 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-23 05:28:39 +00:00
hanna	32d86cf457	Rev the reservoir downsampler to support partitioning through a functor. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3232 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-21 19:50:26 +00:00
asivache	1373fee278	Because of the ugly VCF format, generic addCall() method of GenotypeWriter interface acquired an additional parameter, explicitly specified reference base (in VCF it's the base immediately before the event in case of indels, so we got to pass it). All implementing classes are modified to accommodate the change. VCFGenotypeWriterAdapter now explicitly uses the passed reference base instead of deriving it from VariantContext (in SNP mode as well!), other writers simply ignore that additional argument. SimpleIndelCalculationModel now WORKS (or rather, it does produce calls :) ) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3228 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-21 18:19:03 +00:00
asivache	6fda78f93f	Always return deleted bases in upper case git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3218 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-20 19:17:40 +00:00
asivache	52a570637d	Always keep event bases in upper case git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3217 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-20 19:16:39 +00:00
aaron	80c4f88a72	removing the Variation interface. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3216 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-20 18:56:45 +00:00
hanna	c1e53d407d	The copyright tag that I copied/pasted from a LaTeX document into IntelliJ had unicode quote characters embedded in it. These characters were invisible inside IntelliJ but cause compile warnings for Ryan and Aaron, who for whatever reason have a different default charset. Fixed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3203 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-20 15:26:32 +00:00
aaron	b5f6f54968	Almost done removing any trace of the old Variation and Genotype interfaces. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3202 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-20 14:52:15 +00:00
hanna	1bc26f69e9	An attempt to cleanup the Utils directory. Email to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3198 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-19 23:00:08 +00:00
hanna	c08936d6f4	Added a reservoir downsampler which can sample elements in an iterator uniformly from a stream (see Vitter 1985). Thanks to Eric and Andrey for the pointer. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3197 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-19 20:48:14 +00:00
aaron	e11ca74eb5	removing some outdated ROD classes (PooledEMSNPROD and SangerSNPROD), removing an out-of-date interface (VariantBackedByBenotype), and moving AnalyzeAnnotationWalker over to VariationContext. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3188 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-16 18:59:29 +00:00
asivache	6dc1275cfb	Utility method added: getQualsInCycleOrder(read) - examines the read and returns its quals in the order the machine read them (i.e. always from cycle 1 to cycle N). Simply inverts quals if the read happens to be rc-aligned :) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3183 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-16 00:15:57 +00:00
aaron	e682460c1f	add a fix so that XL arguments won't cancel out -BTI arguments, fixed a bug for Ben where the ROD -> interval list conversion was throwing an exception, and some old code removal. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3174 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-15 16:31:43 +00:00
hanna	8573b0bc6f	Refactoring intervals, separating the process of parsing interval lists, sorting and merging interval lists, and creating RODs from intervals. This gives Doug the ability to keep using our interval list parsing code when sorting intervals on our behalf. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3159 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-13 15:50:38 +00:00
ebanks	3f2455e346	Better error message as suggested by James P git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3141 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-09 05:52:53 +00:00
aaron	12e4f88ca7	a little bit more clean-up git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3122 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-05 20:49:06 +00:00
aaron	df7e7921ce	removing some unused code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3121 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-05 19:30:08 +00:00
bthomas	b4f6f54502	Reorganizing the way interval arguments are processed Most of the changes occur in GenomeAnalysisEngine.java and GenomeLocParser.java: -- parseIntervalRegion and parseGenomeLocs combined into parseIntervalArguments -- initializeIntervals modified -- some helper functions deprecated for cleanliness Includes new set of unit tests, GenomeAnalysisEngineTest.java New restrictions: -- all interval arguments are now checked to be on the reference contig -- all interval files must have one of the following extensions: .picard, .bed, .list, .intervals, .interval_list git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3106 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-01 12:47:48 +00:00
aaron	c3c6e632d1	support for two new VCF header info field value-types, Flag (for fields that are just boolean truths), and Character (for single character info fields). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3105 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-01 03:11:32 +00:00
aaron	3d3d19a6a7	the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date. a caveat: for anyone asking for all of the ROD's back from the RefMetaDataTracker (if you're not using the facilities to get the track by name), you'll now be getting back a collection of GATKFeature objects. This object will contain the track name, and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc. This layer is needed so we can integrate Tribble tracks (which don't natively have names). Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc). Sorry for the inconvenience! More changes to come, but this is by far the largest (and has the greatest effect on end users). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-31 22:39:56 +00:00
hanna	400684542c	Revisions to take into account finalization of Picard patch: naming changes, better definition of public interfaces. This won't be the last Picard patch, but it should be the last big one. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3096 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-30 19:28:14 +00:00
hanna	85037ab13f	Fix for Kiran's sharding issue (Invalid GZIP header). General cleanup of Picard patch, including move of some of the Picard private classes we use to Picard public. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3087 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-29 03:21:27 +00:00
depristo	b8ab74a6dc	Minor useful changes to BaseUtils and MathUtils to support a new haplotype score annotation that determines the two most likely haplotypes over an interval and scores variants by their consistency with a diploid model. Appears to be useful. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3085 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-28 21:45:22 +00:00
ebanks	47e30aba92	Rods for reads hooked up into the cleaner git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3070 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-24 18:17:56 +00:00
ebanks	49117819f5	For the cleaner to clean, it must beat the entropy produced by the aligner (and not just the raw reads). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3068 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-24 15:21:58 +00:00
aaron	a69b8555dd	Geli to variant context. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3063 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-23 06:45:29 +00:00
aaron	eafdd047f7	GLF to variant context. Added some methods in GLF to aid testing; and added a test that reads GLF, converts to VC, writes GLF and reads back to compare. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3062 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-23 03:43:25 +00:00
hanna	3767adb0bb	Processing intervals as they stream in means much lower memory usage and quicker runtime. Making change as minimal as possible to avoid conflicts with BT's incoming patch. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3061 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 22:04:45 +00:00
ebanks	0097106938	VariantFiltration can now filter specific samples. This is NOT an ideal implementation. One day when we have lots of free time (or a greater desire), we will implement this correctly and sophisticatedly using all the power of JEXL. For now, though, this will have to do. Docs coming tonight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3060 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 20:45:11 +00:00
depristo	076d21d394	Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 02:47:09 +00:00
ebanks	c88a2a3027	Fixing/cleaning up the vcf merge util git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3047 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 15:13:32 +00:00
depristo	56092a0fc2	Slight cleanup for mathutils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3042 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 13:18:08 +00:00
ebanks	03480c955c	And now the UnifiedGenotyper can officially annotate genotype (FORMAT) fields too. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3039 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 04:58:37 +00:00
ebanks	e757f6f078	Missing value for arbitrary format entries is empty string (need to revisit at some point, but it will require updating the VCF spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3038 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 03:56:27 +00:00
ebanks	0311980668	The VariantAnnotator can now officially annotate genotype (FORMAT) fields. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3037 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-19 03:30:14 +00:00
ebanks	ee0e833616	Some significant changes to the annotator: 1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental. 2. Users can now not only specify specific annotations to use, but also the interface names from #1. Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest. 3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator. 4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-18 05:38:32 +00:00
rpoplin	58a31bab6a	Variant optimizer now outputs VCF files via ApplyVariantClustersWalker. Documentation to be added to the wiki. It is ready to be used by other people but only with great caution. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3028 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 20:41:42 +00:00
hanna	d9398dc347	Remove some of the restrictions on getStart() and getStop(); getStart() and getStop() now do the minimum validation rather than the more rigorous only-within-the-contig-bounds header validation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3027 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 19:39:30 +00:00
ebanks	ded4ba8966	Let's make artificial reads that actually adhere to the specs... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3022 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 16:51:42 +00:00
bthomas	5b34bb9ab0	Adding three minor new features: + -L all now walks over all intervals + if a -L argument is passed with a .list extension, and file does not exist, returns a \ File Not Found error instead of "bad interval" error. We plan to soon revisit interval \ lists and generate a concrete list of filenames, so this is likely temporary. + Error is thrown if the start position on an interval is higher number than the end position. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3021 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 16:24:10 +00:00
ebanks	4340601c26	-Pushed base quals back down into SAMRecord; if -OQ is used, the SAMRecord quals get updated automatically -Better integration test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3020 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 16:00:10 +00:00
ebanks	1fd909cdaf	Fix for Kiran: -1 is a valid value for genotype qualities in VCF, so VariantContext shouldn't die. Cleaned up the relevant VCF code while I was in there. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3015 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-17 00:20:15 +00:00
ebanks	586f87fa35	Quick fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3007 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-16 02:59:26 +00:00

750 Commits (3308d956f4d59ec8716da2f712f13e7c52e286a2)