gatk-3.8

Commit Graph

Author	SHA1	Message	Date
ebanks	25fb53e7a2	Oops, forgot to call toLowerCase(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4097 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-24 14:43:24 +00:00
ebanks	7957b60768	We now automatically compress the output VCF if the file suffix is one of the supported types (.gz, .bz, .bz2). You can still specify -bzip if you want to use another file suffix (or pipe it to stdout for some reason). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4096 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-24 14:39:59 +00:00
rpoplin	7a8b6b87da	Committing Michael Yourshaw's patch for AnalyzeCovariates. We spawn each RScript process and wait for it to finish in series. Thanks Michael! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4095 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-24 13:06:25 +00:00
ebanks	9fb151f417	Minor update git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4094 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-24 05:17:10 +00:00
ebanks	44f3c5639a	I have finally figured out that when you volunteer to do something in group meeting, you keep getting pestered about it on Mark's Omniplan doc until it gets done (except for contig aliasing, of course). As such... We can now emit bzipped VCFs from the GATK. Details: any walker that defines a VCFWriter for its @Output (i.e. pretty much every core walker from UG and on), also has associated with it the -bzip (--bzip_compression) boolean argument. When set, it will emit a VCF that is compressed with bzip2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4093 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-24 04:14:50 +00:00
hanna	691333f75c	Force isRequired() to be false for @Deprecated args. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4092 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 23:50:30 +00:00
hanna	5d6a6420a9	New behavior for filling in output streams: if required==true for a field and the field is an output stream, we'll automatically create it and point it to stdout. Otherwise, we'll leave it empty. I think about it like this: marking a field 'required' indicates to the GATK that the walker author requires a value for this field, and if the GATK can provide one without end user intervention, it will. Maybe this is hackish. We'll try it and see. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4091 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 23:39:13 +00:00
ebanks	90aef66ec5	Minor fixes for my last commit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 23:25:29 +00:00
ebanks	ef795825fd	Yet more argument consistency updates git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4089 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 20:52:30 +00:00
aaron	7474afa7a3	allow other objects access to the static method that resolves bam lists, and some renaming and improved documentation for the function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4087 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 18:52:00 +00:00
ebanks	ccda4f6ec1	More output consistency changes (updating wiki docs as I go along). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 18:46:08 +00:00
ebanks	c9c6ff49c2	Deprecated 'O' in favor of 'o' in the cleaner git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 18:09:24 +00:00
ebanks	55a8306a0d	Update the @RMD tags to look for VariantContext.class instead of ReferenceOrderedDatum.class. Since the test for rod type is broken this won't affect anything right now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4084 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 17:49:37 +00:00
aaron	35b9883dd6	vcfwriter is in tribble now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4083 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 17:01:04 +00:00
aaron	2d3b6d89dc	adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file. In the GATK we now create multiple indexes, and choose the most appropriate based on feature density, and the longest feature in the file. Also: - Converted Tribble to TestNG; it has better features and is about 6x faster. - As much code clean-up as I could get done. More to do, especially in the example code. - Moved asserts in the code to throw exceptions. - Added getBinSize to the index interface; both indexes already implemented this. - Removed the abstract parts of the indexCreator interface; this is now more simple. - Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 06:54:59 +00:00
kiran	295472bf69	Simple change to handle a no-call (must avoid asking for the second allele, which will be null in this case). Also, added a hack to deal with input VCFs where there are no genotype likelihoods (needed in order to process Hapmap and 1KG VCFs). In this mode, called genotypes are assigned a likelihood of 0.96, and alternative genotypes are given 0.02 each. I know Beagle actually takes genotype data without likelihoods, so this might not be the right way to do this. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4081 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 05:13:09 +00:00
kiran	dec713a184	Simple test code from Steve Schaffner to compute R^2 and D'. This is just for educational purposes. Don't use this code for anything, ever! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4080 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-23 05:06:16 +00:00
hanna	c177801d81	Add deprecated command-line arguments, and switched over UG to output to -o/--out instead of -varout. Let's watch as our intrepid support engineer gracefully responds to all the incoming questions of the form: "the GATK told me to use -o instead of -varout. What do I do?" git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-22 21:01:44 +00:00
hanna	b80cf7d1d9	Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-22 14:27:05 +00:00
ebanks	30a104228a	Don't require entropy reduction when cleaning only at known sites; instead we need to trust the known indels. This will improve consistency between lane-level and aggregated cleaning. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4076 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-22 02:44:38 +00:00
depristo	b6989289fc	Potential bug fix for bad references where some codons may have Ns git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4075 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-21 12:09:33 +00:00
kiran	121b4f23b6	Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-21 00:01:48 +00:00
ebanks	165dc6d3b0	Ryan, what did you decide about supporting this tool? Is it still useful? git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4073 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-20 19:16:14 +00:00
ebanks	2ef2f1b24a	Fix UG's simple indel calculation model so that deletions are created correctly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4072 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-20 15:35:47 +00:00
fromer	1c4784999a	Updated to work exclusively in log10 space git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4069 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 21:31:07 +00:00
fromer	3af4e618cc	Fixed precision issues with PQ (phasing quality) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4068 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 20:34:47 +00:00
kshakir	88ca1fb22c	Lazy loading reflections so Queue can hack the classpath before the PluginManager looks for classes. Removed extra quotes from 'cd' pre-exec command. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4067 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 20:29:52 +00:00
aaron	63ada20da5	allow RefSeq files to optionally contain the header line, which is the default output from the UCSC table browser git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4065 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 20:25:37 +00:00
fromer	effeedf1a3	Updated Bayesian phasing method to output per-site phasing statistics (and to not cap PQ at 40) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4064 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 19:55:47 +00:00
aaron	04e5b28f6d	updates for VCF; we can no longer cache genotypes or alleles in a static array, this is bad for shared-memory parallel runs. One instance per codec was better for performance than using ThreadLocal code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4063 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 19:34:44 +00:00
corin	8054b6b295	Changing a name of a column for variantevals output for easier reading by R--let me know if this needs to be updated elsewhere; it's just a space to an underscore. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4062 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 19:18:16 +00:00
ebanks	4b94f8c21b	Silly me, I forgot to check for the contig boundaries. Thank goodness for performance tests! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4061 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 18:40:26 +00:00
aaron	f16bb1e830	fix for a bug in package utils. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4060 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 15:01:50 +00:00
fromer	15c5aa6e48	Efficient iteration over all possible combinations of variable assignments, for variables of arbitrary cardinalities git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4059 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 14:14:37 +00:00
ebanks	1ec305cd15	Fix for running the cleaner at the lane-level for known indels only: instead of relying on the reads to get the reference sequence, we now use an IndexedFastaSequenceFile in all cases and pad the reference with bases on either end. This allows us to deal with cases in which we are trying to clean just a single deletion-containing read with tiny LOD (so the read needs to be pushed off the seen reference; @Reference doesn't yet work for Read Walkers) and has the added benefit of allowing us now to get much larger known indels that aren't completely covered with reads. Thanks to Matt for the advice. Also, for Guillermo: while I was at it, I changed the .stats debug output to emit the original interval instead of the cleaned region. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4058 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 11:31:13 +00:00
ebanks	98f7679619	Fixed the bug reported on GS regarding a clipped read that got moved several hundred bases away. The code that got triggered here was written back in the original version of the cleaner and it never actually did the right thing. While I was fixing it, I noticed that we weren't allowing the cleaner to un-clean reads with indels when they're wrong even though we should. Hypothetically, that should rarely happen: only when we can left-align out an indel or when the original mapper really went haywire. This situation is rare enough that I'm calling logger.info to let the user know it's happening and suggesting that they double-check that everything looks right with their reads. Better to be extra-cautious now that the cleaner is moving into the 1kg and Broad production pipelines soon. Mark, have no fear: this was truly a rare edge case - one that won't affect the cleaning stats. There is no need to re-clean the data processing paper bams! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4057 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 01:42:48 +00:00
aaron	3dc4d3c3a9	removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also removed the auto-deletion of the reflections jar, and removed the very old OmniPlan document we had checked-in. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4056 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-19 00:42:37 +00:00
fromer	1336ea17a3	quality-scored-based Bayesian phasing algorithm implemented git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4055 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-18 21:17:46 +00:00
fromer	553bda4e0e	PreciseNonNegativeDouble permits precise arithmetic operations on NON-NEGATIVE double values git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4054 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-18 21:10:58 +00:00
rpoplin	8f15b2ba72	Memory optimization for the VariantRecalibrator. Only add variants to the list if they pass the novelty and qual filters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4051 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-17 21:57:28 +00:00
kshakir	b7c60b9729	Queue now uses its own version instead of the gatk version. Added a Queue release directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4050 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-17 19:34:23 +00:00
aaron	c1df293feb	remove testing code from tribble track builder, set the command line program in walker test to null to reclaim memory in integration tests, and removed some orphaned integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4046 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 23:52:01 +00:00
rpoplin	578e7fa36d	Don't output -0 as qual value in VariantRecalibrator. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4044 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 16:47:58 +00:00
kiran	3d63302b70	Deprecated. Use SelectVariants instead. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4043 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 15:07:50 +00:00
depristo	20db00a3e8	Lazy reference loading; the engine doesn't fetch the reference bases until you actually call ref.getBases(). With the new hidden --dontUpdateUG to table recalibrator this is 2-3x faster than before. Enabled for locus, read, and rod walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4042 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 13:46:22 +00:00
aaron	9ab647b730	adding checks to the RefSeq rod for lines that contain fewer than the required number of columns (we expect there to be 16 columns) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4041 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 13:34:32 +00:00
aaron	b23545fafa	re-enable the check for up-to-date versions in the Tribble index. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4039 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 12:47:58 +00:00
ebanks	37586d3a43	Don't exception out when bad aligners emit wonky alignments; instead, just don't clean git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4038 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-16 02:36:04 +00:00
depristo	a36951f11a	@output and @input arguments for table recalibration for use with Q git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4037 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-14 18:36:28 +00:00
depristo	61064d7075	GenotypeConcordance log file -- if provided, GC module will write FN/FP information to this file by context git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4036 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-14 18:35:57 +00:00
depristo	0d209d5442	Nicer printing out of clustering git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4035 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-14 16:02:13 +00:00
kshakir	307c8ca027	Created a new playground script for cleaning bams in Firehose. Some refactoring of Queue extensions for reusability in scripts. Putting the extensions into the Queue.jar after building them. More updates to GATK walker arguments specifying @Input and @Output for Queue. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 23:52:24 +00:00
fromer	dfe2922b5e	First working version of statistical haplotype phaser git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4031 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 21:29:45 +00:00
ebanks	f36c0ed613	Stop building obsolete VCFTools and CGUtilities git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4030 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 19:28:36 +00:00
rpoplin	222f61df87	Bug fix for damoskow in TableRecalibration. Shouldn't try to update the reference mismatch rate tag for an unmapped read. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4028 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 18:57:07 +00:00
kshakir	80a70ccf03	Repopulating rodsToSamples. Code reviewed by Eric. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4027 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 17:07:18 +00:00
hanna	cb144734c0	Getting rid of GenotypeWriter interface. Of note: - GATKVCFWriter deleted, to be replaced if absolutely necessary when VCF writing goes into Tribble. - VCFWriter is now an interface, for easier redirection. - VCFWriterImpl fleshes out the VCFWriter interface. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4026 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 16:33:22 +00:00
kshakir	542d394e09	Cleaning up Queue debugging output. -l DEBUG with local programs now prints out the stdout/stderr of the programs as they are run. More documentation in the examples with a new even simpler CountReads example. Took out unused option to build Queue GATK extensions separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4025 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 15:54:08 +00:00
chartl	49a3db9dfe	A brief implementation of a QD calculation that is not quite so bimodal for known variants (multiplicatively penalizes QD by (n variant samples)/(n variant alleles) ). Not sure how helpful this will be (which is why it is in oneoffs). Seems nice on MCKD1, but I'm still playing with the optimization. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4024 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 15:42:37 +00:00
chartl	c6a8fba922	Occasionally if a JEXL expression results in no variants being captured (like "QD > 20.0" on filtered variants) the per-sample mapping from samples to eval objects can be empty. This semi-hacky fix prevents null pointer exceptions in setting up the resulting empty table (by jumping straight to it in this case) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4023 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 15:37:45 +00:00
ebanks	f874e548aa	Shame on us. FlagStat used ints instead of longs, so we ended up getting negative read counts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4022 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-13 03:00:57 +00:00
kshakir	f39dce1082	Exposed CommandLineFunction defaults to the Queue.jar command line (see -help). Added ability to skip up-to-date jobs where the outputs are older than the inputs. Changed -T CountDuplicates --quiet to --quietLocus so that Queue GATK extensions can use both short and full argument names. Short names can be used to set values on Queue GATK extensions, for example: vf.XL :+= myFile Moved Hidden from the GATK to StingUtils. Updated ivy from 2.0.0 to 2.2.0-rc1 to fix sha1 issue: http://bit.ly/aX72w7 Added Queue to javadoc and testing build targets. Added first Queue unit test. Another pass at avoiding cycles in the DAG thanks to all function I/O being files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4017 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 21:58:26 +00:00
chartl	8c08f47923	1) Make sure that the table size is set correctly in finalize() 2) Make sure variants are biallelic before asking for isTransversion() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4016 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 20:32:22 +00:00
hanna	41d57b7139	Massive cleanup of read filtering. - Eliminate redundancy of filter application. - Track filter metrics per-shard to facilitate merging. - Flatten counting iterator hierarchy for easier debugging. - Rename Reads class to ReadProperties and track it outside of the Sting iterators. Note: because shards are currently tied so closely to reads and not the merged triplet of <reads,ref,RODs>, the metrics classes are managed by the SAMDataSource when they should be managed by something more general. For now, we're hacking the reads data source to manage the metrics; in the future, something more general should manage the metrics classes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4015 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 20:17:11 +00:00
ebanks	7385cce494	Useful tool for calculating the percentage of misaligned reads at homozygous non-ref indel sites git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4013 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 17:57:44 +00:00
ebanks	cc9e6b4ad9	Moved into Tribble to be with VC git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4012 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 17:14:32 +00:00
aaron	14e492fa80	fix for a problem in readNextRecord() of BFS, where we'd go looking for the next record far into the next contig because (f.getEnd() >= start) was never true once we cycled to a new contig. Added a check for contig identity. Also, removed duplicate HW calculation classes in the GATK and Tribble. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4011 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-11 17:01:38 +00:00
flannick	cd4cd6db81	Added option to print out discordant sites in GenotypeConcordance git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4006 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 19:55:19 +00:00
flannick	18fc5c8c3e	Initial implementation of annotator to compute allele balance for each sample git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4005 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 19:40:17 +00:00
flannick	1dc373b9d0	Initial implementation of evaluator to compute popgen theta statistics git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4004 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 19:36:34 +00:00
aaron	0a8ebcb4f9	moving tests over from the GATK to Tribble, and added a speed-up to the readNextRecord() that Mark suggested. Also removed the contained flag from the queries to Tribble in the GATK. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4003 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 17:54:59 +00:00
ebanks	3ff6e3404e	Alleles are now returned in a consistent order, so we can deal with tri-allelic sites git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4002 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 15:21:10 +00:00
ebanks	ca5b274f16	Unit, integration, and performance tests are all busted, so this is a good time to make a big commit... Major cleanup of the genotype writer code from the calling end. UG no longer supports making calls in anything but VCF, and that allows us to use the VCFWriter more generically now. Putting the ball in Matt's court to finish collapsing everything. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3996 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 04:18:29 +00:00
ebanks	419a36f74c	Starting the clean up of the sting.utils.genotype code which is all either moving to Tribble, moving to sting.utils.vcf, or being removed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3994 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-10 02:16:05 +00:00
depristo	2a4a4b0aab	VariantRecalibrator now calls plot_Tranches directly so it works on the farm git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3993 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 23:17:16 +00:00
depristo	c2c0c1f57c	Removing unused --enable_overlap_filters argument; Eric assures me this won't break the currently broken tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3992 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 22:27:13 +00:00
aaron	0f29f2ae3f	fixes for the Tree index, and some small clean-up in the GATK. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3991 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 20:41:50 +00:00
rpoplin	3eee3183fd	Checking in the tiger team changes. LOD calculation modified. -qScale is back in case people need it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3990 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 20:41:03 +00:00
ebanks	0eeb659aa3	Useful utility function to print out the Allele as a String since toString prints out * for refs. It was annoying to keep seeing new String(Allele.getBases()). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3989 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 20:35:56 +00:00
chartl	d0ecb8875a	Added - a class to count functional annotations by sample (currently for the MAF annotation strings, soon to be migrated to genomic annotator once it is up and running) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3988 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 20:09:13 +00:00
aaron	5b0b9e79ba	protect against nulls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3987 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 19:21:39 +00:00
depristo	8944800f60	Minor refactoring for Ryan git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3986 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 18:05:23 +00:00
kshakir	4f51a02dea	Changed logging level to default at INFO instead of WARN. Changes to StingUtils command line for use in Queue, replacing Queue's use of property files. Updates to walkers used in existing QScripts to add @Input/@Output. RMD used in @Required/@Allows now has a new default equal to "any" type. New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions. Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.) Removed dependency on BroadCore by porting LSF job submitter to scala. Ivy now pulls down module dependencies from maven. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 16:42:48 +00:00
aaron	30178c05c5	providing a way to specify how you'd like -BTI combined with your -L options; set BTIMR to either UNION (default) or INTERSECTION. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3983 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 14:00:52 +00:00
hanna	6b4a1e3b9f	Reenabling code that was commented out after it was confirmed to work by many participating in this thread: http://getsatisfaction.com/gsa/topics/error_thrown_when_reading_reference_file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3981 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-09 00:12:09 +00:00
kiran	48e311a5ea	Added copyright notice. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3980 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 07:11:51 +00:00
kiran	9aa70d9c7c	Replaced by SelectVariants git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3979 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 07:07:42 +00:00
kiran	758ab428f5	Better logging info for the samples being selected and the sample expressions being ignored. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3978 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 07:03:37 +00:00
ebanks	637a1e5055	Updating to use the new VA interface git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3975 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 05:31:01 +00:00
ebanks	bd6d5a8d51	Adding command-line header to VA and VF git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3974 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 05:21:15 +00:00
kiran	64446f0ddf	Avoid NaNs in the final output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3973 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 05:16:52 +00:00
ebanks	3f6e44dc71	Updated recalibrator and cleaner to output full command-lines in the bam header git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3972 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 04:39:18 +00:00
kiran	0da0dfa1da	Cosmetic change - lower-case for all command-line arguments' short names. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3971 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 04:12:01 +00:00
kiran	eb1bb94d1c	Moved the evaluation of the JEXL expressions to a point after the samples are subset and the INFO-field annotations are updated. I think this makes more sense than having the evaluations happen beforehand, since it seems jarring to have the JEXL expressions operate on the annotations before they're updated, and have the file contain the annotations after they're updated. Now, selecting on something like allele frequency will actually apply to the annotations that actually end up in the file, while selection on other annotations (which are carried over without modification) will act exactly the same regardless. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3970 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 04:09:02 +00:00
ebanks	594b7912f1	Added a generic method for returning the complete command-line used when calling a walker, to be used in the bam/vcf headers. As requested, every possible engine/walker argument is included. I've added it to the Unified Genotyper output, so people can try it out and let me know what they think. Something that needs to be discussed in group meeting: what happens when we merge VCFs? Do we keep all of the command-lines? git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3969 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 03:53:07 +00:00
kiran	6e389059cf	An improved version of VariantSubset and VariantSelect, meant to replace those walkers. Takes in a VCF and creates a subsetted VCF by sample(s), JEXL expressions, or both. When subsetting by sample, the -SN argument is treated as a literal sample name and, if no match is found, as a regular expression. This allows for a large number of samples to be selected at once (useful when, for instance, cases are given one sample name prefix and controls are given another). After the subsetting procedure, the INFO-field annotations AC, AN, AF, and DP are all recalculated to properly reflect the new contents of the VCF. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3968 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-08 02:57:06 +00:00
ebanks	ac4699a650	Re-enabling this test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3962 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-06 20:20:37 +00:00
depristo	f275041b1c	-minimalVCF for CombineVariants. Work around for broken locking code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3960 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-06 16:10:59 +00:00
ebanks	341e752c6c	1) AlleleBalance is no longer a standard annotation, but the Allelic Depth (AD) is for each sample. 2) Small fixes in the VCFWriter: a) Trailing missing values weren't being removed if their count was > 1 (e.g. ".,.") b) We were handling key values that were Lists, but not Arrays. We now handle both. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3956 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-06 12:05:14 +00:00
aaron	c68625f055	Fixes from Mark for the MutableContexts; this fixes the clearGenotypes() and the clearFilters() methods, and adds a method to clear the attributes. Also added is a method for creating a variant context where the attribute list is pruned to a specific subset, which can be null. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3955 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 22:39:51 +00:00
aaron	72ae81c6de	VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include: - Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from inside the tribble directory. - Hapmap ROD now in Tribble; all mentions have been switched over. - VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc. - VariantContext.getSNPSubstitutionType is now in VariantContextUtils. - This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN. I'll send out an email to GSAMembers with some more details. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 18:47:53 +00:00
fromer	b21f90aee0	Added preliminary framework for performing short-range phasing (ReadBackedPhasingWalker.java) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3953 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 14:56:34 +00:00
rpoplin	a8d37da10b	Checking in everyone's changes to the variant recalibrator. We now calculate the variant quality score as a LOD score between the true and false hypothesis. Allele Count prior is changed to be (1 - 0.5^ac). Known prior breaks out HapMap sites git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3952 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 14:12:19 +00:00
ebanks	07addf1187	Fix for Kiran: since the Variant Annotator will re-annotate on top of existing annotations it makes sense to remove old headers if they conflict with the definitions being added by VA. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3951 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 06:44:39 +00:00
ebanks	1539791a04	Fix for Kiran: when using VCFs for the comp tracks in the Annotator(s), don't put the headers from them into the output VCF. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3950 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 04:45:47 +00:00
ebanks	227c4b10f0	Bug fix for Chris: convert comp tracks to VC so that we can respect the filter field. Added an integration test to cover this. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3949 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 04:13:16 +00:00
ebanks	84ca2f27bb	Bug fix for Chris: added method createPotentiallyInvalidGenomeLoc() to the GenomeLocParser that doesn't check that the contig exists in the sequence dictionary. This is crucial for lifting over from one reference to another, as sometimes contigs names change in the liftover (e.g. chrM to MT). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3948 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 03:19:02 +00:00
ebanks	f247cbf68e	I want to be the first to use the new super-cool Hidden annotation! No more telling people not to use the cleaner debugging options. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3947 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 02:44:37 +00:00
hanna	78bfe6ac48	Added @Hidden annotation, a way to deliberately exclude experimental fields and walkers from the help system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3946 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-05 02:26:46 +00:00
chartl	82d6c5073b	A simple read strand filter for potluri on get satisfaction git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3945 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 23:23:50 +00:00
asivache	d53d5ffbf6	A utility class that computes running average and standard deviation for a stream of numbers it is being fed with. Updates mean/stddev on the fly and does not cache the observations, so it uses no memory and also should be stable against overflow/loss of precision. Simple unit test is also provided (does not stress-test the engine with millions of numbers though). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3944 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 21:39:02 +00:00
ebanks	8d8acc9fae	Moving G's MyHapScore to replace the old HapScore git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3943 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 21:00:54 +00:00
ebanks	7858ffec32	Spit out the error in the warning message so that Sendu can tell me what his problem is git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3942 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 20:40:28 +00:00
delangel	86211b74e8	Bug fix: when padding alleles in creating a Variant context from an indel, leave no-call alleles as no-call alleles. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3940 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 19:51:10 +00:00
chartl	38e65f6e1b	Added: A VariantEval module that gives simple metrics by sample, and an abstract class that makes per-sample modules easy to write (but a little bit clunky since a class needs to be defined for each data point -- see SimpleMetricsBySample as an example). AnalysisModuleScanner needed a slight update to pull in data points from parent classes for this to work (thanks Khalid for showing me how to do this). After a code review with Aaron (thanks) and ensuring integration tests pass, I am committing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3939 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 19:37:39 +00:00
hanna	f13d52e427	Attempt to determine whether underlying filesystem supports file locking and disable on-the-fly dict and fai generation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3938 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 19:28:27 +00:00
asivache	a47824d680	A couple of type specific implementations of a single extend() method: takes an array (byte[] or short[] currently) and "extends" it to the left or to the right by the specified number of elements. Returns newly allocated array, with the content of original array copied in (if we extend by n elements to the left, then the returned array will have n default-filled elements followed by the content of the old array). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3932 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 15:30:48 +00:00
asivache	012a7cf0a5	mismatchCount now has a version that counts mismatches only along a part of the read (takes additional args start_on_read and length_on_read to specify the read's subsequence to be interrogated); isMateUnmapped() convenience shortcut method added. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3931 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 15:27:35 +00:00
delangel	e6e8a20a1e	1) Fix MyHaplotypeScore to ignore 454 reads, since all those pathological non-existing indels make some sites' score blow up. If a site is only covered by 454 reads, we (hopefully) detect this graciously and just emit a score of 0.0 for the site. 2) New annotation SByDepth = log10(-StrandBias/Depth) (non-standard annotation, key name = "SBD"). If StrandBias/Depth happens to be positive (very rare but can happen), annotation gets value=-1000. 3) Abstracted out new class AnnotationByDepth so that QD and SBD can share code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3930 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 15:23:08 +00:00
ebanks	bf60ed0b25	Needed it here too: warn user instead of dying if the R script cannot be executed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3929 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 13:11:27 +00:00
ebanks	40ffe34686	Warn user instead of dying if the R script cannot be executed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3928 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 13:08:15 +00:00
ebanks	17d5e89734	Now --list annotates which modules are Standard git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3927 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-03 21:00:37 +00:00
ebanks	72875cf717	Removing annoying printouts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3926 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-03 19:55:00 +00:00
ebanks	2307bed742	VariantEval now uses the "standard" modules only by default. You can add other modules with the -E argument and not use all of the standard ones with -noStandard (they can be added back individually with -E). Generalized some of the packaging code from VariantAnnotator. Matt might want to take a look to make this nicer...? git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3925 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-03 16:51:10 +00:00
ebanks	a7ff9caf54	Added sanity check against bad people and/or crazy big indels at edges of ref context git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3918 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-03 05:37:17 +00:00
hanna	5f1b67c1de	Copping out and forcing the entire GATK (and associated JVM) to use US English locale. Method to force JVM into proper locale exists in CommandLineProgram and is disabled by default, but implementers of CommandLineProgram can opt in to the forced US locale by calling a static method. Question for the VCF developers: I removed the code to explicitly output doubles in US locale. Do you / how do you want to handle this in applications that use Tribble outside the GATK? git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3917 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-03 03:48:26 +00:00
chartl	2bc69572cb	Make transcript2info capable of handling b37/hg19 contigs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3915 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-02 17:32:08 +00:00
depristo	c203e0fb02	Added JEXL support for hetCount, homRefCount, and homVarCount in VCs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3914 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-02 12:24:11 +00:00
depristo	7fab5c0a8f	support for -singleton_fp_rate arguments to variant recalibrator instead of the pop.gen. AF prior. Worth experimenting with Ryan. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3913 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-31 21:17:47 +00:00
ebanks	6d91cd587e	Be explicitly clear about which options are for debugging purposes only and shouldn't be used if your username is not ebanks@broad. If only we had a @hidden annotation option for args... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3909 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-30 14:18:31 +00:00
depristo	ac8048f17b	Support for automated selects for tranches in variant eval -- use -tf to make tranche-specific ve outputs. ApplyVariantCuts with tranche reading functions for general use, along with todo for ryan. CombineVariants now has --filteredAreUncalled and will treat filtered snps in input VCFs as uncalled, and so won't emit -filteredInOther set features git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3908 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-30 14:16:43 +00:00
chartl	9231d13252	Minor modification: adding an argument to make slightly more general. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3907 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-30 05:20:20 +00:00
chartl	db54d63fc7	Hahaha yes, ownage. This now works. BTW, Eric, thanks for forwarding the DepthOfCoverage thread to gsamembers. I'd forgotten about reduce by interval. Mighty helpful in this case! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3906 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-30 04:23:02 +00:00
chartl	3e3f8c7692	Simple count intervals walker, as per my recent email to GSAMembers. Never use this. It doesn't behave the way you think it does. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3905 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-30 03:39:23 +00:00
delangel	ba1a330293	Corrected location and made more explicit the error message thrown if someone tries to read a VCF 3.3 file with indels, which is not supported. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3901 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-29 20:02:47 +00:00
delangel	e1a34685fd	Add back MyHaplotypeScore as a new implementation for HaplotypeScore, this time as a non-standard annotation. Implementation is also better: it computes better consensus haplotypes and ranks them by sum of quality score. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3890 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 21:23:19 +00:00
hanna	6c93b13428	A Java sizeof, implemented using the Java instrumentation API. Can get the memory consumed either by a single object alone or by a single object and all the references it contains. Requires a command-line change to add a Java agent to the command-line; see the Sizeof.java javadoc for details. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3889 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 18:44:15 +00:00
rpoplin	f5566a6593	Knocking out some quick findBugs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3887 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 14:10:59 +00:00
delangel	894623858d	OK, bad idea to add new temporary annotation - revert to keep integration tests happy. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3886 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 12:07:13 +00:00
delangel	71bfb1ee35	First redesign of HaplotypeScore - now, a different approach is taken to build possible haplotypes at a site: first, all possible haplotypes consistent with reads are formed (reference is not used). After this list has been formed, it is ranked according to the number of reads that are consistent with it and the two most popular haplotypes are chosen. this reduces to the old method in typical cases, but it builds haplotypes correctly if there are two variants close by within a context window. Annotation is temporarily named MyHaplotypeScore so it can be run in parallel with old one, soon it will be renamed after some more testing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3885 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 10:54:56 +00:00
delangel	cffebcc867	Small utility walker used for production of the Beagle data processing paper section. Walker will print out to output file, for every site common to a reference vcf and an eval vcf, a given sample's depth, hapmap AC and AF and pre/post Beagle genotype as well as corresponding reference (e.g. Hapmap) genotype. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3884 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 03:00:17 +00:00
ebanks	1d9ed1e214	Cleanup of old VCFRecord code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3883 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 02:56:47 +00:00
ebanks	7dd55fbf13	Archiving git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3882 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 02:47:18 +00:00
aaron	9667942e52	fix for Ryan's issue: we also need to sync when we store a resource. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3881 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-26 22:17:47 +00:00
hanna	8b072b59e2	Returning index dumping functionality in BAMFileStat to a useable state. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3880 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-26 20:03:50 +00:00
depristo	19ad44d332	Minor improvements to CombineVariants to handle the complex case from Chris. IntegrationTest of complex case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3876 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 13:46:11 +00:00
ebanks	7c5a3836db	Trivial changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3875 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 04:00:47 +00:00
ebanks	56de475f11	Based on feedback from non-GSA users, who claim that our exceptions are 'scary and overwhelming,' I've cleaned up the error message to first describe the error and what users should do and then ask them to copy the subsequent stack trace into their GetSatisfaction posting. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3874 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 03:57:44 +00:00
ebanks	9bd8a2685b	Because the performance tests were busted on LSF, no one caught this error until now: when Matt changed over the contract for the AlignmentContext, this line needed to get updated too. All is well now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3873 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 02:53:01 +00:00
depristo	b551eaf8fd	Actually commit the code that makes variant eval run in a reasonable amount of time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3872 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-24 17:32:03 +00:00
depristo	b0b37c3476	Now handles (I believe) reference only VCs correctly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3871 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 23:09:23 +00:00
depristo	e21376219d	Updates to CombineVariants for Tim. -setKey can be null. Integrationtests for -setKey foo and -setKey null. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3870 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 22:35:52 +00:00
delangel	4fc1db7aaf	Change interface to VCFWriter add() method to take only 1 byte from reference (since that's the only thing it needs), to prevent bugs like having people call it with ref.addBases() which is wrong (since it provides bases starting from the left of reference context window). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3868 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 20:24:03 +00:00
aaron	b3fd145161	fix for a bug deep in the tribble indexing: if you had a single record in the first contig, the second contig's index blocks would point to the wrong file seek location, and you'd see no features in that contig. Thanks to Mark for finding this. I'm not rev'ing the index version (which would cause all indexes to be rebuilt), since this seems like a pretty rare edge case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3865 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 18:39:55 +00:00
depristo	33090629ea	VariantEval can now see the EvaluationContext group objects, so they can decide if/when to print interesting sites. GenotypeConcordance has a hard-coded option to print FNs that is on the way to being generally useful. VCFWriter now uses the US locale for formatting floating point numbers; I believe this fixes a long-standing annoyance. Italian guys will check on this. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3864 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 17:16:50 +00:00
delangel	5eef15cfdf	a) Bad bug fix to CombineVariants: when indels were being merged, the reference base provided was wrong - ref.getBases()[0] was being used, but this returns the base at the start of the window. Instead, the reference at current locus should be used. b) Cosmetic change to Beagle annotation description. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3861 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 15:13:47 +00:00
ebanks	4ff8b8fc0e	1. Fixing a bug that Mark found where indel-containing clipped reads would get an original cigar tag even when they didn't actually get modified. 2. Added some useful logging messages. 3. Added a oneoffs walker to calculate the number of realigned reads and intervals containing them. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3860 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 14:24:01 +00:00
chartl	973934f769	Depth of coverage now uses longs rather than ints. We can now successfully run on the Lepidosiren paradoxa genome. (about 80 GB) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3859 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 14:14:12 +00:00
depristo	536399eaa0	Improvements to variant combine. Now calculates AC/AN/AF correctly by calling into the VariantAnnotator engine. Automatically removes annotations that are inconsistent across incoming VCs (in simpleMerge). TODO bug fix for Guillermo/Eric. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3858 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 13:33:11 +00:00
aaron	9579aace1f	updates to code dependent on Tribble, as well as the following Tribble changes: - makes writing to disk optional for indexes using the indexCreator classes (allow the user to specify the index file, if null don't write it) - removed some system.out debugging code - fixed version checking in interval tree - made indexes store and return a LinkedHashSet for sequence names (to ensure they've preserved the ordering in the file) - index creators now read the file before creating the index - changed the Index.write() method to take a LEDataStream instead of a file - removed the sequence dictionary code on the header - added utils for getting LEDataStreams - added a base Tribble exception git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3857 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 01:56:10 +00:00
ebanks	c5325b03be	1) Removed hard-coded strings. Please let's use the fields defined in VCFConstants. 2) General code cleanup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3856 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 01:49:47 +00:00
hanna	e9d243babb	More improvements to exception handling during multithreaded runs based on a bug reported by Ryan. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3855 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 22:13:01 +00:00
hanna	83798225ac	Repackaged datasource-specific command-line tools into their own package. Added a tag renamer tool. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3854 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 19:50:34 +00:00
asivache	485023ba8e	this.intersect(that) method added to GenomeLoc (returns intersection of two intervals or dies if the locations do not overlap) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3852 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 16:00:30 +00:00
asivache	3308d956f4	Added utility shortcut method: getOriginalQualsInCycleOrder(read) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3851 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 15:44:25 +00:00
delangel	473ec91633	a) Bug fix in VCFHeader parsing - Info fields were not being parsed properly, with the result that the Count field was not being properly displayed in records (e.g. if Count=0 for a particular field, the INFO tag was still being displayed as ...;Field=x;... instead of ...;Field;...). b) Bug fixes and update to how we represent indels and other complex events in a VariantContext object. Convention is now that all events are left aligned, with the first variant context location marking the common base before an event occurs. However, alleles in a VC don't have the common base in all VC's. Two new functions are now part of VariantContextUtils: CreateVariantContextWithPaddedAlleles and CreateVariantContextWithTrimmedAlleles. Both take a VC as an input and create a VC as an output. Main flow is that a VCF reader would create a VC with trimmed alleles, all walkers would ideally work with these trimmed alleles, and then the VCF writer would pad back the alleles before writing. However, there are special cases where we need to pad alleles, for example when merging/combining VC's. Pending issues: - PED and DBSNP RODs have to be updated to create VC's for indels following the convention above. Changes will go in after Tribble location is moved and things are tested. - Need to verify Indel genotyper and other modules that create VC's with indels. - Wiki page describing convention above and how walkers should interpret indel VC's still needs updating/detailing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3850 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 02:36:45 +00:00
chartl	b696c3ea98	No more traversal reduce results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3849 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-21 18:34:54 +00:00
chartl	365b42390d	Support for generating (very basic) wiggle files for use with IGV (see UCSC for wiggle spec); and a walker to take in a variant track and create a transition transversion rate track for the whole genome (due to the wiggle spec, this has to be done by chromosome). It's interesting to see the effect of genes! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3848 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-21 18:04:30 +00:00
depristo	f7957bc7f2	Fixed memory leak in VariantEval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3845 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-21 12:35:46 +00:00
aaron	1cba81c16f	updates to tribble with fixes for some bugs I've found in some new indexing code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3842 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 22:08:04 +00:00
ebanks	c6ad26e04f	1) When quals/GQs are really integers (x.00), strip off the floating points. 2) Keep track of whether vcf records are unfiltered vs. pass filters in the variant context so we can regenerate the records on output. 3) No more "ID" hard-coded all over the code to set the VariantContext ID. Use a static variable instead. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3840 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 18:01:45 +00:00
ebanks	0db7fab1a9	Fixing genotype filtering for VF and adding integration tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3839 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 07:30:21 +00:00
aaron	0108517b98	updating the Tribble track loading code to use the new shared locks, updated lots of new tests, add infrastructure for the TreeInterval, and removed the old locking class. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3837 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 07:08:10 +00:00
ebanks	f742980864	1. Refactoring of GenotypeWriters so that parallelization now works again with VCF4.0. We now have just a single reference to the old VCF classes, and that one will be purged soon. 2. Moved Jared's VCFTool code into archive so that everything would compile. 3. Added the vcf reference base (needed for indels) as an attribute to the VariantContext from the reader. 4. TribbleRMDTrackBuilderUnitTest was complaining that a validation file didn't exist, so I commented it out. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3835 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 06:16:45 +00:00
depristo	c47a5ff5ab	Official parallel CountCovariates, passes all integration tests. Now poster-child example of parallelism in GATK (Matt H). Apparent general performance improvements throughout too. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3833 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 22:13:18 +00:00
rpoplin	0b56003d1a	Remove stray commented out line git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3832 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 19:14:39 +00:00
rpoplin	8e31c01680	Solid processing in base quality recalibrator now has several options for how to handle no calls in the color space. --ignore_nocall_colorspace is removed and replaced by --solid_nocall_strategy. Fixed some of the @Deprecated tags in BaseUtils. LocusWalkers now filter out FailsVendorQualityCheck reads. HLA caller integration test bam file had bad vendor reads so its integration test changed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3831 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 19:10:29 +00:00
aaron	18b0114e25	remove FixBAMSortOrder walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3830 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 17:27:23 +00:00
aaron	f4cfb0f990	The first step in integrating Jim's tree based index scheme: - changed to a better method for getting headers from Codecs - some removal of old commented out code in the GATKArgumentCollection - changes for the rename of FeatureReader to FeatureSource - removed the old Beagle ROD - cleaned up some of the code in SampleUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3826 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 04:49:27 +00:00
hanna	40a963541d	Uniquify the registered MXBean by adding an instanceNumber=... tag to the ObjectName. In the Queue-enabled future, we might want to come up with GUIDs (or at least semi-unique IDs) so that we could use JMX to track runtime attributes for multiple jobs running simultaneously. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3825 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 00:58:54 +00:00
depristo	7c42e6994f	FindBugs fixes throughout the code base git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3823 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-18 16:29:59 +00:00
ebanks	693672a461	Refactoring the VCF writer code; now no longer uses VCFRecord or any of its related classes, instead writing directly to the writer. Integration tests pass, but some are actually broken and will be fixed this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3822 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-18 13:19:56 +00:00
ebanks	982947d328	update to deal with partial indels (I/D with no bases) in the HM records git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3820 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-18 02:56:37 +00:00
depristo	414ec6f20a	Removing version argument constructors that shouldn't be used. Temporarily allow -- with global variant to indicate this should be removed -- header records without description fields. Real error checking in the headers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3818 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-17 22:30:08 +00:00
depristo	14b21e487b	always 4.0 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3817 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-17 22:28:48 +00:00
depristo	d40299840c	indenting clean up git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3816 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-17 22:28:28 +00:00
hanna	9207c58b8f	A fix for the integration test I broke on Friday on my way out the door -- some workflows using AlignmentContext were working with it in a way I didn't expect and wound up treating extended pileups as base pileups. I'll work to make sure the AlignmentContext interface is crystal clear. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3815 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-17 22:22:44 +00:00
delangel	55b756f1cc	First step in major cleanup/redo of VCF functionality. Specifically, now: a) VCF track name can work again with 3.3 or 4.0 VCF's when specifying -B name,VCF,file. Code will read header and parse automatically the version. b) Old VCF codec is deprecated. Reader goes now direct from parsing VCF lines into producing VariantContext objects, with no intermediate VCF records. If anyone can't resist the urge to still input files using the old method, a new VCF3Codec is in place with the old code, but it will be eventually deleted. c) VCF headers and VCF info fields no longer keep track of the version. They are parsed into an internal representation and will be output only in VCF4.0 format. d) As a consequence, the existing GATK bug where files are produced with VCF4 body but VCF3.3 headers is solved. e) Several VCF 4.0 writer bugs are now solved. f) Integration test MD5's are changed, mostly because of corrected VCF4.0 headers and because validation data mostly uses now VCF4.0. g) Several VCF files in the ValidationData/ directory have been converted to VCF 4.0 format. I kept the old versions, and the new versions have a .vcf4 extension. Pending issues: a) We are still not dealing with indels consistently or correctly when representing them. This will be a second part of the changes. b) The VCF writer doesn't use VCFRecord but it does still use a lot of leftovers like VCFGenotypeEncoding, VCFGenotypeRecord, etc. This needs to be simplified and cleaned. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3813 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 22:49:16 +00:00
chartl	75bea4881a	Modified SampleFilter to allow for multiple samples to be given. AminoAcidTransition now turns on when you give VariantEval the right commands. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3812 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 21:27:32 +00:00
hanna	96034aee0e	Cleanup for Steve Hershman's issue. In the midst of doing this, I discovered that the semantics for which reads are in an extended event pileup are not clear at this point. Eric and I have planned a future clarification for this and the two of us will discuss who will implement this clarification and when it'll happen. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3809 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 18:57:58 +00:00
asivache	6aedede7f3	Added Type.MNP to allowed variant context types; this does not break the tests (yet) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3808 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 15:50:25 +00:00
asivache	1dd8a28a5d	Added new query: isMNP(feature); returns true if dbsnp feature is multi-nucleotide polymorphism (e.g. a di-nuc TA -> CC) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3806 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 15:32:10 +00:00
depristo	b29eda83bb	Parallelized CountCovariates! percent_ref_called_var now a standard genotype concordance module (for validation!). Really much smarter merging of headers for combineVariants. VCF codecs now actually look at the file version and blow up if they are the wrong versions. setHeaderVersion() in VCFHeaderLine. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3802 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 14:10:18 +00:00
ebanks	f293eb7de1	Fix for Kim: for some ungodly reason, I was initializing the bins that were maintaining counts to 1 instead of 0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3801 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 03:40:29 +00:00
ebanks	e7e58d7129	The SAM spec has now officially reserved my new tags for original cigar and original alignment start... except that OS has been named OP ('original POS') git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3800 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-16 00:09:36 +00:00
ebanks	ab84ed8c68	Fix for Mark: get rid of old program tags whose IDs clash with the recalibrator/realigner tag (including if the id has a .1 at the end, etc.). Keeping them around is dangerous because we don't know which one refers to the latest run of the tool on the bam. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3798 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-15 19:13:50 +00:00
hanna	dfddf8fd75	- Bring the PaperGenotyper up to code. - Remove some old debugging cruft regarding handling of threaded engine exceptions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3796 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 22:31:21 +00:00
bthomas	f65cba6b9a	Adding support for shared file locking via a new class for file locking, FSLockWithShared. This will eventually take over for FSLock, the current file locking class - I'll work with Aaron to merge the tribble code that uses FSLock right now. FYI: creating an exclusive lock on a file that does not exist will create that file as an empty file, and will NOT delete that file after the program terminates. So watch out if it's possible that the file you're locking does not exist - could end up leaving extra files that confuse users. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3795 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 20:45:51 +00:00
hanna	a8caa20378	Previously the hierarchical microscheduler defensively coded around and reported exceptions of the walker itself, but didn't do a great job of catching framework exceptions. This became extremely unfortunate in the case where walkers caused exceptions that manifested themselves in the framework, such as when the walker opens more files than file handles are available. Reworked the exception handling so that framework errors are treated like walker errors and the resulting exception bubbles out of the walker. Stack traces for threaded walkers are still convoluted and nasty. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3794 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 20:34:43 +00:00
ebanks	bf384f48e1	Reverting previous change because it won't always work. More investigation needed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3793 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-14 19:13:17 +00:00

3379 Commits (205fc0b63664531e264f8da0ad76a8d41eb9c8f5)