gatk-3.8

Author	SHA1	Message	Date
rpoplin	92e3682991	Moved NHashMap to sting/utils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2452 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-28 20:57:32 +00:00
ebanks	b1ac4b81d5	Optimization: look up diploid genotypes from a static matrix instead of creating them on the fly (with String.format); bases no longer need to be ordered appropriately git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2448 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-28 17:28:51 +00:00
ebanks	d2770f380c	Writing calls to standard out now works again (it got broken when we introduced parallelization) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2446 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-27 04:36:45 +00:00
ebanks	0571d9dcb9	Point MAX_QUAL_SCORE to SAMUtils.MAX_PHRED_SCORE. Also, array size for caches should be max score + 1. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2444 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-24 20:47:32 +00:00
aaron	b134e0052f	added changes to the code to allow different types of interval merging: 1: all overlapping and abutting intervals merged (ALL), 2: just overlapping, not abutting intervals (OVERLAPPING_ONLY), 3: no merging (NONE). The NONE option is not currently allowed; it will throw an exception. Once we're more certain that unmerged lists are going to work in all cases in the GATK, we'll enable that. The command line option is --interval_merging or -im git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2437 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-23 21:59:14 +00:00
alecw	159778416c	In TableRecalibrationWalker, update UQ tag if it was present in the original SAMRecord. This required a new sam.jar, which caused some other files to need to be changed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2435 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-23 21:42:36 +00:00
hanna	0d890e1bf0	Rework Eric's output management code given that the behavior of the UG changes drastically depending on its output format. Current implementation is probably a bit overkill-ish and we can whittle this down to what's absolutely necessary. Writing VCFs to the 'out' protected printstream may not work at this moment. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2425 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-22 00:33:43 +00:00
ebanks	cf303810d3	VCF reader now creates the correct type of header line for each header type git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2423 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-21 20:39:06 +00:00
hanna	b780ffb34a	Add a getFormat() method to get the output format from the writer. The need for this call suggests that I may be thinking about the typing of the GenotypeWriter object the wrong way. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2418 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-21 01:46:26 +00:00
hanna	11cbfcec9c	Get rid of backlink from ArgumentDefinitions to ArgumentSources. This will help in the future with multiple source -> single definition mapping sets. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2417 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-21 00:39:36 +00:00
aaron	7e0f69dab5	Changed the GLF record to store its contig name and position in each record instead of in the Reader. Integration tests all stay the same. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2410 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 22:54:56 +00:00
ebanks	4ea31fd949	Pushed header initialization out of the GenotypeWriter constructors and into a writeHeader method, in preparation for parallelization. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2406 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 19:16:41 +00:00
ebanks	eeddf0d08e	Adding sample utils for convenience methods to pull out samples from e.g. SAMFileHeader or Genotype objects git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2405 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 18:51:21 +00:00
ebanks	4f59bfd513	Updates to the various GenotypeWriters to make them do simple things like write records (plus allow GLFReader to close). Adding first pass of stub and storage classes for the GenotypeWriters so that UG can be parallelized. Not hooked up yet, so UG is unchanged. The mergeInto() code in the storage class is ugly, but it's all Tribble's fault. We can clean it up later if this whole thing works. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2400 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 07:20:23 +00:00
ebanks	94f5edb68a	1. Fixed VCFGenotypeRecord bug (it needs to emit fields in the order specified by the GenotypeFormatString) 2. isNoCall() added to Genotype interface so that we can distinguish between ref and no calls (all we had before was isVariant()) 3. Added Hardy-Weinberg annotation; still experimental - not working yet so don't use it. 4. Move 'output type' argument out of the UnifiedArgumentCollection and into the UnifiedGenotyper, in preparation for parallelization. 5. Improved some of the UG integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2398 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 04:14:14 +00:00
rpoplin	6fbf77be95	Updating the two solid_recal_mode options to also change the previous base since solid aligner prefers single color mismatch alignments over true SNP alignments. COUNT_AS_MISMATCH mode has been removed completely. The default mode is now SET_Q_ZERO. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2394 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-17 20:07:26 +00:00
ebanks	bb92e31118	Optimizations: 1. push the ReadBackedPileup filtering up into the ReadFilters for read-based filters 2. stop querying the cigar for its length (just do it once) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2381 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-16 21:39:58 +00:00
ebanks	bb312814a2	UG is now officially in the business of making good SNP calls (as opposed to being hyper-aggressive in its calls and expecting the end-user to filter). Bad/suspicious bases/reads (high mismatch rate, low MQ, low BQ, bad mates) are now filtered out by default (and not used for the annotations either), although this can all be turned off. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2373 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-16 17:28:09 +00:00
depristo	0d2a761460	Bugfix for minBaseQuality to ignore deletion reads. LocusMismatch walker now allows us to skip every nth eligible site git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2357 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-15 14:38:39 +00:00
ebanks	bf7bab754e	Made getPileupWithoutMappingQualityZeroReads() and getPileupWithoutDeletions() more efficient, per Mark's cue. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2356 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-15 04:35:21 +00:00
ebanks	874552ff75	Pull the genotype (and genotype quality) calculation out of the VCF code and into the Genotyper. [Also, enable Mark's new UG arguments] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2355 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-15 04:29:28 +00:00
depristo	2cbc85cc7a	min mapping quality and min base quality arguments for UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2354 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-15 03:57:27 +00:00
depristo	1da97ebb85	Walker for calculating non-independent base errors, v1. Will be moved to somewhere not in core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2352 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-15 02:40:15 +00:00
chartl	b42fc905e8	Added - new tests (Hapmap was re-added) Modified - Hapmap now takes a -q command to filter out variants by quality Modified - MathUtils - cumBinomialProbLog now uses BigDecimal to handle some numerical imprecisions Modified - PowerBelowFrequency - returns 0.0 if called with a negative number (can't be done from inside the walker itself, but since it's called elsewhere one can't be too careful) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2350 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-14 21:57:20 +00:00
asivache	bd7b07f3f1	added PrimitivePair.Long and a few shortcut utility methods to PrimitivePairs: add(pair), subtract(pair), assignFrom(pair) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2347 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-14 00:15:44 +00:00
ebanks	97618663ef	Refactored and generalized the VCF header info code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2346 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-13 21:02:45 +00:00
ebanks	bd2a46ab4c	I want to move over to hpprojects tonight, so I'm checking in various changes all in one go: 1. Initial code for annotating calls with the base mismatch rate within a reference window (still needs analysis). 2. Move error checking code from rodVCF to VCFRecord. 3. More improvements to SNP Genotype callset concordance. 4. Fixed some comments in Variation/Genotype git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2341 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-13 02:52:18 +00:00
hanna	6955b5bf53	Cleanup of the doc system, and introduce Kiran's concept of a detailed summary below the specific command-line arguments for the walker. Also introduced @help.summary to override summary descriptions if required. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2337 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-12 04:04:37 +00:00
hanna	cdfe204d19	Incorporated feedback from Kiran. Use the Javadoc first sentence extraction capability to just show the first sentence from each line of Javadoc. @help.description can still be used to produce exceptionally verbose descriptions. Also increased the line width as much as I could tolerate (100 characters -> 120 characters). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2336 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 21:59:55 +00:00
aaron	09811b9f34	Now that we always output the VCF header, make sure that we correctly handle the situation where there are no records in the file. Added unit tests as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2333 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 19:51:05 +00:00
depristo	8f7554d44f	A few improvements to pooled concordance calculations. Now will show you FN with the -V option. BasicGenotype now prints out a reasonable representation with toString git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2320 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 23:09:10 +00:00
ebanks	2869270c11	Fixed deletion depth calculation plus mis-spelling in ReadBackedPileup method. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2315 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 21:11:42 +00:00
hanna	5eac510b2f	Refactor the code I gave Eric yesterday to output command line arguments. Convert it from a completely wonky solution to a slightly less wonky solution that will work in more cases. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2310 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 18:57:54 +00:00
ebanks	a45adadf1f	VCFGenotypeRecord already defines all the methods needed to be SampleBacked, so let's annotate it as being SampleBacked. This way, when used as a generic Genotype, sample data can be retrieved. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2305 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 04:16:21 +00:00
ebanks	4e54b91ce4	UG now outputs the FORMAT header fields when there's genotype data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2294 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 16:31:07 +00:00
ebanks	7a76e13459	Better explanation in the exception being thrown. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2291 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 03:59:36 +00:00
ebanks	717eb1de96	- Depth annotation now includes MQ0 reads - Removed MQ0 annotation - Updated RMS MQ annotation to use new pileup - UG now outputs all of its arguments as key/value pairs in the header (for VCF) - Cleaned up VCFGenotypeWriterAdapter interface a bit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2288 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 02:53:00 +00:00
ebanks	e8822a3fb4	Stage 3 of Variation refactoring: We are now VCF3.3 compliant. (Only a few more stages left. Sigh.) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2287 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-08 21:43:28 +00:00
hanna	9e2f831206	A bit of cleanup in preparation for Picard patch. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2286 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-08 16:09:04 +00:00
hanna	d3b78338da	Get rid of characters in the docs that aren't universally compatible with character sets used throughout the group. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2285 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 21:41:07 +00:00
hanna	d75d3a361a	Clean up some of the walker help output based on additional experience and feedback received. Also, add a flag to build.xml to disable generation of docs on demand (use ant -Ddisable.doc=true to disable docs). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2284 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 21:33:11 +00:00
hanna	a3e88c0b1c	Cleanup results of bad merge. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2281 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 19:30:49 +00:00
hanna	10be5a5de9	Move some files around to reflect our growing help infrastructure. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2280 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 19:23:12 +00:00
rpoplin	1d5b9883db	Added --solid_recal_mode argument to experiment with different ways of dealing with solid reference bias. Currently the default option is DO_NOTHING which means use the same behavior as the old recalibrator. Eventually the new methods in RecalDataManager will be moved over to a SolidUtils class. Added transition and transversion methods to BaseUtils that work like simpleComplement, used with the color space in my solid methods. Also, initial check-in of HomopolymerCovariate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2276 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 14:26:27 +00:00
hanna	8089aa3c50	Adding support to override the help text. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2273 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 00:16:26 +00:00
ebanks	c0528cd88e	Updated the CallsetConcordance classes to use new VCF Variation code... and uncovered a whole bunch of VCF bugs in the process. I'm not convinced that I got them all, so I'll unit test like crazy when the refactoring is done. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2272 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 11:43:40 +00:00
ebanks	b6f8e33f4c	Stage 2 of Variation refactoring: VCFRecord now implements Variation, VCFGenotypeRecord now implements Genotype. Because of this change, RodVCF is now just a wrapper around the VCFRecord and does nothing else. Also, one can call toVariation on the VCFGenotypeRecord and it returns the VCFRecord. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2271 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 06:48:03 +00:00
hanna	3b440e0dbc	Add a taglet to allow users to override the display name in command-line help. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2270 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 04:12:10 +00:00
ebanks	08f2214f14	Stage 1 of massive Variation/Genotype refactoring. This stage consists only of the code originating in the Genotyper and flowing through to the genotype writers. I haven't finished refactoring the writers and haven't even touched the readers at all. The major changes here are that 1. Variations which are BackedByGenotypes are now correctly associated with those Genotypes 2. Genotypes which have an associated Variation can actually be associated with it (and then return it when toVariation() is called). The only integration tests which need to be updated are MSG-related (because the refactoring now made it easy for me to prevent MSG from emitting tri-allelic sites). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2269 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 03:12:41 +00:00
hanna	b04de77952	First pass at a reorganized walker info display. Groups walkers by package and displays walker data extracted from the JavaDoc. Needs a bit of help, both in content and flexibility of package naming. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2267 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 23:24:29 +00:00

493 Commits (4617052b3c93a5e053a0fe491a8f8f60046ba1e0)