gatk-3.8

Commit Graph

Author	SHA1	Message	Date
rpoplin	cea544871d	Fixed an issue with recalibrating original quality scores above Q40. There is a new option -maxQ which sets the maximum quality score possible for when a RecalDatum tries to compute its quality score from the mismatch rate. The same option was added to AnalyzeCovariates to help with plotting q scores above Q40. Added an integration test which makes use of this new -maxQ option. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2534 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-07 13:50:30 +00:00
ebanks	6c739e30e0	1. Removing an old version of the Genotype interface which is no longer being used. Needed to do this now so that the naming conflicts would cease. 2. Adding a preliminary version of the new Genotype/Allele interface (putting it into refdata/ as the VariantContext really only applies to rods) with updates to VariantContext. This is by no means complete - further updates coming tomorrow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2533 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-07 05:51:10 +00:00
depristo	7215526810	Fix to isReference() in VCFRecord. Change to VariantCounter to correctly count only non-genotype variants, as well as update to VariantEvalWalker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2531 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-07 00:03:29 +00:00
andrewk	6c4ac9e663	Updated HapMap2VCF to use the VCFGenotypeWriterAdapter interface; fixed bug in VCFParameters that affects VariantsToVCF and HapMap2VCF when reference is lower-cased; added integration test for HapMap2VCF that checks for the lower-case issue by testing against Hg18 region that has lower-cased bases git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2530 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-06 21:27:11 +00:00
depristo	8d13597a27	Temporary command-line support to enable rod walkers, if you know what you are doing this is safe. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2505 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-06 12:15:36 +00:00
rpoplin	0a6bd5a270	CycleCovariate is now one-based so that 0 and -0 don't collide with each other. Solid recal modes now only change the inconsistent base and the previous base (along the direction of the read) instead of both the bases before and after. Removed estimatedNumberOfBins from the Covariate interface because it wasn't being used. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2498 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-04 20:52:15 +00:00
ebanks	ed2fff13aa	-Misc improvements to VCF code -Small fix to callset concordance git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2497 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-04 02:28:47 +00:00
ebanks	b668d32cf1	Updated the min mapping quality and min base quality defaults to be 10 in both cases (and updated all integration tests) as suggested by Mark. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2494 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-03 21:31:04 +00:00
hanna	b6ecc9e151	Support for ad-hoc reference sequences. Also reenabled BWA/Java integration test, which was commented out and the data backing it up deleted without my knowledge. Unfortunately, since the data was deleted, I had to regenerate the data and a new md5. Hopefully the aligner output is still correct. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2493 348d0f76-0448-11de-a6fe-93d51630548a	2010-01-02 20:19:14 +00:00
asivache	ad549eacfd	Now that we changed how deletions are represented, got to update MD5... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2491 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-30 22:00:58 +00:00
asivache	9c41ac252f	Disable testSingleBPFailure - getReferenceContext() now would agree to accept length > 1 genome locs as its argument, so there's nothing to test... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2486 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-30 21:12:00 +00:00
asivache	4aeb50c87d	Added: integration test for extended pileup (with indels included) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2481 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-29 23:02:23 +00:00
rpoplin	96c4929b3c	Recalibrator now uses NestedHashMap instead of NHashMap. The keys are now nested hash maps instead of Lists of Comparables. This results in a big speedup (thanks Tim!). There is still a little bit of clean up to do, but everything works now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2474 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-29 21:01:32 +00:00
depristo	7826e144a1	forgot to update md5s git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2473 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-29 20:31:29 +00:00
depristo	87e863b48d	Removed used routines in duputils; duplicatequals to archive; docs for new duplicate traversal code; general code cleanup; bug fixes for combineduplicates; integration tests for combine duplicates walker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2468 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-29 19:46:29 +00:00
ebanks	5fdf17fccb	Removed the VCF "NS" annotation (which wasn't working for pooled calls anyways) since it's ambiguous and not useful. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2465 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-29 17:30:47 +00:00
aaron	a34c2442c0	moved hard-coded file paths to the oneKGLocation, validationDataLocation, and seqLocation variables setup in the BaseTest. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2460 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-29 07:40:48 +00:00
depristo	9d263b2565	Integration tests for count duplicates walker validated on a TCGA hybrid capture lane. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2459 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-28 23:57:25 +00:00
depristo	fcc80e8632	Completely rewritten duplicate traversal, more free of bugs, with integration tests for count duplicates walker validated on a TCGA hybrid capture lane. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2458 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-28 23:56:49 +00:00
hanna	4617052b3c	For Alec, and others at the Broad who want to run our unit/integration tests off of gsa1/gsa2: put a ceiling on the amount of memory that integration tests can use. Reduce the memory footprint of the fasta reader test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2457 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-28 23:42:46 +00:00
alecw	b5e5e27225	New versions of picard-private, sam and picard jars for TileCovariate and regeneration of NM tag git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2456 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-28 22:18:55 +00:00
rpoplin	562db45fa5	Sites that were marked NO_DINUC no longer get dinuc-corrected but are still recalibrated using the other available covariates. Solid cycle is now the same as Illumina cycle pending an analysis that looks at the effect of PrimerRoundCovariate. Solid color space methods cleaned up to reduce number of calls to read.getAttribute(). Polished NHashMap sort method in preparation for move to core/utils. Added additional plots in AnalyzeCovariates to look at reported quality as a function of the covariate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2451 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-28 20:19:37 +00:00
ebanks	12990c5e7a	Added qual-by-depth annotation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2445 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-25 02:30:30 +00:00
ebanks	438d21842a	The new recalibrator had been mimicking the behavior of the old one in that if there was no dinuc available (following a no-call base or at either end of a read), it didn't try to recalibrate. Now that Ryan has modularized the system, we no longer need to skip the base completely (we just need to skip the dinuc value)... which is good because the Picard people complained after realizing that cycle #1 never got recalibrated. The major effects of this commit are as follows: 1. We no longer skip any good bases (of course, this change alone breaks every single integration test). 2. The dinuc covariate returns a "no dinuc" value for the first base of a read (but not for the last base anymore, since there is a valid dinuc) or if the previous base is a bad base (e.g. 'N'). I've done a bunch of testing on real data and everything looks right; however, let's wait until the recalibrator guru gets back from vacation next week and can double-check everything before shipping this out in another early access release. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2443 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-24 20:41:29 +00:00
ebanks	6df40876a3	Un-reverted Matt's previous changes and fixed integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2441 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-24 02:47:00 +00:00
hanna	2bd0b1bbf7	After further review, it's unclear that my patch in RecalDataManager was the right choice. Reverting. Also updating other IntervalCleanerIntegrationTest failures that were masked by my first patch. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2440 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-24 00:32:33 +00:00
hanna	98c268483e	Fixed issues with the integration tests: 1) sam-jdk apparently no longer supports custom tags with type int[] values. 2) BAM output for indel cleaner integration test changed in a way that's so subtle it can't be seen after converting the output to .sam. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2439 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-23 23:12:22 +00:00
aaron	b134e0052f	added changes to the code to allow different types of interval merging, 1: all overlapping and abutting intervals merged (ALL), 2: just overlapping, not abutting intervals (OVERLAPPING_ONLY), 3: no merging (NONE). This option is not currently allowed, it will throw an exception. Once we're more certain that unmerged lists are going to work in all cases in the GATK, we'll enable that. The command line option is --interval_merging or -im git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2437 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-23 21:59:14 +00:00
ebanks	770093a40e	Oops - forgot to check this one in. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2433 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-23 19:53:28 +00:00
ebanks	dc96879861	2 separate changes which both affect lots of UG integration md5s, so I'm committing them together: 1. allele balance annotation is now weighted by genotype quality (so we don't get misled by borderline het calls) 2. Updates to the Unified Genotyper for parallelization: a. verbose writing now works again; arg was moved from UAC to UG b. UG checks for commands that don't work with parallelization c. some cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2432 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-23 19:03:56 +00:00
ebanks	872a9d1c7b	I'm making this change now (as opposed to waiting until Monday) to honor Tim's request. The cycle covariate is now first/second of pair aware. I'm taking it on faith from both Chris Hartl (waiting on slides from him) and Tim that this is the right thing to do. We'll have Ryan confirm it all next week. The only change is that if a read is the second of a pair, we multiply the cycle by -1 (a simple way of separating its index from that of its mate). Of course, this broke all integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2431 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-23 16:26:43 +00:00
ebanks	cf303810d3	VCF reader now creates the correct type of header line for each header type git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2423 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-21 20:39:06 +00:00
ebanks	87e5a41964	Fixed a bug that accounted for a bunch of my remaining mis-cleaned indels. Also, slightly optimized the cleaner by using readBases (instead of readString) and caching cigar element lengths. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2419 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-21 05:46:16 +00:00
hanna	9e53c06328	First revision of command-line argument support for GenotypeWriter. Also, fixed the damn build. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2416 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-20 19:19:23 +00:00
aaron	7e0f69dab5	Changed the GLF record to store its contig name and position in each record instead of in the Reader. Integration tests all stay the same. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2410 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 22:54:56 +00:00
hanna	80b3eb85fa	Fixed curiously epic failure in read-backed pileup: size() mismatched the numReads-numDeletions at that locus in the case where includeReadsWithDeletionsAtLoci == false, causing failures including bad output from pileup walker. Also fixed up ValidatingPileup to run with the new ReadBackedPileup instead of just compiling successfully. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2409 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 22:52:44 +00:00
rpoplin	fdf542c214	The CycleCovariate for 454 data is now the TACG flow cycle. That is, each flow grabs all the T's, A's, C's, and G's in order in a single cycle. This is changed from incrementing the cycle whenever there is a discontinuous nucleotide along the direction of the read. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2408 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 22:39:51 +00:00
ebanks	4ea31fd949	Pushed header initialization out of the GenotypeWriter constructors and into a writeHeader method, in preparation for parallelization. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2406 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 19:16:41 +00:00
ebanks	1cde4161b7	Fixed another test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2399 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 05:05:03 +00:00
ebanks	94f5edb68a	1. Fixed VCFGenotypeRecord bug (it needs to emit fields in the order specified by the GenotypeFormatString) 2. isNoCall() added to Genotype interface so that we can distinguish between ref and no calls (all we had before was isVariant()) 3. Added Hardy-Weinberg annotation; still experimental - not working yet so don't use it. 4. Move 'output type' argument out of the UnifiedArgumentCollection and into the UnifiedGenotyper, in preparation for parallelization. 5. Improved some of the UG integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2398 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-18 04:14:14 +00:00
rpoplin	6fbf77be95	Updating the two solid_recal_mode options to also change the previous base since solid aligner prefers single color mismatch alignments over true SNP alignments. COUNT_AS_MISMATCH mode has been removed completely. The default mode is now SET_Q_ZERO. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2394 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-17 20:07:26 +00:00
hanna	07f1859290	Added integration test for running the recalibrator with no index. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2393 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-17 19:10:53 +00:00
ebanks	c75ec67f84	When called as a standalone, VariantAnnotator now emits samples in sorted (as opposed to random) order in VCFs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2392 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-17 19:01:08 +00:00
hanna	b863fffdf6	Fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2390 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-17 17:55:00 +00:00
asivache	e6cc7dab26	fixing md5 sum; new version of IndelIntervalWalker does the right thing... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2388 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-17 01:04:13 +00:00
ebanks	b626fc0684	Joint Estimate is now the default calculation model. Reworked all of the integration tests so that they're now more comprehensive, cover more of what we want to test, and don't take forever to run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2376 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-16 19:41:02 +00:00
ebanks	bb312814a2	UG is now officially in the business of making good SNP calls (as opposed to being hyper-aggressive in its calls and expecting the end-user to filter). Bad/suspicious bases/reads (high mismatch rate, low MQ, low BQ, bad mates) are now filtered out by default (and not used for the annotations either), although this can all be turned off. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2373 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-16 17:28:09 +00:00
ebanks	874552ff75	Pull the genotype (and genotype quality) calculation out of the VCF code and into the Genotyper. [Also, enable Mark's new UG arguments] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2355 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-15 04:29:28 +00:00
chartl	1389ac6bdf	Hurrr -- this uses power as part of its output. Changes to the power calculation broke the md5s RIGHT AFTER I HAD FIXED THEM arghflrg. Will fix again. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2351 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-14 22:42:50 +00:00
chartl	b42fc905e8	Added - new tests (Hapmap was re-added) Modified - Hapmap now takes a -q command to filter out variants by quality Modified - MathUtils - cumBinomialProbLog now uses BigDecimal to handle some numerical imprecisions Modified - PowerBelowFrequency - returns 0.0 if called with a negative number (can't be done from inside the walker itself, but since it's called elsewhere one can't be too careful) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2350 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-14 21:57:20 +00:00


374 Commits (cea544871d01d7bb937d8a3a74afc541c5877579)