gatk-3.8

Author	SHA1	Message	Date
Eric Banks	87e41c83c5	In AlleleCount stratification, check to make sure the AC (or MLEAC) is valid (i.e. not higher than number of chromosomes) and throw a User Error if it isn't. Added a test for bad AC.	2012-08-14 15:02:30 -04:00
Eric Banks	8e3774fb0e	Fixing behavior of the --regenotype argument in SelectVariants to properly run in GenotypeGivenAlleles mode. Added integration tests to cover recent SV changes.	2012-08-14 14:21:42 -04:00
Eric Banks	34b62fa092	Two changes to SelectVariants: 1) don't add DP INFO annotation if DP wasn't used in the input VCF (it was adding DP=0 previously). 2) If MLEAC or MLEAF is present in the original VCF and the number of samples decreases, remove those annotations from the VC.	2012-08-14 12:54:31 -04:00
Eric Banks	cfb994abd2	Trivial removal of unused variable (mentioned in resolved JIRA entry)	2012-08-13 22:55:02 -04:00
Khalid Shakir	f809f24afb	Removed SelectHeader's --include_reference_name option since the reference is always included. In SelectHeaders instead of including the path to the file, only include the name of the reference since dbGaP does not like paths in headers.	2012-08-13 16:49:27 -04:00
Mark DePristo	6ad75d2f5c	Reverting changes to BCF2 ranges -- The previously expanded ones are actually the missing values in the range. The previous ranges were correct. Removed the TODO to confirm them, as they are now officially confirmed	2012-08-13 15:06:28 -04:00
Mark DePristo	4d3fad38e9	Increase allowable range for BCF2 by -1 on low-end	2012-08-13 14:20:26 -04:00
Mark DePristo	aab417c94d	Fix missing argument in unittest	2012-08-12 13:58:14 -04:00
Mark DePristo	f032e0aba4	A bit better output for ContextCovariate context size logging	2012-08-12 13:45:52 -04:00
Mark DePristo	243af0adb1	Expanded the BQSR reporting script -- Includes header page -- Table of arguments (Arguments) -- Summary of counts (RecalData0) -- Summary of counts by qual (RecalData1) -- Fixed bug in output that resulted in covariates list always being null (updated md5s accordingly) -- BQSR.R loads all relevant libraries now, including gplots, grid, and gsalib, to run correctly	2012-08-12 13:45:14 -04:00
Mark DePristo	458bbdee8f	Add useful logger.info telling us the mismatch and indel context sizes	2012-08-12 10:27:05 -04:00
Ami Levy Moonshine	6fefdaf428	"update integration tests in CombineVariantsIntegrationTest" Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-10 17:00:35 -04:00
Ami Levy Moonshine	4968daf0a5	update integration tests at CombineVariantsIntegrationTest	2012-08-10 16:58:05 -04:00
Eric Banks	40f0320a1c	When adding a unit test to LIBS for X and = CIGAR operators, I uncovered a bug with the implementation of the ReadBackedPileup.depthOfCoverage() method.	2012-08-10 14:58:29 -04:00
Eric Banks	eca9613356	Adding support of X and = CIGAR operators to the GATK	2012-08-10 14:54:07 -04:00
Ami Levy Moonshine	68fb04b8f7	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable into testing	2012-08-09 16:48:22 -04:00
Mark DePristo	06258c8a01	BCF2 optimizations -- Added Write method to BCF2 types that directly converts int value to byte stream. Deleted writeRawBytes(int) -- encodeTypeDescriptor semi-inlined into encodeType so that the tests for overflow are done in just one place -- Faster implementation of determineIntegerType for int[] values	2012-08-09 16:36:18 -04:00
Mark DePristo	c6bd9b15ff	BCF2 optimizations -- BCF2Type enum has an overloaded method to read the type as an int from an input stream. This gets rid of a case statement and replaces it with just minimal tiny methods that should be better optimized. A side effect of this optimization is an overall cleaner code organization	2012-08-09 16:36:18 -04:00
Mark DePristo	9a0dda71d4	BCF2 optimizations -- All low-level reads throw IOException instead of catching it directly. This allows us to not try/catch in readByte, improving performance by 5% or so -- Optimize encodeTypeDescriptor with final variables. Avoid using Math.min; do inline comparison instead -- Inlined willOverflow directly in its single use	2012-08-09 16:36:18 -04:00
Ryan Poplin	9887bc4410	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-09 16:31:06 -04:00
Ryan Poplin	f4c72a26d5	A few quick, minor findbugs fixes.	2012-08-09 16:30:58 -04:00
Ryan Poplin	c7f22e410f	A few quick, minor findbugs fixes.	2012-08-09 16:22:08 -04:00
Eric Banks	def077c4e5	There's actually a subtle but important difference between foo++ and ++foo	2012-08-09 12:42:50 -04:00
Ryan Poplin	e48727dae3	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-09 10:31:10 -04:00
Guillermo del Angel	5be7e0621d	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-09 09:58:34 -04:00
Guillermo del Angel	71ee8d87b3	Rename per-sample ML allelic fractions and counts so that they don't have the same name as the per-site INFO fields, and clarify wording in VCF header	2012-08-09 09:58:20 -04:00
Eric Banks	35cec8530c	Make coverage threshold in FindCoveredIntervals a command-line argument	2012-08-08 21:44:24 -04:00
Ryan Poplin	1223d77546	Removing argument from HaplotypeCaller that was made unnecessary by recent improvements to triggering around large events	2012-08-08 15:13:20 -04:00
Eric Banks	0a2a646a52	Other random FindBugs fixes	2012-08-08 14:56:27 -04:00
Eric Banks	4c84cc9486	Quick pass of FindBugs 'should be static inner class' fixes.	2012-08-08 14:42:06 -04:00
Eric Banks	a0196c9f5b	Quick pass of FindBugs 'method invokes inefficient Number constructor' fixes.	2012-08-08 14:34:16 -04:00
Eric Banks	4b2e3cec0b	Quick pass of FindBugs 'inefficient use of keySet iterator instead of entrySet iterator' fixes for core tools.	2012-08-08 14:29:41 -04:00
Guillermo del Angel	3e2752667c	Intermediate checkin for ReducedReads with HaplotypeCaller - change min read count over k-mer to average count over k-mer when doing assembly of a reduced read (not optimal, currently trying max and then will decide on best approach), fix merge conflicts	2012-08-08 12:07:33 -04:00
David Roazen	a7811d673f	Update URL for phone home / GATK key documentation output by the GATK upon error	2012-08-08 09:29:54 -04:00
Mark DePristo	cda8d944b7	Bugfixes for BCF with VQSR -- Old version converted doubles directly from strings. New version uses VariantContext getAttributeAsDouble() that looks at the values directly to determine how to convert from Object to Double (via Double.valueOf, (Double), or (Double)(Integer)). -- getAttributeAsDouble() is now smart in converting integers to doubles as needed -- Removed unnecessary logging info in BCF2Codec -- Added integration tests to ensure that VQSR works end-to-end with BCF2 using sites version of the file khalid sent to me -- Added vqsr.bcf_test.snps.unfiltered.bcf file for this integration test	2012-08-07 17:22:39 -04:00
Mark DePristo	80b94a4f9a	AdaptiveContexts implement pruning to a given chi2 p value -- Added Bonferroni-corrected p-value pruning, so you tell it how significant a difference you are willing to collapse in the tree, and it prunes the tree down to this maximum threshold -- Penalty is now a phred-scaled p-value not the raw chi2 value -- Split command line arguments in VisualizeContextTree into separate arguments for each type of pruning	2012-08-07 17:22:39 -04:00
Mark DePristo	982c735c76	VisualizeAdaptiveTree now considers only leaf nodes when computing max/min penalty	2012-08-07 17:22:39 -04:00
Ryan Poplin	15085bf03e	The UnifiedGenotyper now makes use of base insertion and base deletion quality scores if they exist in the reads.	2012-08-07 13:58:22 -04:00
Guillermo del Angel	97c5ed4feb	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-06 20:22:31 -04:00
Guillermo del Angel	238d55cb61	Fixes for running HaplotypeCaller with reduced reads: a) minor refactoring, pulled out code to compute mean representative count to ReadUtils, b) Don't use min representative count over kmer when constructing de Bruijn graph - this creates many paths with multiplicity=1 and makes us lose a lot of SNPs at the edge of capture targets. Use mean instead	2012-08-06 20:22:12 -04:00
Mark DePristo	00858f16a6	Deleting empty unit test for AdaptiveContexts	2012-08-06 12:58:13 -04:00
Ryan Poplin	f1c30c3a59	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-06 12:02:26 -04:00
Mark DePristo	44f160f29f	indelGOP and indelGCP are now advanced, not hidden arguments	2012-08-06 11:42:55 -04:00
Mark DePristo	2f004665fb	Fixing public -> private dep	2012-08-06 11:42:55 -04:00
Mark DePristo	7bf5ca51ee	Major bugfix for adaptive contexts -- Basically I was treating the context history in the wrong direction, effectively predicting the further bases in the context based on the closer one. Totally backward. Updated the code to build the tree in the right direction. -- Added a few more useful outputs for analysis (minPenalty and maxPenalty) -- Misc. cleanup of the code -- Overall I'm not 100% certain this is even the right way to think about the problem. Clearly this is producing a reasonable output but the sum of chi2 values over the entire tree is just enormous. Perhaps a MCMC convergence / sampling criterion would be a better way to think about this problem?	2012-08-06 11:42:55 -04:00
Mark DePristo	b4841548f1	Bug fixes and misc. improvements to running the adaptive context tools -- Better output file name defaults -- Fixed nasty bug where I included nonexistent quals in the contexts to process because they showed up in the Cycle covariate -- Data is processed in qual order now, so it's easier to see progress -- Logger messages explaining where we are in the process -- When in UPDATE mode we still write out the information for an equivalent prune by depth for post analysis	2012-08-06 11:42:55 -04:00
Ryan Poplin	b8709d8c67	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-06 11:41:28 -04:00
Eric Banks	210db5ec27	Update -maxAlleles argument to -maxAltAlleles to make it more accurate. The hidden GSA production -capMaxAllelesForIndels argument also gets updated.	2012-08-06 11:31:18 -04:00
Eric Banks	8f95a03bb6	Prevent NumberFormatExceptions when parsing the VCF POS field	2012-08-06 11:19:54 -04:00
Ryan Poplin	b7eec2fd0e	Bug fixes related to the changes in allele padding. If a haplotype started with an insertion it led to array index out of bounds. Haplotype allele insert function is now very simple because all alleles are treated the same way. HaplotypeUnitTest now uses a variant context instead of creating Allele objects directly.	2012-08-05 12:29:10 -04:00

2532 Commits (db92671b7ff2fd049287a3a6304fcd3ad13fb7b2)