gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Guillermo del Angel	143e92b797	Rebasing	2012-04-18 20:05:43 -04:00
Ryan Poplin	4999ae87ad	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-18 15:02:42 -04:00
Ryan Poplin	dcc4871468	minor misc optimizations to PairHMM	2012-04-18 15:02:26 -04:00
Eric Banks	d3c84e7b1f	This should be a User Error since it's provided from the DoC command-line arguments	2012-04-18 13:09:23 -04:00
Eric Banks	392f1903f7	Handling some of the NumberFormatExceptions seen via Tableau that are really user errors.	2012-04-18 12:57:37 -04:00
Ryan Poplin	8a84456626	Following Eric's awesome update to change the VQSR recal file into a VCF file, the ApplyRecalibration step is now scatter/gather-able and tree reducible.	2012-04-18 11:24:04 -04:00
Eric Banks	4448a3ea76	Final tweaks. Added an integration test to cover the case of SNPs and indels that start at the same position.	2012-04-17 23:54:10 -04:00
Eric Banks	c1f52b773a	Minor tweaks and updated integration tests MD5s	2012-04-17 23:17:28 -04:00
Eric Banks	6d03bce0d3	Important refactoring of the VQSR recal file format: we now use a VCF instead of a CSV file. The most important reason for this change is that we no longer need to read the entire recal file into memory up front in ApplyRecalibration. For 1000G calling this was prohibitive in terms of memory requirements. Now we go through the rod system and pull in just the records we need at a given position. As an added bonus, once BCF2 is live we can drastically cut down the sizes of these recal files (which can grow large for whole genome calling).	2012-04-17 22:38:18 -04:00
Eric Banks	ea793d8e27	Khalid pressured me into adding an integration test that makes sure we don't fail on reads with adjacent I and D events.	2012-04-17 21:21:29 -04:00
Mauricio Carneiro	46a212d8e9	Added "simplify reads" option to PrintReads.	2012-04-17 19:32:34 -04:00
Mauricio Carneiro	f0c81b59b0	Implementation of the new BQSR plotting infrastructure * removed low quality bases from the recalibration report. * refactored the Datum (Recal and Accuracy) class structure * created a new plotting csv table for optimized performance with the R script * added a datum object that carries the accuracy information (AccuracyDatum) for plotting * added mean reported quality score to all covariates * added QualityScore as a covariate for plotting purposes * added unit test to the key manager to operate with one required covariate and multiple optional covariates * integrated the plotting into BQSR (automatically generates the pdf with the recalibration tearsheet)	2012-04-17 19:23:55 -04:00
Ryan Poplin	952280bef1	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-04-17 17:00:14 -04:00
Ryan Poplin	cf705f6c62	Adding read position rank sum test to the list of annotations that get produced with the HaplotypeCaller	2012-04-17 17:00:00 -04:00
Eric Banks	13c800417e	Handle NPE in UG indel code: deletions immediately preceding insertions were not handled well in the code.	2012-04-17 15:51:23 -04:00
Khalid Shakir	91cb654791	AggregateMetrics: - By porting from jython to java now accessible to Queue via automatic extension generation. - Better handling for problematic sample names by using PicardAggregationUtils. GATKReportTable looks up keys using arrays instead of dot-separated strings, which is useful when a sample has a period in the name. CombineVariants has option to suppress the header with the command line, which is now invoked during VCF gathering. Added SelectHeaders walker for filtering headers for dbGAP submission. Generated command line for read filters now correctly prefixes the argument name as --read_filter instead of -read_filter. Latest WholeGenomePipeline. Other minor cleanup to utility methods.	2012-04-17 11:45:32 -04:00
Ryan Poplin	1a2e92f8db	Merged bug fix from Stable into Unstable	2012-04-17 10:23:05 -04:00
Ryan Poplin	adad76b36f	Fixing NPE in VQSR for the case of very small callsets.	2012-04-17 10:20:43 -04:00
Mark DePristo	3f6b2423d8	Update VE IT to reflect new fields and bugfixes	2012-04-13 17:00:37 -04:00
Mark DePristo	f9190b6fcd	VariantEvalUnitTest is better named VariantEvalWalkerUnitTest	2012-04-13 17:00:37 -04:00
Mark DePristo	23ccf772d4	IndelSummary now emits all of the underlying counts for ratios, percentages, etc it computes	2012-04-13 17:00:36 -04:00
Mark DePristo	84d1e8713a	Infrastructure for combining VariantEvaluations -- Not hooked up yet, so the output of VariantEval should be the same as before -- Implemented a VariantEvalUnitTest that tests the low level strat / eval combinatorics and counting routines -- Better docs throughout	2012-04-13 17:00:36 -04:00
Mark DePristo	38986e4240	Documentation for StratificationManager	2012-04-13 17:00:36 -04:00
Mark DePristo	ab06d53867	Useful test constructor for unit tests in RefMetaDataTracker	2012-04-13 17:00:36 -04:00
Mark DePristo	285e61a227	Bugfix for IndelSummary -- multi allelic count should be % not ratio	2012-04-13 17:00:35 -04:00
Mark DePristo	e6d5cb46d2	Improvements and bugfixes to IndelSummary -- Now properly includes both bi and multi-allelic variants. These are actually counted as well, and emitted as counts and % of sites with multiple alleles -- Bug fix for gold standard rate	2012-04-13 17:00:35 -04:00
Mark DePristo	bfa966a4e9	Bugfix for OneBPIndel -- Previously was only including 1 bp insertions in stratification	2012-04-13 17:00:35 -04:00
Mark DePristo	2aa2d9aec0	Merged bug fix from Stable into Unstable	2012-04-13 09:25:43 -04:00
Mark DePristo	27e7e17dc7	New way to handle exceptions in multi-threaded GATK -- HMS no longer tries to grab and throw all exceptions. Exceptions are just thrown directly now. -- Proper error handling is handled by functions in HMS, which are used by ShardTraverser and TreeReducer -- Better printing of stack traces in WalkerTest	2012-04-13 09:23:33 -04:00
Mark DePristo	e85e9a8cf5	More extensive testing of type of error thrown in multi-threaded walker test -- Unfortunately the result of the multi-threaded test is non-deterministic so run the test 10 times to see if the right exception is always thrown -- Now prints the stack trace and exception message of the caught exception of the wrong type, if this occurs	2012-04-13 09:23:33 -04:00
Eric Banks	297afc7911	Added unit test to ensure that we genotype correctly cases with really large GLs	2012-04-12 15:43:14 -04:00
Eric Banks	818e8c2fb9	Resolving merge conflicts	2012-04-12 15:19:44 -04:00
Eric Banks	0dd571928d	Let's not have the indel model emit more than the max possible number of genotypable alt alleles (since we may not be able to subset down to the best ones).	2012-04-12 15:16:29 -04:00
Eric Banks	f77a6d18b8	Bad conflict merge before	2012-04-12 09:56:49 -04:00
Eric Banks	33a8bdd75f	Resolving merge conflicts	2012-04-12 09:51:55 -04:00
Eric Banks	b659b16b31	Generate User Error for bad POS value	2012-04-12 09:49:35 -04:00
Eric Banks	cc71baf691	Don't allow users to try to genotype more than the max possible value (catch and throw a User Error at startup). Better docs explaining that users shouldn't play with this value unless they know what they are doing.	2012-04-12 09:18:44 -04:00
Eric Banks	5bf9dd2def	A framework to get annotations working in the HaplotypeCaller (and ART walkers in general). Adding support for active-region-based annotation for most standard annotations. I need to discuss with Ryan what to do about tests that require offsets into the reads (since I don't have access to the offsets) like e.g. the ReadPosRankSumTest. IMPORTANT NOTE: this is still very much a dev effort and can only be accessed through private walkers (i.e. the HaplotypeCaller). The interface is in flux and so we are making no attempt at all to make it clean or to merge this with the Locus-Traversal-based annotation system. When we are satisfied that it's working properly and have settled on the proper interface, we will clean it up then.	2012-04-11 16:22:12 -04:00
Eric Banks	5b7da3831f	Not sure why this didn't make it into the last push, but here's a working MD5 for the NDA annotation in UG	2012-04-11 13:49:50 -04:00
Eric Banks	7aa654d13f	New interface for some dev work that Ryan and I are doing; only accessible from private walkers right now	2012-04-11 13:49:09 -04:00
Eric Banks	dc90508104	Adding a new annotation to UG calls: NDA = number of discovered (but not necessarily genotyped) alleles for the site. This could help downstream analysis esp. of indels for wonky sites (since we only use the top 2-3 alleles). Not enabled by default but we can change that if this turns out to be useful.	2012-04-11 13:47:10 -04:00
Eric Banks	d2142c3aa7	Adding integration test for Flag Stat	2012-04-10 22:40:38 -04:00
Eric Banks	f560611fe8	Merged bug fix from Stable into Unstable	2012-04-10 22:26:53 -04:00
Eric Banks	f46f7d0590	Fix the stats coming out of FlagStat. I will add an integration test in unstable	2012-04-10 22:26:10 -04:00
Mauricio Carneiro	cd842b650e	Optimizing DiagnoseTargets * Fixed output format to get a valid vcf * Optimized the per sample pileup routine O(n^2) => O(n) pileup for samples * Added support for overlapping intervals * Removed expand target functionality (for now) * Removed total depth (pointless metric)	2012-04-10 17:43:59 -04:00
Ryan Poplin	1df0adf862	Fixing ActivityProfile unit test.	2012-04-10 15:28:27 -04:00
Ryan Poplin	e3cc7cc59c	Resolving merge conflict.	2012-04-10 14:50:27 -04:00
Ryan Poplin	a4634624b7	There are now three triggering options in the HaplotypeCaller. The default (mismatches, insertions, deletions, high quality soft clips), an external alleles file (from the UG for example), or extended triggers which include low quality soft clips, bad mates and unmapped mates. Added better algorithm for band pass filtering an ActivityProfile and breaking them apart when they get too big. Greatly increased the specificity of the caller by battening down the hatches on things like base quality and mapping quality thresholds for both the assembler and the likelihood function.	2012-04-10 14:48:23 -04:00
Eric Banks	10e74a71eb	We now allow arbitrary annotations other than dbSNP (e.g. HM3) to come out of the Unified Genotyper. This was already set up in the Variant Annotator Engine and was just a matter of hooking UG up to it. Added integration test to ensure correct behavior.	2012-04-10 12:30:35 -04:00
Mark DePristo	b43d21056b	Merged bug fix from Stable into Unstable	2012-04-10 09:42:09 -04:00


1930 Commits (143e92b79790cdbed7f60b8e9ecd87f9085b3f04)