gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	6df96644d9	Unified, standard IndelSummary metrics for VariantEval -- Now you always get SNP and indel metrics with VariantEval! -- Includes Number of SNPs, Number of singleton SNPs, Number of Indels, Number of singleton Indels, Percent of indel sites that are multi-allelic, SNP to indel ratio, Singleton SNP to indel ratio, Indel novelty rate, 1 to 2 bp indel ratio, 1 to 3 bp indel ratio, 2 to 3 bp indel ratio, 1 and 2 to 3 bp indel ratio, Frameshift percent, Insertion to deletion ratio, Insertion to deletion ratio for 1 bp events, Number of indels in protein-coding regions labeled as frameshift, Number of indels in protein-coding regions not labeled as frameshift, Het to hom ratio for SNPs, Het to hom ratio for indels, a Histogram of indel lengths, Number of large (>10 bp) deletions, Number of large (>10 bp) insertions, Ratio of large (>10 bp) insertions to deletions -- Updated VE integration tests as appropriate	2012-03-22 21:24:37 -04:00
Mark DePristo	bcf80cc7b3	Cleanup in VariantEval. Example of molten VariantEval output -- Moved a variety of useful formatting routines for ratios, percentages, etc, into VariantEvaluator.java so everyone can share. Code updated to use these routines where appropriate -- Added variantWasSingleton() to VariantEvaluator, which can be used to determine if a site, even after subsetting to specific samples, was a singleton in the original full VCF -- TableType, which used to be an interface, is now an abstract class, allowing us to implement some general functionality and avoid duplication. -- This included creating a getRowName() function that used to be hardcoded as "row" but now can be overridden. -- This allows us to implement molten tables, which are vastly easier to use than multi-row data sets. See IndelHistogram class (in later commit) for example of molten VE output	2012-03-22 21:24:37 -04:00
Mark DePristo	e4d49357ce	Further cleanup of R	2012-03-22 21:24:37 -04:00
Mark DePristo	503e2ea29e	Cleanup R directory	2012-03-22 21:24:37 -04:00
Mark DePristo	5725f72904	Cleanup unused python programs -- If you happen to use one of these files you can always revert it.	2012-03-22 21:24:36 -04:00
Mark DePristo	9ddd5aec93	More eval modules being removed from VariantEval -- IndelStatistics is superseded by IndelStatistics	2012-03-22 21:24:36 -04:00
Mark DePristo	bd5b6d1aba	Remove no longer in use Eval modules from VariantEval -- No more IndelLengthHistogram (superseded by IndelSummary in subsequent commit) -- No more SamplePreviousGenotypes or PhaseStats -- No more MultiallelicAFs	2012-03-22 21:24:36 -04:00
Mark DePristo	6c2290fb6e	Performance optimization for gsa.read.gatkreport.R -- instead of using y = rbind(x, y), which is O(n^2) in a loop when processing lines into a data structure in R, preallocate a matrix and explicitly assign each row to x. This results in a radical performance improvement when reading large tables into R. It's possible with this optimization to read in a 70MB table for variantQCReport.R with 200K lines for 800 samples.	2012-03-22 21:24:36 -04:00
Menachem Fromer	7faa9938b1	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-03-22 17:43:44 -04:00
Menachem Fromer	b9b9219ac7	Added respectPhaseInInput flag to RBP and integration tests	2012-03-22 17:40:21 -04:00
Menachem Fromer	1dfaacfeb5	Check for consistency of the BAM and VCF sample names, with a command line disable to throw if you know what you are doing	2012-03-22 12:40:15 -04:00
Guillermo del Angel	58965d6a6e	Merged bug fix from Stable into Unstable	2012-03-22 11:04:11 -04:00
Mark DePristo	256e9f001e	analyzeRunReports now includes full stack traces with causes	2012-03-22 10:15:44 -04:00
Guillermo del Angel	b8cd959461	Potential corner condition bug fix: protect against null pointer exceptions when computing consensus indel bases when UG is discovering alt alleles. If an alt allele has non-standard bases, skip allele gracefully instead of adding null object into list	2012-03-22 10:06:22 -04:00
Eric Banks	8c09ff9459	Merged bug fix from Stable into Unstable	2012-03-21 12:44:43 -04:00
Eric Banks	58245bfa2f	Bug fix: check to see whether there's a BasePileup before asking for one.	2012-03-21 12:44:09 -04:00
Eric Banks	07c3bd32b3	Bug fix: merge NO_VARIATION records with those of another type. The sad part is that this WAS covered by integration tests but someone updated the MD5s without actually paying attention...	2012-03-21 12:42:13 -04:00
Eric Banks	dcf2fa361d	Minor cleanup	2012-03-21 12:14:31 -04:00
Eric Banks	ab1c48745b	Need to catch RuntimeExceptions coming out of Picard too so that they show up as UserErrors (some BAM errors are thrown as REs).	2012-03-21 12:13:52 -04:00
Ryan Poplin	9e10779fa7	Caching log calculations cut the non-Map runtime of HaplotypeCaller in half. Moved the qual log cache used in HC and PairHMM into a common place and added unit tests.	2012-03-21 08:45:42 -04:00
Mauricio Carneiro	0e93cf5297	Taking care of bad cigars in the GATK * fixed BadCigarFilter to filter out reads starting/ending in deletion and that have adjacent I/D events. * added Unit tests for BadCigarFilter * updated all exceptions in LocusIteratorByState to tell the user that he can instead run with -rf BadCigar * added the BadCigar filter to ReduceReads and RealignTargetCreator (if your walker blows up with these malformed reads, you may want to add it too)	2012-03-20 14:32:57 -04:00
Eric Banks	b290152542	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-03-20 08:56:04 -04:00
Eric Banks	5e79046c98	Minor change but I realized from Mark's commit that the code I stole it from was flawed	2012-03-20 08:55:56 -04:00
Mark DePristo	5ecfc49f74	Minor cleanup of MergeIntervalLists (example, please look) -- Note that isDone() is overridden to return true. This causes the GATK to cleanly stop processing early.	2012-03-20 07:49:27 -04:00
Mark DePristo	36636eb323	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-03-20 07:47:24 -04:00
Eric Banks	ade1971581	Since we allow any generic header types, there's no longer any reason to check for supported types	2012-03-20 00:12:17 -04:00
Eric Banks	4910ef86d9	Added a to-do for Khalid	2012-03-19 23:12:58 -04:00
Eric Banks	5a3afd768d	Walker to merge multiple bed/interval files into a single consensus. 'Walker' is used loosely here; there must be a better way to do this, but I don't know how within the GATK framework.	2012-03-19 22:42:48 -04:00
Eric Banks	2324c5a74f	Simplified the interface for simple VCF header lines by making the VCFSimpleHeaderLine not abstract anymore - now any arbitrary header line with an ID (e.g. the contig and ALT lines) can be part of this class without having to define new classes. Also, renamed the 'named' header line to 'id' since that's more accurate.	2012-03-19 21:29:24 -04:00
Ryan Poplin	069ccdfdd4	Fixing broken HC integration tests while changes to exact model are being formulated.	2012-03-19 16:56:51 -04:00
Mauricio Carneiro	633b5c687d	Fixing MD5's (new GATKReport header was missing from old md5's)	2012-03-19 15:28:45 -04:00
Mauricio Carneiro	9cf4df15e5	BQSR recal script (just so we can scatter-gather)	2012-03-19 15:28:45 -04:00
Khalid Shakir	875dc5ef95	Re-added non-verbose MultiallelicSummary to HSP eval.	2012-03-19 14:40:31 -04:00
Khalid Shakir	e8b083ac20	Merged bug fix from Stable into Unstable	2012-03-19 14:37:36 -04:00
Khalid Shakir	d0056d6c71	Updated HSP dbsnp from 132 to 135 along with other minor patches.	2012-03-19 14:36:38 -04:00
Roger Zurawicki	7afb333811	GATK Report code cleanup - Updated the documentation on the code - Made the table.write() method private and updated necessary files. - Added a constructor to GATKReport that takes GATKReportTables - Optimized my code Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2012-03-19 11:53:57 -04:00
Mauricio Carneiro	0d4ea30d6d	Updating the BQSR Gatherer to the new file format. This is important for quick turnaround in the analysis cycle of the new covariates. Also added a dummy unit test that doesn't really test anything (disabled), but helps in debugging.	2012-03-19 09:02:27 -04:00
Mark DePristo	37d979d98d	GATK performance over time includes GATK 1.5	2012-03-18 19:49:26 -04:00
Ryan Poplin	1c67a62fc0	Updating LikelihoodCalculationEngineUnitTest	2012-03-18 16:39:58 -04:00
Ryan Poplin	943b1d34f8	intermediate commit to aid in debugging HC / exact model changes. HC integration tests will still fail	2012-03-18 15:50:27 -04:00
Ryan Poplin	c4f4d16490	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-03-18 14:27:42 -04:00
Eric Banks	9223e451a3	Merged bug fix from Stable into Unstable	2012-03-18 00:54:19 -04:00
Eric Banks	5c5d8e7cd3	Minor: cleaner way of turning off index-on-the-fly checking in case we want to turn it back on.	2012-03-18 00:53:29 -04:00
Eric Banks	344a938a70	When checking to make sure that we have cached enough data in the PL array, use the converted index value since that's what will be used as an index into the array.	2012-03-18 00:36:30 -04:00
Ryan Poplin	4f2f1cbca9	misc optimizations to the HMM code related to allocating and initializing the big state space arrays	2012-03-17 14:07:11 -04:00
Guillermo del Angel	a27a9ccba2	Merged bug fix from Stable into Unstable	2012-03-16 21:15:30 -04:00
Guillermo del Angel	a05a7f287d	TMP: disable checking of whether on the fly index is equal to index after run completed	2012-03-16 21:14:45 -04:00
Eric Banks	539d51f324	Resolving conflicts	2012-03-16 14:36:07 -04:00
Eric Banks	be9e48ba29	Merged bug fix from Stable into Unstable	2012-03-16 14:33:53 -04:00
Eric Banks	a7578e85e8	Rewriting a few of the indel integration tests for multi-allelics. The old tests were running b37 calls against a b36 reference, so the calls were all ref. The new tests are run against the pilot1 data and then those calls are fed back into the same bam to test genotype given alleles, with a sprinkling of bi- and tri-allelics.	2012-03-16 14:21:27 -04:00

9090 Commits (6df96644d94ccf270ba2dc7838abc61ba4498acf)