gatk-3.8

Commit Graph

Author	SHA1	Message	Date
kshakir	28b897d5de	Fixed O(N^2) operation when scattering interval files. Cleaned up intervals contig count function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5768 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 03:32:35 +00:00
carneiro	3882d1b9c0	fixing the build \o/ git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5767 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 00:57:49 +00:00
kshakir	8ad547e6c2	Fixed another interval bug where dividing up N intervals into N parts wasn't working. Minor updates to the FCPTest to match the changes due to using the old indel caller. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5766 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 20:49:35 +00:00
hanna	5c6965575e	Some refactoring that Mauricio and I worked through together. Changed filters to extend from org.broadinstitute.sting.gatk.filters.ReadFilter rather than directly from net.sf.picard.filter.SamRecordFilter, which allows us to add an initialize(GATKEngine) method so that filters can do any initialization they'd like based on CL arguments, SAM headers, etc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5760 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 19:29:08 +00:00
carneiro	b66c6dced1	- No longer prints out non confident calls (they were leading to tables that don't add up and confusing some Pacbio folk). - Added sensitivity and Specificity to the report. - With the changes in genotype likelihoods, the indel analysis only happens if the BAM file also has an extended event. Not great, but at least it's not broken. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5759 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 19:26:55 +00:00
carneiro	7ed8b4ddb0	Making sure CalculateLikelihoodsAndGenotypes returns an empty variant context when 'EMIT_ALL_SITES' and 'GENOTYPE_GIVEN_ALLELES' are being used, now for indels too! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5756 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 18:04:56 +00:00
rpoplin	6c7a0adc76	Updating VariantGaussianMixtureModelUnitTest to use truth sensitivity cutting git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5750 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 13:56:01 +00:00
delangel	a19389528d	Bring back from the dead the old likelihoods model for indels, which has worse performance but is about 4x faster. Enabled with argument -GSA_PRODUCTION_ONLY in UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5748 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 22:38:33 +00:00
carneiro	e5cc0f4eec	Added 'specificity' to variant eval's Validation Report evaluator. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5742 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 20:48:30 +00:00
rpoplin	b88dec387c	clean up from VQSR movement git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5741 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 20:35:30 +00:00
rpoplin	23cd3a7a5d	Moving VQSR v2 to core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5740 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 20:20:06 +00:00
rpoplin	44a717f63a	Good bye VQSR v1. This commit will break the build. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5739 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 20:09:52 +00:00
hanna	2dacf1b2b2	Better header support when running R's read.table(...,header=T). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5738 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 19:56:20 +00:00
hanna	ad8c786b2d	Now more easily R-parseable. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5737 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 19:30:50 +00:00
rpoplin	5bade81c6d	Adding tranche plot generation back to VQSR git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5736 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 19:26:26 +00:00
rpoplin	e73720c2db	Updating VQSLOD annotation description git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5735 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 19:01:08 +00:00
rpoplin	11052918d9	Better exception text for common error in VQSR. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5734 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 18:37:25 +00:00
rpoplin	4bbce42861	Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 18:12:47 +00:00
rpoplin	6323fb8673	misc cleanup in VQSR git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5732 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 18:00:22 +00:00
hanna	f3bd11a02e	Dress up some formatting issues. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5731 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 17:35:18 +00:00
hanna	9c809ed68e	A walker to analyze the memory consumption of reference, reads, and RODs at each base both in bytes and as a percentage of the used heap size. May be a bit buggy at this point; there are a lot of metrics around the Java heap and I'm not completely sure that the metrics I'm outputting are exactly the ones that I'm looking for. Also fixed a documentation bug in my Sizeof class. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5730 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 17:08:15 +00:00
ebanks	d4cbd8691c	Make the default that we only output SNPs (so that when I make another release we don't get flooded with questions about why the UG is all of a sudden so slow) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5729 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 16:38:55 +00:00
rpoplin	70f8ab6f89	Adding AF bin stratification for VariantEval. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5728 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 15:22:50 +00:00
hanna	870e65a685	Fixing a build failure because I want to be completely sure that the code I checked in immediately following the build breaking code passes integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5727 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 02:09:53 +00:00
hanna	411980a50a	Performance enhancements in GATKBAMIndex. Not sure these will assist in a normal use case, but they cut startup times and memory allocation noise in the profiler, making my profiling time more productive. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5726 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 20:48:16 +00:00
delangel	422d4ceeea	removed useless file - no need for tableRecalibration, right now everything is done in PairHMMIndelErrorModel git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5725 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 20:35:44 +00:00
delangel	2a80ffa2ee	Totally experimental, barely useable not to be used yet implementation of an "Indel Quality Recalibrator" Idea is that any indel that's not in input dbsnp is treated as an artifact, and then a csv is built with # of indels and # of observations as a function of each input covariate (initially, only cycle, read group and homopolymer run are useful). Then, when computing likelihoods of indels based on input haplotypes we compute gap penalties based on value of covariates at read. Feature is disabled by default with hidden arguments. TBD if usefulness of feature is worth the extra time and pain. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5724 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 20:31:43 +00:00
rpoplin	3224bbe750	New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass of having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use apache math commons instead of approximations from wikipedia. New parameters available for the priors based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 19:14:42 +00:00
ebanks	fcf8cff64a	We didn't actually support all of these extensions. Updated to be accurate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5722 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 19:03:46 +00:00
carneiro	34092fd32f	minor update... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5716 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 21:29:01 +00:00
carneiro	36ac8beee1	Making the GATK unpredictably random... through an option! set -ndrs if you want the GATK to be really random (non-deterministic). Engine option, available to every walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5715 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 19:29:08 +00:00
carneiro	f97e7d2fb4	Walker that calculates the percentage of bases that are covered to at least 20x. Very useful! In oneoffs until someone else thinks it's as useful as I think it is ;) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5714 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 19:19:39 +00:00
ebanks	deed7c47a1	Continuing the epic fail, some of our existing integration tests were wrong because of the lazy loading failure. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5712 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 17:54:41 +00:00
ebanks	ab9ffb1a74	Epic failure on the lazy loading of genotypes: if the input VCF had its samples unsorted and we used a walker that didn't require genotypes, then we would sort the samples but not load genotypes (and therefore the genotypes wouldn't match the samples anymore). Added simple integration test to cover this case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5711 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 16:03:45 +00:00
hanna	96571b55be	Disable caching of ReadShards by the GenomeLocProcessingTracker (at least temporarily). Unfortunately this does not completely fix the IndelRealigner exception that Ryan is seeing, but it helps things quite a bit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5710 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 13:59:34 +00:00
carneiro	a5b96e0e04	I have to remember that this is Java, not C. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5709 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-28 17:40:14 +00:00
rpoplin	b7334dcc1e	Rank sum test annotations are the Z-scores from the test instead of the p-value. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5707 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-28 14:35:00 +00:00
ebanks	45081c32d7	continuing from last night, the integration tests weren't covering the right behavior either git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5706 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-28 13:30:57 +00:00
ebanks	f34e6d5b8c	Somewhere along the way someone broke this tool and failed to update the documentation to boot. Fixing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5705 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-28 03:16:20 +00:00
ebanks	ae8f3f2cde	Check for bad reference bases before creating simple/'empty' VCs. Updated the code in the indel GL model to be consistent and to use the existing utility in the Allele class. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5704 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 23:55:20 +00:00
depristo	6cce3e00f3	A test walker that does consensus compression of deep read data sets. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5702 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 22:00:48 +00:00
rpoplin	3907377f37	When genotyping given alleles, for multiallelic sites we go back to the reads and use the alternate base with the highest sum of quality scores instead of taking the first alternate allele from the vcf file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5701 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 21:31:09 +00:00
droazen	6e9e766a71	The tighter interval validation wasn't interacting well with unmapped intervals -- altered the validation methods to not throw an error for unmapped intervals. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5700 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 20:56:46 +00:00
hanna	6d5e45b5c6	Revbump Picard dependencies at Tim/Kathleen's request. Exclude anonymous classes from PluginManager. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5699 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 20:38:05 +00:00
droazen	d650efd40a	Fix for bug GSA-449: Intervals that are not in GATK format are not validated to the same standard as GATK format intervals. Full validation against contig bounds is now performed for all intervals, regardless of their source. Also fixed a few tests for validation exclusions that were backwards. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5698 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 18:12:10 +00:00
kshakir	df35a143b2	Removed -debug/--debug_mode. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5697 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 10:56:39 +00:00
hanna	27495a0c64	Killed quiet mode. Should probably kill debugMode as well, but Queue's using it. Will check with Khalid tomorrow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5695 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 04:17:36 +00:00
hanna	f3dacd3c40	Use ByteBuffer.allocateDirect() instead of ByteBuffer.allocate(). ByteBuffer.allocateDirect() behaves like Java NIO MappedByteBuffers in that it consumes address space, which counts against our virtual memory allocation; but cannot be destroyed or otherwise freed. This was definitely contributing to the LSF failures that I was seeing, but I'm not yet convinced that it's the sole source of these virtual memory 'leaks'. More tomorrow as the results of my whole exome tests start to roll in. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5693 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 02:01:11 +00:00
chartl	7afeb1ab17	Removing broken imports (boo) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5692 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-26 18:55:25 +00:00
rpoplin	379f837e82	RankSum z-scores are looking quite good, so RIP Wilcoxon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5691 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-26 18:34:39 +00:00

4562 Commits (28b897d5de5860867eb7f3a3d42648575403cd5f)