gatk-3.8

Commit Graph

Author	SHA1	Message	Date
rpoplin	3534f412c9	Better error message for the case of input variants found in ApplyRecalibration that were never seen during VariantRecalibrator. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5979 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-13 14:45:28 +00:00
rpoplin	6231bba288	Bug fix for mergeInfoWithMaxAC git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5978 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-12 20:10:16 +00:00
ebanks	1f4469976e	Made into UserException with better error message git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5977 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-12 03:38:52 +00:00
rpoplin	0d6ce91614	When running CombineVariants with -mergeInfoWithMaxAC the set field will be added appropriately git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5974 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-10 14:35:48 +00:00
delangel	f8ffda6835	a) Hidden, experimental argument to UnifiedGenotyper that makes code, when in GenotypeGivenAlleles mode, ignore SNP alleles mixed in with indels in complex records - theory is that SNP sites behave statistically differently when doing VQSR so those alleles/sites should be treated separately. b) Bug fix: multiallelic indel records were not being treated properly by VQSR because vc.isIndel() returns false with them. Correct general treatment for now is to do (vc.isIndel()||vc.isMixed()). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5973 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-09 19:19:23 +00:00
rpoplin	17e17d3c3c	Misc cleanup in VQSR. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5972 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-09 18:37:37 +00:00
depristo	ac3620839c	Very basic integration tests for ReducedReads, to allow safe optimization of the code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5970 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-09 17:06:32 +00:00
rpoplin	895e86c544	Annotations used to build the 1000G consensus callsets are now standard annotations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5969 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-09 17:03:39 +00:00
depristo	93d6e17762	Final, documented version of CalibrateGenotypeLikelihoods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5966 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 20:22:28 +00:00
depristo	44287ea8dc	ReducedBAM changes to downsample to a fixed coverage over the variable regions. Evaluation script now has filters and eval. commands. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5965 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 19:36:08 +00:00
kiran	49b021d435	Changed the definition of degeneracy (it's at the site level: degeneracy of a position in a codon, not degeneracy of the amino acid itself as I initially thought). Added the ability to supply an ancestral allele track (available in /humgen/gsa-hpprojects/GATK/data/Ancestor/). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5963 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 15:07:31 +00:00
depristo	a331e13721	Slightly more extensive test includes a 0/0 site to genotype git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5961 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 14:48:55 +00:00
depristo	0f43b10c39	Optimization in CombineVariants when merging into a sites_only VCF. VariantContextUtils now has a utility function that creates a sitesOnlyVariantContext from an input VC. Add complex merge test of SNPs and indels, with multiple alleles for an indel, from the new batch merge wiki at http://www.broadinstitute.org/gsa/wiki/index.php/Merging_batched_call_sets. Created a BatchMergeIntegrationTest that uses GGA with the complex merged input alleles to genotype SNPs and indels with multiple alleles simultaneously in NA12878. Looks great. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5959 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 14:31:46 +00:00
delangel	1d6486a28f	First part of fix for correctly processing mixed multi-allelic records: correctly compute start/stop of vc when there are no null alleles (i.e. record is not a simple indel). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5958 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 13:36:18 +00:00
delangel	d27800e07c	a) Forgot to commit this ages ago: uncomment code to ignore hard clipped bases when computing indel likelihoods. b) First part of fix for correctly processing mixed multi-allelic records: correctly compute start/stop of vc when there are no null alleles (i.e. record is not a simple indel). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5957 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 11:28:17 +00:00
hanna	ad97099df6	Getting rid of a few extra, very explicit qualifications so that the public/private bifurcation script doesn't have to discover them. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5956 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 03:08:47 +00:00
ebanks	bb6c0db783	We found the cause of the inconsistency. Woo hoo! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5955 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-07 15:13:58 +00:00
hanna	ca48ea78df	At Picard team's request, generate md5s for generated BAM files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5954 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-07 04:25:40 +00:00
depristo	311dfa0998	Now builds examples, as I expected. GATKPaperGenotyper lives again. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5953 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-07 00:13:44 +00:00
alecw	2901abf070	Switch from PriorityQueue to TreeSet for better and more consistent performance. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5952 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-06 20:41:30 +00:00
ebanks	2c57721ed2	Updated printouts to help with debugging. Issue does appear to be deterministic though. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5950 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-06 01:04:07 +00:00
ebanks	27dfb53d26	We really don't want to be advising the user to use an unsafe option - really, they should fix their busted bam file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5949 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-05 05:18:16 +00:00
delangel	7e49e1668f	Finished changing md5's due to recent change in definition of mixed and indel vc's. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5948 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-05 00:40:51 +00:00
delangel	d534241f35	Major revamp of annotations for indels: a) All rank sum tests now work for indels including multiallelic sites. For the latter cases, rank sum test is REF vs most common allele b) Redid computation of HaplotypeScore for indels. It's now trivially easy to do because we are already computing likelihoods of each read vs haplotypes in GL computation so we reuse that if available. For multiallelic case, we score against N haplotypes where N is total called alleles. Drawback is that all cases need information contained in likelihood table that stores likelihood for each pileup element, for each allele. If this table is not available we don't annotate, so we can only fully annotate indels right now when running UG but not when running VariantAnnotator alone. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5947 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-04 15:34:24 +00:00
delangel	1448a1f155	Change md5 because conversion of a tri-allelic dbsnp indel record is now legit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5946 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-04 11:24:16 +00:00
delangel	53667ce8fa	Disabled test that checks whether output is the same whether in Genotype Given Alleles mode or not - it won't be until extended events are finally removed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5945 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-04 00:52:54 +00:00
delangel	35df80de14	Updated md5 due to changes in QUAL field when in Genotype given alleles mode with indels (insertions). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5944 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 23:52:38 +00:00
ebanks	b93829e505	The underlying bam file for this test was busted for many reasons preventing Picard folks from making unrelated changes, so I needed to fix it. Updating md5s accordingly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5943 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 22:26:06 +00:00
delangel	a8faacda4e	Major change to UG engine to support: a) Genotype given alleles with indels b) Genotyping and computing likelihoods of multi-allelic sites. When GGA option is enabled, indels will be called on regular pileups, not on extended pileups (extended pileups will be removed shortly in a next iteration). As a result, likelihood computation is suboptimal since we can't see reads that start with an insertion right after a position, and hence quality of some insertions is removed and we could be missing a few marginal calls, but it makes everything else much simpler. For multiallelic sites, we currently can't call them in discovery mode but we can genotype them and compute/report full PL's on them (annotation support comes in next commit). There are several suboptimal approximations made in exact model to compute this. Ideally, joint likelihood Pr(Data | AC1=i, AC2=j, ...) should be computed but this is hard. Instead, marginal likelihoods are computed Pr(Data | ACi=k) for all i,k, and QUAL is based on highest likelihood allele. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5941 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 22:13:07 +00:00
depristo	cd293f145b	More stable reduced reads representation. Bug fixes throughout. No diffs by <1% of sites in an exome, and the majority of these differences are filtered out, or are obvious artifacts. UnitTests for BaseCounts. BaseCounts extended to handle indels, but not yet enabled in the consensus reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5939 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 20:11:31 +00:00
ebanks	80cbc1924b	Oops, just realized that I forgot to comment my commit from yesterday so it was clear what was happening git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5938 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 18:06:41 +00:00
fromer	e4eb8087bc	A VariantContext can now be isSymbolic. More importantly, multi-allelic variants are now properly handled in determining their type [using isMixed only if any of the biallelic variant types differ between the alt alleles]. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5937 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 18:02:47 +00:00
depristo	b4c479bcb0	Support for reducedReads in the pileup and UG. Totally experimental -- the code interface could change, and so could the implementation. Only works for SNPs now. Pileup has contracts as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5936 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 16:39:01 +00:00
delangel	2df12472c2	One more step in commit to support multi-allelic indel genotyping and processing: utility class that supports multi-allelic genotype likelihoods git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5935 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 16:08:29 +00:00
ebanks	420d8feff6	No one should be calling the createHeader method(s) directly, but instead should be going through the full readHeader method (because it first sets the version); therefore I made them package protected and merged them. Updated the various unit tests that were using createHeader and were dangerously assuming that the header version was defaulting to 4.0 (which it no longer does). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5934 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 02:17:37 +00:00
depristo	86df10ec09	UnitTests for ConsensusSpan infrastructure git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5929 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 22:44:52 +00:00
fromer	74298f6858	Basic walker to calculate statistics of CNV genotyping copy counts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5927 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 21:46:35 +00:00
depristo	ad9dca9137	Package updated. Copyrights added git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5926 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 21:29:27 +00:00
depristo	3d628f06f0	moved to playground git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5925 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 21:25:26 +00:00
depristo	429833c05a	Intermediate commit (DVCS, where are you?) of a fully operational ReducedRead walker. Now results in minor differences in the raw calls (filtering is a different matter) in an exome but 20x less disk space than the full exome data. Changes to the UG necessary to process reduced reads are not yet committed, as they are being tested. This code is being moved to playground now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5924 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 21:13:31 +00:00
ebanks	dd6d61c031	Adding integration test to cover the case of a read that only covers an insertion (i.e. no M in the CIGAR string). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5923 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 21:02:47 +00:00
ebanks	d0ca6f8a9c	Patch for case that a read spans only an insertion (i.e. no Ms in the CIGAR string): the end position should not be less than the start position (which is how Picard defines it) but instead should be equal to it. This is just a patch; we'll get a proper solution in at some point. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5922 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 20:40:56 +00:00
ebanks	3302a733ef	Fixed docs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5920 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 16:02:14 +00:00
chartl	84c2c5d7e6	Stop running away from my commits, test modules. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5919 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 13:05:53 +00:00
chartl	092952db44	After verifying that the changes to these tests were all in the RankSum annotations, I'm committing fixes to the test md5s. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5918 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 13:01:18 +00:00
ebanks	c7fe062cb7	Refactored the VCF codec classes to minimize code duplication (which happened during the VCF3/4 split). Now, both codecs extend the AbstractVCFCodec class and all shared functionality exists there. Only methods that differ between the various codecs (e.g. because FILTER strings are encoded differently) are defined in the actual codecs. While I was in there, I put in checks for invalid empty inputs in the ID, FILTER, and INFO fields. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5917 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-01 19:40:47 +00:00
ebanks	81d9808eea	Next version of test output for non-determinism git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5916 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-01 19:36:56 +00:00
chartl	511cd48d7a	There is an edge case (|Set1| = 5, |Set2| = 4) where the exact p-value exceeds the range of the normal distribution we want to invert. For the edge cases, this happens exactly at the mean, and so this can be safely replaced with a z value of 0.0 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5915 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-01 17:30:09 +00:00
carneiro	dcd13060e1	created wiki page for Print Reads and changed help to match wiki. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5914 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-01 16:26:32 +00:00
droazen	8f6af299d8	Remove what is hopefully the last of the evil core -> playground dependencies. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5913 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-01 16:22:35 +00:00
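Commit d0ca6f8a9c describes a patch for reads whose CIGAR string has no reference-consuming operator, for example a read that spans only an insertion. The Java sketch below illustrates that arithmetic under stated assumptions. It is not the GATK or Picard implementation, and the class and method names are hypothetical. It assumes the end is computed as start + (reference length) - 1, the convention the commit attributes to Picard, and it counts M, D, N, = and X as reference-consuming, following the SAM specification. It also does not validate malformed CIGAR strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not GATK or Picard code): 1-based, inclusive alignment
 * end of a read computed from its start position and CIGAR string.
 */
final class AlignmentEndSketch {
    // One CIGAR element: a length followed by an operator (operators from the SAM spec).
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Reference bases consumed by the CIGAR: only M, D, N, = and X count. */
    static int referenceLength(String cigar) {
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        int length = 0;
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if ("MDN=X".indexOf(op) >= 0) {
                length += Integer.parseInt(m.group(1));
            }
        }
        return length;
    }

    /** Assumed Picard-style end: start - 1 when nothing consumes the reference. */
    static int picardStyleEnd(int start, String cigar) {
        return start + referenceLength(cigar) - 1;
    }

    /** Patched end as described in the commit: never before the start, so it equals the start here. */
    static int patchedEnd(int start, String cigar) {
        return Math.max(start, picardStyleEnd(start, cigar));
    }

    public static void main(String[] args) {
        System.out.println(picardStyleEnd(100, "10M"));     // 109
        System.out.println(picardStyleEnd(100, "5S10I5S")); // 99: the end precedes the start
        System.out.println(patchedEnd(100, "5S10I5S"));     // 100
    }
}
```

Clamping with `Math.max` mirrors the stop-gap behaviour the commit describes (end equal to start); the commit itself notes that a proper solution was still to come.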


4704 Commits (165befd38ae9ddd24fa306e2d2cfe245ffa758be)
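Commits e4eb8087bc and f8ffda6835 describe two rules: a multi-allelic record is MIXED only when the biallelic types of its ALT alleles differ, and, as an interim measure, a record is handled like an indel when it is either an indel or mixed (`vc.isIndel()||vc.isMixed()`). The following is a minimal Java sketch of those two rules, not the GATK VariantContext API. The `Type` enum, the method names, and the length-based classification of each REF/ALT pair are assumptions; real code would also need to handle allele padding and symbolic alleles.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not the GATK VariantContext API) of the typing rules
 * described in commits e4eb8087bc and f8ffda6835.
 */
final class VariantTypeSketch {
    enum Type { NO_VARIATION, SNP, MNP, INDEL, MIXED }

    /** Assumed length-based type of one REF/ALT pair; ignores padding and symbolic alleles. */
    static Type biallelicType(String ref, String alt) {
        if (ref.length() != alt.length()) {
            return Type.INDEL;
        }
        return ref.length() == 1 ? Type.SNP : Type.MNP;
    }

    /** Record type: the shared biallelic type, or MIXED only if the ALT alleles disagree. */
    static Type determineType(String ref, List<String> alts) {
        Set<Type> seen = EnumSet.noneOf(Type.class);
        for (String alt : alts) {
            seen.add(biallelicType(ref, alt));
        }
        if (seen.isEmpty()) {
            return Type.NO_VARIATION;
        }
        return seen.size() == 1 ? seen.iterator().next() : Type.MIXED;
    }

    /** Interim rule from the log: treat a record like an indel if it is INDEL or MIXED. */
    static boolean isIndelLike(Type type) {
        return type == Type.INDEL || type == Type.MIXED;
    }

    public static void main(String[] args) {
        System.out.println(determineType("A", Arrays.asList("AT", "ATT")));            // INDEL
        System.out.println(determineType("A", Arrays.asList("C", "AT")));              // MIXED
        System.out.println(isIndelLike(determineType("A", Arrays.asList("C", "AT")))); // true
    }
}
```

Under these assumptions, a multi-allelic record whose ALT alleles are all indels stays INDEL, while a record combining a SNP allele and an indel allele becomes MIXED and is still picked up by the indel-like check.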