gatk-3.8

Author	SHA1	Message	Date
chartl	8b2d387643	Added in an eval module that calculates the dispersion histograms between eval and comp (e.g. M_{i,j} = # of times eval observed to have AC i, comp AC j -- for af it's i/100 vs j/100 ) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4507 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 19:07:43 +00:00
ebanks	f78ff08e2b	This is less correct than my previous change but it's what UGv1 does and now is not the right time to start mucking with things. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4506 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 18:56:45 +00:00
ebanks	471c18054f	Fix for SB calculation: the best overall AF might not have any mass when just looking at reads from a single strand. We need to compute the best AF for each stratification. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4505 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 17:51:18 +00:00
asivache	42c3d74432	bug fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4503 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 16:27:40 +00:00
chartl	c9d473edee	More changes to Variant Eval and Genotype Concordance (passes all integration tests): 1: -sample can now include a file, which will be parsed for sample-name entries 2: If you request a sample to run analysis on, but it is not present in any of your RODs, VEW will exception out 3: Change added to parse Integer, String, and List<Integer> type Allele Count annotations (error otherwise) 4 [slightly problematic]: The count objects now maintain row-keys in order, as the keys were taking an inordinate amount of time in onTraversalDone (multiple calls to getRowKeys(), so many multiple sorts of the same underlying unsorted object, very bad) There is a legacy comparison object which is unused which I will strip out soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4502 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 12:40:36 +00:00
ebanks	954dd84f51	Adding an integration test (against hg18 this time) that requires on-the-fly sorting in order to work properly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4500 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 07:45:21 +00:00
ebanks	9f54170dff	Hooking up the liftover tool to the new on-the-fly sorting VCF writer so that records can now get emitted in order. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4499 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 07:27:01 +00:00
ebanks	d41c252b13	Looking over the calling results with Ryan, it's clear that while the grid search optimization (ignoring samples that are clearly ref) can work for assigning genotypes, it cannot be used for calculating P(AF>0). There's too much area under the likelihood curve that gets lost and the QUALs are negatively affected. However, testing showed that this only slightly affects runtime (~15 minutes per 1Mbase for the 1kg allpops). The optimization does remain for genotyping. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4498 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 19:06:32 +00:00
ebanks	2606e67cf1	Reverting Matt's change from yesterday which I accidentally blew away when trying to cope with the stupid svn update issues we've been plagued with recently. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4495 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 14:40:42 +00:00
ebanks	cfb33d8e12	Filtering optimizations are now live for UGv2. Instead of re-computing filtered bases at every locus, they are computed just once per read and stored in the read itself. Eyeballing the results on the ~600 sample set from 1kg, we cut out ~40% of the runtime! QUALs are now sometimes different from UGv1 because I noticed a bug in v1 where samples with spanning deletions only were assigned ref calls instead of no-calls which ever so slightly affects the QUAL. Not a big deal though. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4494 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 05:04:28 +00:00
chartl	4ac636e288	Minor change: when tabulating concordance by AC, ignore sites with multiple segregating alleles in the population, at least for now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4493 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 01:35:33 +00:00
chartl	7c9ef59d65	This is simultaneously a minor and major change to VariantEval, so take heed: The core walker has been modified so that when variant contexts (eval and comp) are subset to command-line-specified sample(s), the chromosome count annotations (AC/AN/AF) are altered to reflect the AC/AN/AF of only those samples involved in the comparison. No more getting AC500 when you're comparing a 10-sample overlap. Interestingly enough, this didn't break any integration tests. GenotypeConcordance now has two additional tables: Allele Count Statistics, and Allele Count Summary Statistics. These work exactly identically to the Sample Statistics and Sample Summary Statistics tables, except that the partition being used is no longer the sample, but instead the allele count of the variant sites. These tables stratify by both eval and comp ACs, e.g. evalAC0 evalAC1 evalAC2 compAC0 compAC1 compAC2 Differences with previous integration tests were verified to only be in the Allele Count tables (by grepping them out of the diff); a new test has been added for the simple case of an AC=1 site in the eval becoming an AC=2 site in the comp. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4491 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 22:26:15 +00:00
hanna	83b8676b69	Hack to fix mysterious disappearing read attributes. Ultimately caused by the fact that the GATKSAMRecord, by design, needs to both inherit from SAMRecord and wrap a 'member' SAMRecord, and method calls that aren't implemented as explicit passthroughs can compromise the content of the SAMRecord in subtle ways. Will be automatically fixed when Picard moves to a lightweight SAMRecord interface rather than the current heavyweight implementation. But in the short-term, there's no obvious fix. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4489 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 19:06:54 +00:00
depristo	da29fcdb68	No longer writes the index to disk twice. Bug fixes for closing VCFWriters throughout the codebase git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4488 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 14:26:06 +00:00
aaron	28a1020c89	comment out debugging line that was clogging the performance test output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4487 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 03:26:55 +00:00
aaron	272ac2ae4a	more fixes for tests broken by indexing-on-the-fly; I think this should do it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4486 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-13 01:54:32 +00:00
hanna	ed39af53cd	Fix for exception when trying to load reference segment for a read that aligns to 0 bases. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4485 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 23:50:51 +00:00
ebanks	fe9f128631	Better fix for earlier bug. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4484 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 19:21:33 +00:00
aaron	ff0df1a2da	A fix for an integration test that was broken by on-the-fly indexing. Also, better reporting of Tribble exceptions in GATK integration tests. Trying to get the tests back up and running... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4483 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 18:39:56 +00:00
ebanks	69652e08c6	Bug fix for reads that completely fall within an insertion: the I cigar string element was 1 base too long. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4482 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 14:46:21 +00:00
kiran	f348ca2976	Now processes VCF files with repeated loci without crashing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4481 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-12 04:36:07 +00:00
ebanks	fd8351cd49	Get rid of useless test/'optimization' that was carried over from UGv1. New code is (minimally) faster with same results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4478 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-11 04:04:07 +00:00
ebanks	f28523e7de	Implemented SB for UGv2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4477 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-11 03:56:01 +00:00
hanna	7008a469dc	Update MalformedReadFilter to pass reads that have cigar strings like 40S36I that have 0 aligned bases in the genome. We'll have to fix walkers as faults appear. Also added JIRA GSA-406: finer-grained control of MalformedReadFilter: want to exception out by default in these cases but pass them with a warning with a corresponding -U flag. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4476 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-11 03:01:04 +00:00
ebanks	530875817f	Experimental code for better filtering of bases in sam records. Not hooked up yet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4475 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-11 02:19:51 +00:00
ebanks	a0de269c4b	Better message git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4474 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-10 20:11:51 +00:00
rpoplin	0a4cf02a52	Fix for index out of bounds exception in VR. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4473 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-10 17:35:15 +00:00
depristo	116309b3c3	More test cases for UG integration test. We currently fail doing multi-threaded gzip output, FYI git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4472 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 20:22:12 +00:00
depristo	38a67fed63	High performance version of standard vcf writer. New general static Tribble class for common constants, including general .idx constant and functions to get standard index name for a given file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4471 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 19:53:21 +00:00
fromer	bdd3a9752e	Changed min MQ and BQ to 20 (for phasing) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4469 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 19:27:45 +00:00
asivache	05500d1a8d	An iterator wrapper/adapter: takes GenomeLoc iterators 1 and 2 and traverses intersections of intervals from 1 with intervals from 2. Both 1 and 2 must be SORTED and NON_OVERLAPPING, but this iterator does NOT perform any checks, so if these conditions are not met, the behavior is unspecified git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4468 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 16:34:00 +00:00
asivache	253d528e49	not ready for commit yet git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4467 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 15:30:55 +00:00
asivache	4f2f33b42a	fix method invocation to conform to new API; this version of the code will compile but new functionality is still not fully in git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4466 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 15:30:26 +00:00
asivache	cece19d4d2	not ready for commit yet git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4465 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 15:14:54 +00:00
asivache	39e373af6e	deleting accidentally committed junk git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4464 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 15:13:01 +00:00
asivache	b3d81984aa	renaming MergingIterator to RODMergingIterator as it is more appropriate for this specialized implementation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4462 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 14:10:11 +00:00
asivache	77dddd0afa	renaming MergingIterator to RODMergingIterator as it is more appropriate for this specialized implementation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4461 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 14:08:28 +00:00
chartl	21ec44339d	Somewhat major update. Changes: - ProduceBeagleInputWalker + Now takes a validation ROD and a prior to give it, will use those genotypes in place of the variant genotypes if both are present + Takes a bootstrap argument -- can use some given %age of the validation sites + Optionally takes a bootstrap output argument -- re-prints the validation VCF, filtering those sites used as part of the bootstrap -BeagleOutputToVCFWalker + Now filters sites where the genotypes have been reverted to hom ref + Now calls in to the new VCUtils to calculate AC/AN -Queue + New pipeline libraries for easy qscript creation, still a work in progress, but this is a considerable prototype + full calling pipeline v2 uses the above libraries + minor changes to some of my own scripts + no more need for contig interval lists, these will be parsed out of your normal interval list when it is provided git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4459 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 13:30:28 +00:00
ebanks	97b153f2fa	Quick fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4457 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 06:10:52 +00:00
ebanks	acd238f3f2	For Chris: pull out the chromosome counting code into VCUtils so that other tools can make use of it. Transitioned SelectVariants over to use it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4456 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 04:37:54 +00:00
delangel	3838823262	Two ugly hopefully temporary fixes for new genotyping model: a) In Indel genotyper: we can't deal yet with extended events correctly and we are still triggering at each extended event which results in repeated records on a vcf. So, to avoid this, keep track of start position of candidate variants we've visited and if we've visited a variant before we don't do it again. b) Avoid infinite terms in QUAL and in genotype likelihoods which can happen if posterior AF happens to be exactly zero. For now, hard-code a minimum value of each term of the posterior AF likelihood to be -300 (ie 1e-300 in lin space). This can be solved with better and smarter log-to-lin conversions and some precision fixes in AF calculation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4455 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-08 00:53:16 +00:00
depristo	0a2e76e9dc	2nd step towards on the fly indexing. Also fixed parsing bug for headers with < symbols git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4454 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 21:38:46 +00:00
rpoplin	7bb9704592	Update the BeagleOutputToVCF integration test because of removing the source header line. Source headers are provided by the engine for all VCF files now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4453 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 19:55:57 +00:00
rpoplin	0de658534d	Removed the qScale arguments in VariantRecalibrator. It is smarter about how it tries to find a cut so the arbitrary scale factor hopefully is no longer necessary. Now the recalibrated variant quality score more accurately reflects our believed lod of the call. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4451 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 18:04:57 +00:00
fromer	ee00dcb79d	1. Phasing now ignores bases without minimum base quality (BQ) and minimum mapping quality (MQ); 2. The probability of a non-called base is now divided by 3, to evenly split up the error probability over the non-called bases git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4450 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 17:40:59 +00:00
ebanks	6205910f9f	updating integration test for Sarah Calvo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4449 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-07 04:03:37 +00:00
fromer	652a3e8de5	Added integration tests for ReadBackedPhasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4446 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 20:50:32 +00:00
fromer	f8f1cc45a3	Now ReadBackedPhasing caps Base Quality by Mapping Quality git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4445 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 20:48:57 +00:00
scalvo	bda427f078	Change specification of AnnotationInputTable, and fix 2 bugs. Previous output spec contained 3 columns: haplotypeReference,haplotypeAlternate,haplotypeStrand where haplotypeReference was always on the + strand, and haplotypeAlternate was on the strand specified by haplotypeStrand. The new specification contains 3 columns: haplotypeReference,haplotypeAlternate,transcriptStrand where haplotypeRef and haplotypeAlt are required to be on the + strand. transcriptStrand now specifies the strand of the transcript, which is needed for interpreting the haplotypes. Bugfix #1: fix incorrect assignment of variantCodon and variantAA (Previously variantCodon was incorrectly set to referenceCodon) Bugfix #2: fix incorrect codingCoordStr values for - strands (bug reported by Giulio Genovese), and incorrect usage of "m." for mitochondrial transcripts (bug reported by Steve Hershman) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4444 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 20:46:09 +00:00
scalvo	b5c127e643	Removed HAPLOTYPE_STRAND_COLUMN; Previously, GenomicAnnotation allowed a user to specify the strand of the haplotypeAlternate, and would reverseComplement the haplotypeAlternate if HAPLOTYPE_STRAND_COLUMN was "-". The new specification does not allow this functionality, and instead requires both the reference and the alternate haplotypes to be on the + strand (as in VCF format). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4443 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-06 20:37:41 +00:00

3697 Commits (9dc2e931b641e5fc34996ec31001e2196cdecb8b)