gatk-3.8

Commit Graph

Author	SHA1	Message	Date
chartl	b3872386c9	Test to ensure that ConcordanceTruthTable and those walkers which rely on it for tabulating pooled truth information from truth information of the individuals within the pool is doing that calculation correctly. Tests single het, single hom (with/without reference), together, together without reference, and a mix of everything. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2082 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-19 15:26:32 +00:00
depristo	eeb3a3fffb	comments for Aaron git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2081 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-19 12:56:04 +00:00
aaron	7997455f38	first go of the genotyper for the GATK paper. More testing and review tomorrow to call it done. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2080 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-19 07:55:24 +00:00
ebanks	7b957d3e2e	Make the whining from Khalid's office stop already git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2079 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-19 03:04:48 +00:00
hanna	85bc9d3e91	(Hopefully) temporary hack: load contig information by contig name rather than contig id to avoid off-by-one errors. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2078 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 23:33:27 +00:00
rpoplin	0fbd81766b	CountCovariates now uses any rod of type VariationRod with the name dbsnp as the source of known variant sites to skip over. It also grabs the platform string out of the read group when deciding which algorithm to use to calculate machine cycle. In this way it can now handle multi-platform bams. I added a new covariate: PositionCovariate. This is simply the offset regardless of which platform the read came from. This will be useful for comparing between the two covariates. Finally, this message serves as a warning that I will be killing the old recalibrator tomorrow after I've updated and verified new integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2077 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 23:03:47 +00:00
ebanks	f667bed7fc	-Don't annotate allele balance or on-off genotype if there's no genotype data -If qscore is infinity (because of precision) make a best guess instead git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2076 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 22:01:32 +00:00
chartl	90212c643b	more effective & efficient test for SecondBaseSkew git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2075 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 20:53:32 +00:00
ebanks	087e01a439	minor changes for --noSLOD git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2074 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 18:48:01 +00:00
ebanks	a70cf2b763	A bunch of changes needed to make outputting pooled calls possible git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2073 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 18:42:57 +00:00
ebanks	0a35c8e0ba	1. The joint estimation model now constrains genotypes to be AA, AB, or BB only (i.e. to use a single alternate allele). Note that this doesn't work for the old models (point estimate or SSG) because calculations aren't divided by alternate allele. 2. Allele frequency spectrum is not emitted for single samples (since it doesn't make sense). 3. If in pooled mode, throw an exception if pool size isn't set appropriately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2072 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 17:43:15 +00:00
chartl	405c6bf2c1	VariantEval genotype concordance for pools! Integration test coming soon git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2071 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 17:24:54 +00:00
depristo	6fe1c337ff	Pileup cleanup; pooled caller v1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2070 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 17:03:48 +00:00
rpoplin	f0a234ab29	TableRecalibration is now much smarter about hashing calculations, taking advantage of the sequential recalibration formulation. Instead of hashing RecalDatums it hashes the empirical quality score itself. This cuts the runtime by 20 percent. TableRecalibration also now skips over reads with zero mapping quality (outputs them to the new bam but doesn't touch their base quality scores). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2069 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 16:47:44 +00:00
chartl	be31d7f4cc	Added - a walker that outputs relevant information about false negatives given a bunch of hapmap individuals and corresponding integration tests for it. This will output for hapmap variant sites: chromosome position ref allele variant allele number of variant alleles of the individuals depth of coverage power to detect singletons at lod 3 number of variant bases seen whether or not variant was called git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2068 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 15:47:52 +00:00
chartl	b68d6e06b7	Rollback of the previous "fix" and implementation of the real fix. We totally do want to annotate the call if called by another walker. Totally boneheaded misinterpretation of what the code was doing -- Eric, please forgive me for being an idiot. Instead, change the StingException to what it really should be -- an IllegalStateException, which is not coincidentally already handled by the calling function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2067 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 06:09:24 +00:00
chartl	95f1be94c0	Fix for the broken build: do not attempt to annotate if UnifiedGenotyper is called from another walker! Why this didn't break the build earlier I have no idea. Ultimately, there should be a better way of interfacing UG with another walker -- what if some other walker wants the annotations from UG? But since we're calling map directly -- and the annotations don't get returned directly from map -- this needs to be handled differently, while the map function should ultimately return the LOD score or quality under the GCM alone. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2066 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 05:56:31 +00:00
ebanks	9fb50e9bd9	Further refactoring so that pooled calling will work. Okay, Mark, you should be all set. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2065 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 00:18:13 +00:00
chartl	539f6f15e5	Added -- Second base skew annotations and integration tests. Nothing need be given except -A SecondBaseSkew; the statistic it annotates calls with is a chi-square statistic given by the deviation of the observed proportion of reference second-best-bases from the expected 1/3. Future additions may be to ask that the deviation be instead from a given transition table. A big note for all users: All IllegalStateExceptions from the variation ROD (e.g. the RodGeliText) are dealt with SILENTLY. I understand this isn't optimal, but I'd rather simply not annotate a non-bi-allelic site than fail completely (there are quite a few such sites even on the regions over which the integration test has been written). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2064 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 00:11:13 +00:00
depristo	42a0bbaf46	Minor reformatting for pooled calling git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2063 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 22:06:11 +00:00
rpoplin	ec1a870905	Working with byte arrays is faster than working with Strings so the Covariates now take in byte arrays. None of the Covariates themselves used the reference base so I removed it. DinucCovariate now returns a Dinuc object which implements Comparable instead of returning a String because it was too slow. CountCovariates now uses a read filter to filter out unmapped reads and allows the user to specify -cov all which will use all of the available covariates, of which there are 7 now. If no covariates are specified it defaults to ReadGroup and QualityScore, the two required covariates. Initial code in place to leave SOLID bases alone if they have bad color space quality. TableRecalibration uses @Requires to tell the GATK to not give the reference bases since they weren't being used for anything. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2062 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 21:50:52 +00:00
ebanks	4d9c826766	Integration tests actually run on real data now. <tries to hide sheepish grin> git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2061 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 21:04:14 +00:00
ebanks	5e126875ea	temporarily disable (tests are broken) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2060 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 20:45:52 +00:00
ebanks	a048f5cdf1	-Refactored JointEstimation code so that pooled calling will work -Use phred-scale for fisher strand test -Use only 2N allele frequency estimation points git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2059 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 20:21:15 +00:00
chartl	43bd4c8e8f	Ignoring deletions in the primary pileup by default was causing the primary pileup to become shorter than the secondary pileup when building up the secondary base pileup string. This fix makes sure to include the primary Ds within the pileup so that not only are the pileups guaranteed to be the same size, the same offsets will truly correspond with the same read. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2058 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 17:20:13 +00:00
aaron	aece7fa4c7	a convenience method to join a map into a single string, which I need for some VCF work. Added some documentation to the join method as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2057 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 16:50:01 +00:00
asivache	21729d9311	Do not print debug message when debug mode is not requested!! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2056 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-16 20:28:41 +00:00
rpoplin	967215066d	The old CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2055 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-16 19:16:46 +00:00
rpoplin	eb07c7f7f8	CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2054 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-16 18:44:54 +00:00
ebanks	4558375575	Stage 1 of the VariantFiltration refactoring is now complete. There now exists a parallel tool called VariantAnnotator which simply takes variant calls and annotates them with the same type of data that we used to use for filtering (e.g. DoC, allele balance). The output is a VCF with the INFO field appropriately annotated. VariantAnnotator can be called as a standalone walker or by another walker, as it is by the UnifiedGenotyper. UG now no longer computes any of this meta data - it relegates the task completely to the annotator (assuming the output format accepts it). This is a fairly all-encompassing check in. It involves changes to all of the UG code, bug fixes to much of the VCF code as things popped up, and other changes throughout. All integration tests pass and I've tediously confirmed that the annotation values are correct, but this framework could use some more rigorous testing. Stage 2 of the process will happen later this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2053 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-16 02:41:20 +00:00
hanna	ce5034dc5d	Finally reinstate the iterator-style interface. Get rid of some scaffolding code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2052 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-16 02:34:19 +00:00
kiran	103763fc84	An accessor for the VCF header git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2051 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-15 09:28:25 +00:00
kiran	97ed945797	Example code for a bug in the VCF implementation. See JIRA entry at http://jira.broadinstitute.org:8008/browse/GSA-225 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2050 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-15 09:27:12 +00:00
rpoplin	88fd762436	The -rf argument is now being used for read filter and is colliding with my walkers. Changed mine to -recalFile git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2048 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-14 19:37:46 +00:00
rpoplin	b05119987c	Clarified some of the comments in the individual covariates now that things have been moved around to speed up the code. In general most error checking and adjustments to the data are done per read instead of per base. This means that functionality was moved out of the covariate modules and into CovariateCounterWalker and TableRecalibrationWalker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2047 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-14 18:44:05 +00:00
rpoplin	672472789e	Added some documentation to the helper classes. Fixed an error case in TableRecalibrationWalker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2046 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-14 18:13:43 +00:00
hanna	15c14add4d	Repackage the aligner for better partitioning. The C aligner, for example, is now partitioned from the Java aligner, and both are partitioned from the more general-purpose BWT reader. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2045 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 22:55:27 +00:00
rpoplin	d1b525b428	Default window size for NQS covariate is 3 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2040 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 19:24:27 +00:00
rpoplin	394c839974	Implemented NQS covariate. Extended Cycle covariate to handle 454 and SOLID reads. Added a Primer Round covariate for SOLID reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2039 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 19:22:21 +00:00
ebanks	bf451873ff	1. Bug fix: check that AF=0 doesn't contain more probability than 1-fraction 2. Fix for Kiran: allow UG to call SNPs at deletion sites; we'll add an annotation to the VariantAnnotator for deletions at the locus (next week). 3. Added integration tests for joint estimation model git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2038 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 18:02:18 +00:00
asivache	1be36ca959	Bug fix: when cleanedReadIterator is initialized, it gets immediately set to the contig of the first cleaned read; when the first uncleaned read coming in is on the lower contig, this would trigger 'readNextContig' with that lower contig as an argument. As the result, the whole cleaned reads file would be read through the end and no cleaned reads would be ever seen by the code afterwards. Now we do not call readNextContig if the (uncleaned) read's contig is lower than the current contig already loaded into cleanedReadIterator. the 'readNextContig' method now also throws an exception if requested contig is less than the currently loaded one git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2037 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 15:41:26 +00:00
rpoplin	b1376e4216	structure refactored throughout for performance improvements git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2036 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 15:41:09 +00:00
depristo	cff31f2d06	comments for eric git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2035 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 14:19:31 +00:00
aaron	234bb71747	changed the toVariation() method to take a reference base, instead of using the reference base loaded from the underlying data source (if it was reference aware). Also changed some isVariant() methods which weren't using the passed in ref base. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2034 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 06:54:38 +00:00
ebanks	902cf84448	Bug fix: if the most likely allele frequency is 0, don't make a variant call (even if the Qscore for AF=1/n > threshold) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2033 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 04:10:32 +00:00
ebanks	555fb975de	1. Print out allele frequency range (from joint estimation model only). 2. Don't print verbose output from SLOD calculation (it's just a repeat of previous output). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2032 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 03:59:13 +00:00
mmelgar	72825c4848	A walker that generates a table of secondary base counts in a bam file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2031 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 02:11:23 +00:00
hanna	7c386fa428	Another case of reordering of read groups blowing up checksums. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2030 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 00:07:35 +00:00
hanna	8145ed4672	Take 2, updating picard with bug fix for bam files containing no reads. Just stomped on the existing md5s because that's what Eric told me to do. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2029 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-12 22:52:08 +00:00
ebanks	61b5fb82ce	2 major changes: 1. Add dbsnp RS ID to VCF output from genotyper; to do this I needed to fix the dbsnp rod which did not correctly return this value. 2. Remove AlleleBalanceBacked and instead generalize the arbitrary info fields backing VCFs (and potentially others) in preparation for refactoring VariantFiltration next week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2028 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-12 22:51:49 +00:00


1746 Commits (4082f4677e77b4477f0bf2e47a717f6fed883fcc)