gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Ryan Poplin	5f4a063def	Breaking up my massive commits into smaller pieces that I can successfully merge and digest. This one enables downsampling in the HaplotypeCaller (by lowering the default dcov to 20) and removes my long-standing, temporary region-based downsampling.	2013-01-30 16:14:07 -05:00
David Roazen	591df2be44	Move additional VariantContext utility methods back to the GATK. Thanks to Eric for his feedback.	2013-01-30 13:58:17 -05:00
Ryan Poplin	ff8ba03249	Updating BQSR integration test md5s to reflect the updates to the hierarchicalBayesianQualityEstimate function	2013-01-30 13:30:18 -05:00
Ryan Poplin	85dabd321f	Adding unit tests for hierarchicalBayesianQualityEstimate function	2013-01-30 13:26:07 -05:00
Ryan Poplin	07fe3dd1ef	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-30 13:19:24 -05:00
David Roazen	9985f82a7a	Move BaseUtils back to the GATK by request, along with associated utility methods	2013-01-30 13:09:44 -05:00
Ryan Poplin	2967776458	The Empirical quality column in the recalibration report can't be compared in the BQSRGatherer because the value is calculated using the Bayesian estimate with different priors. This value should never be used from a recalibration report anyway except during plotting.	2013-01-30 12:28:14 -05:00
Eric Banks	d067c7f136	Resolving merge conflicts	2013-01-30 10:47:59 -05:00
Eric Banks	9025567cb8	Refactoring the SimpleGenomeLoc into the now public utility UnvalidatingGenomeLoc and the RR-specific FinishedGenomeLoc. Moved the merging utility methods into GenomeLoc and moved the unit tests around accordingly.	2013-01-30 10:45:29 -05:00
Mark DePristo	4852c7404e	GenomeLocs are already comparable, so I'm removing the less complete GenomeLocComparator class and updating ReduceReads and CompressionStash to use built-in comparator	2013-01-30 10:12:27 -05:00
Ryan Poplin	59311aeea2	Getting back null values from the tables is perfectly reasonable if those covariates don't appear in your table. Need to handle them gracefully.	2013-01-30 10:06:14 -05:00
Ryan Poplin	e7d7d70247	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-30 10:01:06 -05:00
Mark DePristo	92c5635e19	Cleanup, document, and unit test ActiveRegion -- All functions tested. In the testing / review I discovered several bugs in the ActiveRegion routines that manipulate reads. New version should be correct -- Enforce correct ordering of supporting states in constructor -- Enforce read ordering when adding reads to an active region in add -- Fix bug in HaplotypeCaller map with new updating read spans. Now get the full span before clipping down reads in map, so that variants are correctly placed w.r.t. the full reference sequence -- Encapsulate isActive field with an accessor function -- Make sure that all state lists are unmodifiable, and that the docs are clear about this -- ActiveRegion equalsExceptReads is for testing only, so make it package protected -- ActiveRegion.hardClipToRegion must re-sort reads as they can become out of order -- Previous version of HC clipped reads but, due to clipping, these reads could no longer overlap the active region. The old version of HC kept these reads, while the enforced contracts on the ActiveRegion detected this was a problem and those reads are removed. Has a minor impact on PLs and RankSumTest values -- Updating HaplotypeCaller MD5s to reflect changes to ActiveRegion's read inclusion policy	2013-01-30 09:47:12 -05:00
Mauricio Carneiro	3d9a83c759	BaseCoverageDistributions should be 'by reference' otherwise we miss all the 0 coverage spots.	2013-01-29 22:37:44 -05:00
Mauricio Carneiro	29fd536c28	Updating licenses manually. Please check that your commit hook is properly pointing at ../../private/shell/pre-commit. Conflicts: public/java/test/org/broadinstitute/variant/VariantBaseTest.java	2013-01-29 17:27:53 -05:00
David Roazen	a536e1da84	Move some VCF/VariantContext methods back to the GATK based on feedback -Moved some of the more specialized / complex VariantContext and VCF utility methods back to the GATK. -Due to this re-shuffling, was able to return things like the Pair class back to the GATK as well.	2013-01-29 16:56:55 -05:00
Eric Banks	e4ec899a87	First pass at adding unit tests for the RR framework: I have added 3 tests and all 3 uncovered RR bugs! One of the fixes was critical: SlidingWindow was not converting between global and relative positions correctly. Besides not being correct, it was resulting in a massive slow down of the RR traversal. That fix definitely breaks at least one of the integration tests, but it's not worth changing md5s now because I'll be changing things all over RR for the next few days, so I am going to let that test fail indefinitely until I can confirm general correctness of the tool.	2013-01-29 15:51:07 -05:00
Ryan Poplin	cba89e98ad	Refactoring the Bayesian empirical quality estimates to be in a single unit-testable function.	2013-01-29 15:50:46 -05:00
Guillermo del Angel	1d5b29e764	Unit tests for repeat covariates: generate 100 random reads consisting of tandem repeat units of random content and size, and check that covariates match expected values at all positions in reads. Fixed corner case where value of covariate at border between 2 tandem repeats of different length/content wasn't consistent	2013-01-29 15:23:02 -05:00
Guillermo del Angel	c11197e361	Refactored repeat covariates to eliminate duplicated code - now all inherit from basic RepeatCovariate abstract class. Comprehensive unit tests coming...	2013-01-29 10:10:24 -05:00
Ryan Poplin	35543b9cba	updating BQSR integration test values for the PR half of BQSR.	2013-01-29 09:47:57 -05:00
Ryan Poplin	bf25196a0b	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-28 22:33:13 -05:00
Ryan Poplin	1f254d29df	Don't set the empirical quality when reading in the recal table because then we won't be using the new quality estimates for the prior since the value is cached.	2013-01-28 22:16:43 -05:00
Guillermo del Angel	ff799cc79a	Fixed bad merge	2013-01-28 20:04:25 -05:00
Guillermo del Angel	5995f01a01	Big intermediate commit (mostly so that I don't have to go again through merge/rebase hell) in expanding BQSR capabilities. Far from done yet: a) Add option to stratify CalibrateGenotypeLikelihoods by repeat - will add integration test in next push. b) Simulator to produce BAM files with given error profile - for now only given SNP/indel error rate can be given. A bad context can be specified and if such context is present then error rate is increased to given value. c) Rewrote RepeatLength covariate to do the right thing - not fully working yet, work in progress. d) Additional experimental covariates to log repeat unit and combined repeat unit+length. Needs code refactoring/testing	2013-01-28 19:55:46 -05:00
Ryan Poplin	d665a8ba0c	The Bayesian calculation of Qemp in the BQSR is now hierarchical. This fixes issues in which the covariate bins were very sparse and the prior estimate being used was the original quality score. This resulted in large correction factors for each covariate, which breaks the equation. There is also now a new option, globalQScorePrior, which can be used to ignore the given (very high) quality scores and instead use this value as the prior.	2013-01-28 15:56:33 -05:00
Ryan Poplin	aab160372a	No need to sort the BQSR tables by default.	2013-01-28 11:26:01 -05:00
David Roazen	f63f27aa13	org.broadinstitute.variant refactor, part 2 -removed sting dependencies from test classes -removed org.apache.log4j dependency -misc cleanup	2013-01-28 09:03:46 -05:00
Mauricio Carneiro	1aee8f205e	Tool to calculate per base coverage distribution GSATDG-29 #resolve	2013-01-27 23:38:46 -05:00
Mark DePristo	804caf7a45	HaplotypeCaller Optimization: return an inactive (p = 0.0) activity if the context has no bases in the pileup -- Allows us to avoid doing a lot of misc. work to set up the genotype when we don't have any data to genotype. Valuable in the case where we are passing through large regions without any data	2013-01-27 14:10:06 -05:00
Ami Levy-Moonshine	b4447cdca2	In cases where one uses VariantContextUtils.GenotypeMergeType.REQUIRE_UNIQUE, we used to verify that the sample names are unique in VariantContextUtils.simpleMerge for each set of VCs. This caused a bug that was reported on the forum (when a set of VCs had 2 VCs from the same sample). Now we check it only in CombineVariants.init using the headers. A new function was added to SamplesUtils with unit tests in CVunitTest.java.	2013-01-25 15:49:51 -05:00
Mark DePristo	3f95f39be3	Updating HC md5s for new cutting algorithm and default band pass filter parameters	2013-01-25 11:07:29 -05:00
Eric Banks	f7b80116d6	Don't let users play with the different exact model implementations.	2013-01-25 10:52:02 -05:00
Eric Banks	6dd0e1ddd6	Pulled out the --regenotype functionality from SelectVariants into its own tool: RegenotypeVariants. This allows us to move SelectVariants into the public suite of tools now.	2013-01-25 09:42:04 -05:00
Mark DePristo	592f90aaef	ActivityProfile now cuts intelligently at the best local minimum when in a larger than max size active region -- This new algorithm is essential to properly handle activity profiles that have many large active regions generated from lots of dense variant events. The new algorithm passes unit tests and passes visual inspection of both running on 1000G and NA12878 -- Misc. commenting of the code -- Updated ActiveRegionExtension to include a min active region size -- Renamed ActiveRegionExtension to ActiveRegionTraversalParameters, as it carries more than just the traversal extension now	2013-01-24 13:48:00 -05:00
Eric Banks	6790e103e0	Moving lots of walkers back from protected to public (along with several of the VA annotations). Let's see whether Mauricio's automatic git hook really works!	2013-01-24 11:42:49 -05:00
Chris Hartl	a3b98daf1a	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2013-01-23 14:49:34 -05:00
Chris Hartl	7fcfa4668c	Since GenotypeConcordance is now a standalone walker, remove the old GenotypeConcordance evaluation module and the associated integration tests.	2013-01-23 14:47:23 -05:00
Mark DePristo	8026199e4c	Updating md5s for CountReadsInActiveRegions and HaplotypeCaller to reflect new activity profile mechanics -- In this process I discovered a few missed sites in the old code. The new approach actually produces better HC results than the previous version.	2013-01-23 13:46:01 -05:00
Mark DePristo	8d9b0f1bd5	Restructure ActivityProfiler into root class ActivityProfile and derived class BandPassActivityProfile -- Required before I jump in and redo the entire activity profile so it can be run incrementally -- This restructuring makes the differences between the two functionalities clearer, as almost all of the functionality is in the base class. The only functionality provided by the BandPassActivityProfile is isolated to a finalizeProfile function overloaded from the base class. -- Renamed ActivityProfileResult to ActivityProfileState, as this is a clearer indication of its actual functionality. Almost all of the misc. walker changes are due to this name update -- Code cleanup and docs for TraverseActiveRegions -- Expanded unit tests for ActivityProfile and ActivityProfileState	2013-01-23 13:45:21 -05:00
Chris Hartl	c500e1d8ac	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2013-01-22 15:31:30 -05:00
Chris Hartl	d33c755aea	Adding docs.	2013-01-22 15:29:33 -05:00
Chris Hartl	7060e01a8e	Fix for broken unit test plus some minor changes to comments. Unit tests were broken by my pulling the site status utility function into the enum. Thankfully the unit tests caught my silly duplication of a line.	2013-01-22 15:14:41 -05:00
Mauricio Carneiro	7b8b064165	Last manual license update (hopefully). If everyone updates their git hook accordingly, this will be the last time I have to manually run the script. GSATDG-5	2013-01-18 16:13:07 -05:00
Ami Levy-Moonshine	0fb7b73107	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-18 15:03:42 -05:00
Ami Levy-Moonshine	826c29827b	change the default VCFs gatherer of the GATK (not just the UG)	2013-01-18 15:03:12 -05:00
Eric Banks	cac439bc5e	Optimized the Allele Biased Downsampling: now it doesn't re-sort the pileup but just removes reads from the original one. Added a small fix that slightly changed md5s.	2013-01-18 11:17:31 -05:00
Chris Hartl	08d2da9057	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2013-01-18 10:28:45 -05:00
Chris Hartl	bf5748a538	Forgot to actually put in the md5. Also with the new change to record pairing and filtering, the multiple-records integration test changed: the indel records (T/TG | T/TGACA) are matched up (rather than left separate) resulting in properly identifying mismatching alleles, rather than HET-UNAVAILABLE and UNAVAILABLE-HET. Very nice.	2013-01-18 10:25:36 -05:00
Chris Hartl	91030e9afa	Bugfix: records that get paired up during the resolution of multiple-records-per-site were not going into genotype-level filtering. Caught via testing. Testing for moltenized output, and for genotype-level filtering. This tool is now fully functional. There are three todo items: 1) Docs 2) An additional output table that gives concordance proportions normalized by records in both eval and comp (not just total in eval or total in comp) 3) Code cleanup for table creation (putting a table together the way I do takes -way- too many lines of code)	2013-01-18 09:49:48 -05:00

461 Commits (b70733133260bc76cdd1cfcb6efdc89107f0f005)