gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Ryan Poplin	63213e8eb5	Expanding the HaplotypeCaller integration tests to cover a wider range of data	2012-08-22 14:18:44 -04:00
Eric Banks	944e1c299d	Docs for --keepOriginalAC were wrong in SelectVariants	2012-08-22 13:07:13 -04:00
Guillermo del Angel	901f47d8af	Final step (for now) in VA refactoring: update MD5's because, a) since it's not guaranteed that we'll iterate through reads/pileups in the same order, the rank sum dithering will change annotations, b) FS uses new generic threshold to distinguish uninformative reads (it used to use ad-hoc thresholds), c) AD definition changed and throws away uninformative reads, d) shortened general ploidy integration tests for quicker debugging. May have missed some MD5's in the update so there may be lingering test failures still	2012-08-22 11:38:51 -04:00
Guillermo del Angel	7df0abf49b	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-22 11:36:41 -04:00
Christopher Hartl	20601f034e	Updating the checkType() function to include the new StructuralIndel variant type. Fixes outstanding broken integration test.	2012-08-22 07:33:10 -07:00
Guillermo del Angel	6a8cf1c84a	Enable and adapt HaplotypeScore and MappingQualityZero as active region annotations now that we have per-read likelihoods passed in to annotations	2012-08-21 14:35:40 -04:00
Guillermo del Angel	d0644b3565	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-21 10:35:23 -04:00
Ryan Poplin	94e7f677ad	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-21 10:21:47 -04:00
Guillermo del Angel	418ace463a	More merge conflict resolution	2012-08-21 10:15:52 -04:00
Ryan Poplin	605acaae9c	Another round of FindBugs fixes. Object internally stores a reference to an externally mutable array. Very dangerous.	2012-08-21 09:33:58 -04:00
Ryan Poplin	55b7949d68	Another round of FindBugs fixes. Comparator doesn't implement Serializable.	2012-08-21 09:20:55 -04:00
Christopher Hartl	ba8622ff0d	A number of stashed changes are lurking in here. In order of importance: - Fix for M_Trieb's error report on the forum, and addition of integration tests to cover the walker. - Addition of StructuralIndel as a class of variation within the VariantContext. These are for variants with a full alt allele that's >150bp in length. - Adaptation of the MVLikelihoodRatio to work for a set of trios (takes the max over the trios of the MVLR) - InsertSizeDistribution changed to use the new gatk report output (it was previously broken) - RetrogeneDiscovery changed to be compatible with the new gatk report - A maxIndelSize argument added to SelectVariants - ByTranscriptEvaluator rewritten for cleanliness - VariantRecalibrator modified to not exclude structural indels from recalibration if the mode is INDEL - Documentation added to DepthOfCoverageIntegrationTest (no, don't yell at chartl ;_; ) Also sorry for the long commit history behind this that is the result of fixing merge conflicts. Because this also fixes a conflict (from git stash apply), for some reason I can't rebase all of them away. I'm pretty sure some of the commit notes say "this note isn't important because I'm going to rebase it anyway".	2012-08-21 07:08:58 -04:00
Eric Banks	286b658fab	Re-enabling parallelism in the BaseRecalibrator now that the release is out.	2012-08-20 21:25:14 -04:00
Guillermo del Angel	7bbd2a7a20	Fixing merge conflicts	2012-08-20 20:38:25 -04:00
Guillermo del Angel	2041cb853c	New implementation of AD - ignore now non-informative reads based on per-read likelihoods	2012-08-20 20:31:34 -04:00
Ryan Poplin	77fbaec044	Another round of FindBugs fixes. Class implements its own compareTo() but uses base Object.equals() which can lead to unpredictable behavior.	2012-08-20 16:55:00 -04:00
Ryan Poplin	5db3bd6fd2	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-20 15:28:57 -04:00
Ryan Poplin	464d49509a	Pulling out common caller arguments into its own StandardCallerArgumentCollection base class so that every caller isn't exposed to the unused arguments from every other caller.	2012-08-20 15:28:39 -04:00
Eric Banks	4450d66c64	Fixing the docs for DP and AD	2012-08-20 15:10:24 -04:00
Ryan Poplin	c67d708c51	Bug fix in HaplotypeCaller for non-regular bases in the reference or reads. Those events don't get created any more. Bug fix for advanced GenotypeFullActiveRegion mode: custom variant annotations created by the HC don't make sense when in this mode so don't try to calculate them.	2012-08-20 13:41:08 -04:00
Guillermo del Angel	5b5fee56cf	Next iteration of new VA interface: extend changes to per-genotype annotations as well. This will finally allow AD to be implemented correctly (that change is not done yet)	2012-08-20 12:52:15 -04:00
Eric Banks	154f65e0de	Temporarily disabling multi-threaded usage of BaseRecalibrator for performance reasons.	2012-08-20 12:43:17 -04:00
Guillermo del Angel	c384677917	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-20 10:27:25 -04:00
Eric Banks	97b191f578	Thanks to Guillermo I was able to isolate an instance of where the MLEAC > AN. It turns out that this is valid, e.g. when PLs are all 0s for a sample we no-call it but it's allowed to factor into the MLE (since that's the contract with the exact model). Removing the check in UG and instead protecting for it in the AlleleCount stratification.	2012-08-20 01:16:23 -04:00
Guillermo del Angel	963ad03f8b	Second step of interface cleanup for variant annotator: several bug fixes, don't hash pileup elements to Maps because the hashCode() for a pileup element is not implemented and strange things can happen. Still several things to do, not done yet	2012-08-19 21:18:18 -04:00
Mark DePristo	9121b98167	CombineVariants outputs the first non-MISSING qual, not the maximum -- When merging multiple VCF records at a site, the combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value. The previous behavior was to take the max QUAL, which sometimes resulted in strange downstream confusion.	2012-08-19 10:29:38 -04:00
Guillermo del Angel	d9641e3d57	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-19 09:23:21 -04:00
Mark DePristo	980685af16	Fix GSA-137: Having both DataSource.REFERENCE and DataSource.REFERENCE_BASES is confusing to end users. -- Removed REFERENCE_BASES option. You only have REFERENCE now. There's no efficiency savings for the REFERENCE_BASES option any longer, since the reference bases are loaded lazily, so if you don't use them there's effectively no cost to making the RefContext that could load them.	2012-08-17 14:55:38 -04:00
Eric Banks	2676b7fc2e	Put in a sanity check that MLEAC <= AN	2012-08-17 11:49:53 -04:00
Eric Banks	53383e82ec	Hmm, not good. Fixing the math in PBT resulted in changed MD5s for integration tests that look like significant changes. I am reverting and will report this to Laurent.	2012-08-16 21:41:18 -04:00
Guillermo del Angel	b61ecc7c19	Fix merge conflicts	2012-08-16 20:45:52 -04:00
Guillermo del Angel	d26183e0ec	First preliminary big refactoring of UG annotation engine. Goals: a) Remove gigantic hack that cached per-read haplotype likelihoods in a static array so that annotations would go back and retrieve them, b) unify interface for annotations between HaplotypeCaller and UnifiedGenotyper, c) as a consequence, removed and cleaned duplicated code. As a bonus, annotations now have more relevant info to help them compute values. Major idea is that per-read haplotype likelihoods are now stored in a single unified object of class PerReadAlleleLikelihoodMap. Class implementation in theory hides internal storage details from outside work (still may need work cleaning up interface), and this object (or rather, a Map from Sample->perReadAlleleLikelihoodMap) is produced by UGCalcLikelihoods. The genotype calculation is also able to potentially use this info if needed. All InfoFieldAnnotations now get an extra argument with this map. Currently, this map is only produced for indels in UG, or for all variants within HaplotypeCaller. If this map is absent (SNPs in UG), the old Pileup interface is used, but it's avoided whenever possible. FORMAT annotations are not yet changed but will be the focus of the second step. Major benefit will be that annotations will be able to very easily discard non-informative reads for certain events. HaplotypeCaller also uses this new class, and no longer hard-codes the mapping of allele -> list(reads) but instead uses the same objects and interfaces as the rest of the modules. Code still needs further testing/cleaning/reviewing/debugging	2012-08-16 20:36:53 -04:00
Eric Banks	3253fc216b	FindBugs 'Maintainability' fixes	2012-08-16 15:53:06 -04:00
Eric Banks	05cbf1c8c0	FindBugs 'Efficiency' fixes	2012-08-16 15:40:52 -04:00
Eric Banks	47b4f7b7e5	One final FindBugs related fix. I think it's safe to consider these changes 'fixes' that are allowed to go in during a code freeze.	2012-08-16 14:59:05 -04:00
Eric Banks	ded0e11b45	Killing off some FindBugs 'Reliability' issues	2012-08-16 14:00:48 -04:00
Eric Banks	dac3958461	Killing off some FindBugs 'Usability' issues	2012-08-16 13:32:44 -04:00
Eric Banks	611d9b61e2	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-16 13:05:36 -04:00
Eric Banks	2df04dc48a	Fix for performance problem in GGA mode related to previous --regenotype commit. Instead of trying to hack around the determination of the calculation model when it's not needed, just simply overload the calculateGenotypes() method to add one that does simple genotyping. Re-enabling the Pool Caller integration tests.	2012-08-16 13:05:17 -04:00
Mark DePristo	132cdfd9c1	GSA-488: MLEAC > AN error when running variant eval fixed	2012-08-16 13:03:14 -04:00
Eric Banks	f368e568db	Implementing support in BaseRecalibrator for SOLiD no call strategies other than throwing an exception. For some reason we never transferred these capabilities into BQSRv2 earlier.	2012-08-15 22:52:56 -04:00
Eric Banks	9d09230c26	Better docs for verbose output of Pileup	2012-08-15 21:55:08 -04:00
Mark DePristo	669c43031a	BCF2 optimizations; parallel CombineVariants -- BCF2 now determines whether it can safely write out raw genotype blocks, which is true in the case where the VCF header of the input is a complete, ordered subset of the output header. Added utilities to determine this and extensive unit tests (headerLinesAreOrderedConsistently) -- Cleanup collapseStringList and exploreStringList for new unit tests of BCF2Utils. Fixed bug in edge case that never occurred in practice -- VCFContigHeaderLine now provides its own key (VCFHeader.CONTIG_KEY) directly instead of requiring the user to provide it (and hoping it's right) -- More ways to access the data in VCFHeader -- BCF2Writer uses a cache to avoid recomputing unnecessarily whether raw genotype blocks can be emitted directly into the output -- Optimization of fullyDecodeAttributes -- attributes.size() is expensive and unnecessary. We just guess that on average we need ~10 elements for the attribute map -- CombineVariants optimization -- filters are online HashSet but are sorted at the end by creating a TreeSet -- makeCombinations is now makePermutations, and you can request to create the permutations with or without replacement	2012-08-15 21:13:16 -04:00
Mark DePristo	ae4d4482ac	Parallel combine variants! -- CombineVariants is now TreeReducible! -- Integration tests running in parallel all pass except one (will fix) due to incorrect use of db=0 flag on input from old VCF format	2012-08-15 21:13:15 -04:00
Eric Banks	87e41c83c5	In AlleleCount stratification, check to make sure the AC (or MLEAC) is valid (i.e. not higher than number of chromosomes) and throw a User Error if it isn't. Added a test for bad AC.	2012-08-14 15:02:30 -04:00
Eric Banks	8e3774fb0e	Fixing behavior of the --regenotype argument in SelectVariants to properly run in GenotypeGivenAlleles mode. Added integration tests to cover recent SV changes.	2012-08-14 14:21:42 -04:00
Eric Banks	34b62fa092	Two changes to SelectVariants: 1) don't add DP INFO annotation if DP wasn't used in the input VCF (it was adding DP=0 previously). 2) If MLEAC or MLEAF is present in the original VCF and the number of samples decreases, remove those annotations from the VC.	2012-08-14 12:54:31 -04:00
Khalid Shakir	f809f24afb	Removed SelectHeaders' --include_reference_name option since the reference is always included. In SelectHeaders instead of including the path to the file, only include the name of the reference since dbGaP does not like paths in headers.	2012-08-13 16:49:27 -04:00
Mark DePristo	243af0adb1	Expanded the BQSR reporting script -- Includes header page -- Table of arguments (Arguments) -- Summary of counts (RecalData0) -- Summary of counts by qual (RecalData1) -- Fixed bug in output that resulted in covariates list always being null (updated md5s accordingly) -- BQSR.R loads all relevant libraries now, including gplots, grid, and gsalib, to run correctly	2012-08-12 13:45:14 -04:00
Eric Banks	eca9613356	Adding support of X and = CIGAR operators to the GATK	2012-08-10 14:54:07 -04:00
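Three of the commits above (605acaae9c, 55b7949d68 and 77fbaec044) describe FindBugs warnings only in prose. The sketch below illustrates the usual shape of each fix. It uses a hypothetical SiteRecord class that is not GATK code. The class, its fields and its ByPosition comparator are invented for illustration. The names in the comments are the standard FindBugs bug-pattern identifiers for these warnings. The fixes covered are: defensive copies of a mutable array, an equals() that agrees with compareTo(), and a Comparator that implements Serializable.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Objects;

/**
 * Hypothetical value class (not taken from GATK) showing the usual shape of
 * three FindBugs fixes: defensive array copies, compareTo() consistent with
 * equals(), and a Serializable Comparator.
 */
final class SiteRecord implements Comparable<SiteRecord>, Serializable {
    private static final long serialVersionUID = 1L;

    private final String contig;
    private final int position;
    private final int[] counts;

    SiteRecord(String contig, int position, int[] counts) {
        this.contig = Objects.requireNonNull(contig, "contig");
        this.position = position;
        // EI_EXPOSE_REP2: copy the caller's array so later changes to it
        // cannot silently alter this object's state.
        this.counts = Objects.requireNonNull(counts, "counts").clone();
    }

    int[] getCounts() {
        // EI_EXPOSE_REP: hand out a copy, never the internal array itself.
        return counts.clone();
    }

    @Override
    public int compareTo(SiteRecord other) {
        int c = contig.compareTo(other.contig);
        if (c != 0) return c;
        c = Integer.compare(position, other.position);
        if (c != 0) return c;
        return compareLexicographically(counts, other.counts);
    }

    // EQ_COMPARETO_USE_OBJECT_EQUALS: equals() compares exactly the fields
    // compareTo() uses, so compareTo(x) == 0 if and only if equals(x).
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SiteRecord)) return false;
        SiteRecord that = (SiteRecord) o;
        return position == that.position
                && contig.equals(that.contig)
                && Arrays.equals(counts, that.counts);
    }

    @Override
    public int hashCode() {
        return 31 * Objects.hash(contig, position) + Arrays.hashCode(counts);
    }

    // Element-by-element comparison, with shorter arrays ordered first on a tie.
    private static int compareLexicographically(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    /**
     * SE_COMPARATOR_SHOULD_BE_SERIALIZABLE: a TreeSet or TreeMap built with
     * this comparator can only be serialized if the comparator is too.
     */
    static final class ByPosition implements Comparator<SiteRecord>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public int compare(SiteRecord a, SiteRecord b) {
            return Integer.compare(a.position, b.position);
        }
    }
}
```

Both the element class and the comparator are serializable here, so a TreeSet built with new SiteRecord.ByPosition() can be written out with ObjectOutputStream. ByPosition ignores the contig, so it only gives a sensible ordering for records on a single contig. A TreeSet using it would treat two records at the same position on different contigs as duplicates.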


1380 Commits (27842ba44807bae5f6483444ddc28b5d69e53174)