gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mauricio Carneiro	d1febb89c8	Better documentation for ReadClippingStats walker * add overall walker GATKDocs * add explanation for skip parameter and make it advanced * reverse the logic on excluding unmapped reads for clarity * fix read length calculation to no longer include indels ps: I am not sure how useful this walker is (I didn't write it) but the skip logic is poor and calculates the entire statistic for the reads it is eventually going to skip. This would be an easy fix, but only worth our time if people actually use this.	2014-01-01 14:26:26 -05:00
Eric Banks	9355598129	Merge pull request #458 from broadinstitute/eb_dont_fail_when_using_incompatible_annotation Don't fail in annotations if the wrong tools are calling them, just silently skip them.	2013-12-31 21:22:26 -08:00
Eric Banks	050ca8ae09	Merge pull request #457 from broadinstitute/eb_rev_variant_for_doc_updates Updating variant jar.	2013-12-31 20:49:20 -08:00
Eric Banks	9665f75ad4	Don't fail in annotations if the wrong tools are calling them, just silently skip them. This is important for cases when users want to use annotation groups (like all experimental annotations).	2013-12-31 23:45:21 -05:00
Eric Banks	f82a7c3f4c	Updating variant jar. The update contains: 1. documentation changes for VariantContext and Allele (which used to discuss the now obsolete null allele) 2. better error messages for VCFs containing complex rearrangements with breakends 3. instead of failing badly on format field lists with '.'s, just ignore them Also, there is a trivial change to use a more efficient method to remove a bunch of attributes from a VC. Delivers PT#s 59675378, 59496612, and 60524016.	2013-12-31 22:48:29 -05:00
Eric Banks	5a1564d1f2	Merge pull request #456 from broadinstitute/eb_unify_hc_combination_steps Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline.	2013-12-31 18:57:27 -08:00
Eric Banks	83e09b1f64	Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline. Basically, it does 3 things (as opposed to having to call into 3 separate walkers): 1. merge the records at any given position into a single one with all alleles and appropriate PLs 2. re-genotype the record using the exact AF calculation model 3. re-annotate the record using the VariantAnnotatorEngine In the course of this work it became clear that we couldn't just use the simpleMerge() method used by CombineVariants; combining HC-based gVCFs is really a complicated process. So I added a new utility method to handle this merging and pulled any related code out of CombineVariants. I tried to clean up a lot of that code, but ultimately that's out of the scope of this project. Added unit tests for correctness testing. Integration tests cannot be used yet because the HC doesn't output correct gVCFs.	2013-12-31 12:07:56 -05:00
Eric Banks	9394af1230	Merge pull request #454 from jsilter/master Make na12878kb functionality more transparent to users	2013-12-19 08:47:24 -08:00
Eric Banks	26a7082018	Merge pull request #455 from broadinstitute/dr_add_min_max_argument_values Add ability to specify min/max required/recommended values for numeric arguments in the @Argument annotation	2013-12-18 20:40:06 -08:00
David Roazen	4a79831adc	Add ability to specify min/max required/recommended values for numeric arguments in the @Argument annotation -You can now add "minValue", "maxValue", "minRecommendedValue", and "maxRecommendedValue" attributes to @Argument annotations for command-line arguments -"minValue" and "maxValue" specify hard limits that generate an exception if violated -"minRecommendedValue" and "maxRecommendedValue" specify soft limits that generate a warning if violated -Works only for numeric arguments (int, double, etc.) with @Argument annotations -Only considers values actually specified by the user on the command line, not default values assigned in the code As requested by Geraldine	2013-12-18 18:09:08 -05:00
Jacob Silterra	0c7ea2d823	Add label and specVersion fields to MongoDBManager.Locator Add "BLANK" option for DBType Want to get away from adding extensions to dbname	2013-12-18 17:21:53 -05:00
Eric Banks	d32c900018	Merge pull request #453 from broadinstitute/eb_rev_variant_for_validation_bug Updated the variant jar to grab a bug fix that I made to it	2013-12-18 11:13:11 -08:00
Eric Banks	265cb3eb5b	Updated the variant jar to grab a bug fix that I made to it	2013-12-17 11:52:34 -05:00
Valentin Ruano Rubio	5ed627d448	Merge pull request #450 from broadinstitute/vrr_graphLikelihoods_fix250PCRFree Fixed issue > 0 log likelihoods using GraphBased likelihood engine reported by Mauricio	2013-12-13 09:22:46 -08:00
Valentin Ruano-Rubio	5db520c6fa	Fixed issue > 0 log likelihoods using GraphBased likelihood engine reported by Mauricio Added some integration test to check on the fix	2013-12-13 11:19:57 -05:00
Eric Banks	3e8feff429	Merge pull request #451 from broadinstitute/jt_mongo_migration Move the SelectVariantsFromMongo helper classes to archive	2013-12-13 07:40:35 -08:00
Joel Thibault	58217a5c4b	Move the SelectVariantsFromMongo helper classes to archive	2013-12-12 18:50:10 -05:00
Bertrand	d6169a28cd	Merge pull request #448 from broadinstitute/eb_add_stuff_to_the_bundle Eb add stuff to the bundle	2013-12-12 07:31:59 -08:00
Eric Banks	400e7c1404	Fixed bug in the filtering of lifted over variants where a deletion at the end of a contig could cause it to error out. Added a unit test.	2013-12-11 14:07:18 -05:00
Eric Banks	ab33db625f	Merge pull request #449 from broadinstitute/eb_move_calc_posteriors_to_protected Moved CalculatePosteriors from private to protected, in preparation for 3.0	2013-12-07 22:18:46 -08:00
Eric Banks	f1970b923e	Moved CalculatePosteriors from private to protected, in preparation for 3.0. Renamed it CalculateGenotypePosteriors. Also, moved the utility code to a proper utility class instead of where Chris left it. No actual code modifications made in this commit.	2013-12-08 00:08:34 -05:00
Eric Banks	418fbdfbab	Added HC trio calls and NA12878 KB snapshot to resource bundle. Also, don't touch the current link until the resources are finished being produced.	2013-12-07 22:08:34 -05:00
David Roazen	932cd3ada7	Fix 3rd-party library dependency issues in the HC/PairHMM tests In general, test classes cannot use 3rd-party libraries that are not also dependencies of the GATK proper without causing problems when, at release time, we test that the GATK jar has been packaged correctly with all required dependencies. If a test class needs to use a 3rd-party library that is not a GATK dependency, write wrapper methods in the GATK utils/* classes, and invoke those wrapper methods from the test class.	2013-12-06 13:16:55 -05:00
Eric Banks	70e2d21e12	Merge remote-tracking branch 'unstable/master'	2013-12-06 11:45:12 -05:00
Eric Banks	7ed5344f8b	Merge pull request #447 from broadinstitute/dr_segregate_kb_tests Separate tests that access the knowledge base from other tests	2013-12-06 08:43:07 -08:00
David Roazen	10dc038a24	Separate tests that access the knowledge base from other tests The tests that access the knowledge base are interfering with the basic ability to run the unit/integration test suite to completion -- these few tests often take hours to complete. Created a new class of test ("KnowledgeBaseTest") that runs separately from the unit/integration test suite, with corresponding build target. A new bamboo plan will be set up to run these tests independently so that they don't interfere with unit/integration testing. With this change, plus the recent changes to the parallel test runner, unit/integration test suite runtime should be back down to ~30 minutes on average.	2013-12-06 11:31:35 -05:00
Eric Banks	1a0e140ab5	Merge pull request #445 from broadinstitute/dr_rev_picard_for_2.8 Rev picard, sam-jdk, tribble, and variant jars to 1.104.1628	2013-12-05 15:03:27 -08:00
Eric Banks	32cca883fc	Merge pull request #444 from broadinstitute/dr_parallel_test_runner_adjustments Tweak parallel test runner in attempt to decrease spurious failures	2013-12-05 15:02:39 -08:00
David Roazen	47ea3c3b22	Tweak parallel test runner in attempt to decrease spurious failures -Run with -W 240 to give tests more time to complete and hopefully stop jobs from getting killed with TERM_RUNLIMIT -Switch to /humgen/gsa-hpprojects for test working directories, since /broad/hptmp has been unacceptably slow lately Time to create test working directory, 12/5/13: /broad/hptmp: 19 minutes /humgen/gsa-hpprojects: 4 minutes	2013-12-05 13:49:37 -05:00
David Roazen	0e65296efb	Rev picard, sam-jdk, tribble, and variant jars to 1.104.1628 -update VariantFiltration to work with new Lazy wrapper around the JexlEngine in VariantContextUtils	2013-12-05 12:45:32 -05:00
Eric Banks	6d2fcd2df9	Merge pull request #443 from broadinstitute/eb_better_doc_for_minpruning Added docs for the minPruning argument in the HC	2013-12-05 08:52:56 -08:00
Eric Banks	e022db4690	Added docs for the minPruning argument in the HC	2013-12-05 11:50:56 -05:00
Eric Banks	623aaa0d6f	Merge pull request #442 from broadinstitute/gg_fixdoc_deletions Fixed documentation for -deletions argument in the UAC	2013-12-04 17:37:18 -08:00
Geraldine Van der Auwera	3ab2f4edb2	Fixed documentation for -deletions argument in the UAC	2013-12-04 19:55:24 -05:00
amilev	0d94019bd6	Merge pull request #434 from broadinstitute/mc_dt_gccontent Add GC Content to DiagnoseTargets	2013-12-04 09:42:26 -08:00
Eric Banks	41a0aecb07	Merge pull request #441 from broadinstitute/jt_gvcf_idx_user_error Jt gvcf idx user error	2013-12-03 21:54:11 -08:00
Joel Thibault	5fe0531b4d	Throw a GVCFIndexException when the user doesn't specify the optimal indexing strategy	2013-12-03 23:12:14 -05:00
Joel Thibault	8571a641bf	Add @Advanced to variant_index_type and variant_index_parameter	2013-12-03 23:12:14 -05:00
Mauricio Carneiro	701ede2817	Add GC Content to DiagnoseTargets	2013-12-03 23:04:40 -05:00
droazen	61b50a02b1	Merge pull request #431 from broadinstitute/jt_custom_vcf_idx Add engine options to override the default VCF/BCF indexing strategy	2013-12-03 19:32:36 -08:00
Joel Thibault	fd0a02e52e	New VCF engine arguments to specify an alternate IndexCreator - CatVariants updates to use custom VCF indices - Scala scripts for VCF index testing	2013-12-03 13:31:02 -05:00
Joel Thibault	42f78bdb3a	Add a class-based DataProvider	2013-12-03 13:31:01 -05:00
Joel Thibault	cd3ee2ae7e	whitespace	2013-12-03 13:31:01 -05:00
Joel Thibault	ed6f069191	Rev Picard 1.102.1595	2013-12-03 13:31:01 -05:00
Eric Banks	d90b295570	Merge pull request #440 from broadinstitute/eb_selection_should_keep_pls Eb selection should keep pls	2013-12-03 07:08:01 -08:00
Eric Banks	cb2f228f5a	Archiving SelectVariantsFromMongo since it has started to diverge from SelectVariants	2013-12-03 09:23:16 -05:00
Eric Banks	6bee6a1b53	Change the behavior of SelectVariants for PL/AD when it encounters a record that has lost one or more alternate alleles. Previously, we would strip out the PLs and AD values since they were no longer accurate. However, this is not ideal because then that information is just lost and 1) users complain on the forum and post it as a bug and 2) it gives us problems in both the current and future (single sample) calling pipelines because we subset samples/alleles all the time and lose info. Now the PLs and AD get correctly selected down. While I was in there I also refactored some related code in subsetDiploidAlleles(). There were no real changes there - I just broke it out into smaller chunks as per our best practices. Added unit tests and updated integration tests. Addressed reviews.	2013-12-03 09:23:03 -05:00
Valentin Ruano Rubio	b1073fb17b	Merge pull request #439 from broadinstitute/vrr_graphLikelihoods2 Adding Graph-based likelihoods calculation	2013-12-02 18:54:15 -08:00
Valentin Ruano-Rubio	0f99778a59	Adding Graph-based likelihood ratio calculation to HC To activate this feature add '--likelihoodCalculationEngine GraphBased' to the HC command line. New HC Options (both Advanced and Hidden): ========================================== --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM) Specifies what engine should be used to generate read vs haplotype likelihoods. PairHMM : standard full-PairHMM approach. GraphBased : using the assembly graph to accelerate the process. Random : generate random likelihoods - used for benchmarking purposes only. --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN) It indicates how to merge haplotypes produced using different kmerSizes. Only has effect when used in combination with (--likelihoodCalculationEngine GraphBased) COMBO_MIN : use the smallest kmerSize with all haplotypes. COMBO_MAX : use the larger kmerSize with all haplotypes. MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it. MAX_ONLY : use the larger kmerSize with haplotypes assembled using it. Major code changes: =================== * Introduce multiple likelihood calculation engines (before there was just one). * Assembly results from different kmerSizes are now packed together using the AssemblyResultSet class. * Added yet another PairHMM implementation with a different API in order to support local PairHMM calculations (e.g. a segment of the read vs a segment of the haplotype). Major components: ================ * FastLoglessPairHMM: New pair-hmm implementation using some heuristic to speed up partial PairHMM calculations * GraphBasedLikelihoodCalculationEngine: delegates onto GraphBasedLikelihoodCalculationEngineInstance the execution of the graph-based likelihood approach. * GraphBasedLikelihoodCalculationEngineInstance: one instance per active-region, implements the graph traversals to calculate the likelihoods using the graph as a scaffold. * HaplotypeGraph: haplotype threading graph built from the assembly haplotypes. This structure is the one used by GraphBasedLikelihoodCalculationEngineInstance to do its work. * ReadAnchoring and KmerSequenceGraphMap: contain information on how a read maps onto the HaplotypeGraph that is used by GraphBasedLikelihoodCalculationEngineInstance to do its work. Remove mergeCommonChains from HaplotypeGraph creation Fixed bamboo issues with HaplotypeGraphUnitTest Fixed problems with HaplotypeCallerIntegrationTest Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest Fixed ReadThreadingLikelihoodCalculationEngine issues Moved event-block iteration outside GraphBasedEngineInstance Removed unnecessary parameter from ReadAnchoring constructor. Fixed test problem Added a bit more documentation to EventBlockSearchEngine Fixing some private - protected dependency issues Further refactoring making GraphBasedInstance and HaplotypeGraph slimmer. Addressed last pull request commit comments Fixed FastLoglessPairHMM public -> protected dependency Fixed problem with HaplotypeGraph unit test	2013-12-02 19:37:19 -05:00
Valentin Ruano-Rubio	00116609e4	Archive addition as a result of the work on adding Graph-based likelihood ratio calculation to HC.	2013-12-02 19:33:14 -05:00

12893 Commits (d1febb89c8921453480dcf6b323038db87d2fb7b)