Commit Graph

12893 Commits (d1febb89c8921453480dcf6b323038db87d2fb7b)

Author SHA1 Message Date
Mauricio Carneiro d1febb89c8 Better documentation for ReadClippingStats walker
* add overall walker GATKDocs
* add explanation for skip parameter and make it advanced
* reverse the logic on excluding unmapped reads for clarity
* fix read length calculation to no longer include indels

ps: I am not sure how useful this walker is (I didn't write it) but the skip logic is poor and
calculates the entire statistic for the reads it is eventually going to skip. This would be an easy
fix, but only worth our time if people actually use this.
2014-01-01 14:26:26 -05:00
Eric Banks 9355598129 Merge pull request #458 from broadinstitute/eb_dont_fail_when_using_incompatible_annotation
Don't fail in annotations if the wrong tools are calling them, just silently skip them.
2013-12-31 21:22:26 -08:00
Eric Banks 050ca8ae09 Merge pull request #457 from broadinstitute/eb_rev_variant_for_doc_updates
Updating variant jar.
2013-12-31 20:49:20 -08:00
Eric Banks 9665f75ad4 Don't fail in annotations if the wrong tools are calling them, just silently skip them.
This is important for cases when users want to use annotation groups (like all experimental annotations).
2013-12-31 23:45:21 -05:00
Eric Banks f82a7c3f4c Updating variant jar.
The update contains:
1. documentation changes for VariantContext and Allele (which used to discuss the now obsolete null allele)
2. better error messages for VCFs containing complex rearrangements with breakends
3. instead of failing badly on format field lists with '.'s, just ignore them
Also, there is a trivial change to use a more efficient method to remove a bunch of attributes from a VC.

Delivers PT#s 59675378, 59496612, and 60524016.
2013-12-31 22:48:29 -05:00
Eric Banks 5a1564d1f2 Merge pull request #456 from broadinstitute/eb_unify_hc_combination_steps
Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline.
2013-12-31 18:57:27 -08:00
Eric Banks 83e09b1f64 Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline.
Basically, it does 3 things (as opposed to having to call into 3 separate walkers):
1. merge the records at any given position into a single one with all alleles and appropriate PLs
2. re-genotype the record using the exact AF calculation model
3. re-annotate the record using the VariantAnnotatorEngine

In the course of this work it became clear that we couldn't just use the simpleMerge() method used
by CombineVariants; combining HC-based gVCFs is really a complicated process.  So I added a new
utility method to handle this merging and pulled any related code out of CombineVariants.  I tried
to clean up a lot of that code, but ultimately that's out of the scope of this project.

Added unit tests for correctness testing.
Integration tests cannot be used yet because the HC doesn't output correct gVCFs.
2013-12-31 12:07:56 -05:00
Eric Banks 9394af1230 Merge pull request #454 from jsilter/master
Make na12878kb functionality more transparent to users
2013-12-19 08:47:24 -08:00
Eric Banks 26a7082018 Merge pull request #455 from broadinstitute/dr_add_min_max_argument_values
Add ability to specify min/max required/recommended values for numeric arguments in the @Argument annotation
2013-12-18 20:40:06 -08:00
David Roazen 4a79831adc Add ability to specify min/max required/recommended values for numeric arguments in the @Argument annotation
-You can now add "minValue", "maxValue", "minRecommendedValue", and "maxRecommendedValue" attributes
 to @Argument annotations for command-line arguments

-"minValue" and "maxValue" specify hard limits that generate an exception if violated

-"minRecommendedValue" and "maxRecommendedValue" specify soft limits that generate a warning if violated

-Works only for numeric arguments (int, double, etc.) with @Argument annotations

-Only considers values actually specified by the user on the command line, not default values
 assigned in the code

As requested by Geraldine
2013-12-18 18:09:08 -05:00
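
The hard and soft limits described in commit 4a79831adc can be illustrated with a minimal sketch. This is not the GATK implementation: the `RangedArgument` annotation, the `ExampleTool` class, its `threshold` field, and the `check` helper are all hypothetical names introduced here, and only the four attribute names ("minValue", "maxValue", "minRecommendedValue", "maxRecommendedValue") come from the commit message. The sketch assumes the parser calls `check` only for values the user actually supplied, so defaults assigned in code are never validated.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class RangedArgumentSketch {

    // Hypothetical stand-in for @Argument: only the four range attributes are modeled.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RangedArgument {
        double minValue() default Double.NEGATIVE_INFINITY;
        double maxValue() default Double.POSITIVE_INFINITY;
        double minRecommendedValue() default Double.NEGATIVE_INFINITY;
        double maxRecommendedValue() default Double.POSITIVE_INFINITY;
    }

    // Hypothetical tool with one numeric command-line argument.
    static class ExampleTool {
        @RangedArgument(minValue = 0.0, maxValue = 100.0,
                        minRecommendedValue = 1.0, maxRecommendedValue = 10.0)
        int threshold = 2; // default value; never passed to check() in this sketch
    }

    // Validates a user-supplied value against the limits declared on the named field.
    // A hard-limit violation throws; a soft-limit violation is returned as a warning.
    static List<String> check(Class<?> tool, String fieldName, double userValue) {
        RangedArgument range;
        try {
            Field field = tool.getDeclaredField(fieldName);
            range = field.getAnnotation(RangedArgument.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("No such argument field: " + fieldName, e);
        }
        List<String> warnings = new ArrayList<>();
        if (range == null) {
            return warnings; // no limits declared on this field
        }
        if (userValue < range.minValue() || userValue > range.maxValue()) {
            throw new IllegalArgumentException(fieldName + "=" + userValue
                    + " is outside the allowed range [" + range.minValue()
                    + ", " + range.maxValue() + "]");
        }
        if (userValue < range.minRecommendedValue() || userValue > range.maxRecommendedValue()) {
            warnings.add(fieldName + "=" + userValue + " is outside the recommended range ["
                    + range.minRecommendedValue() + ", " + range.maxRecommendedValue() + "]");
        }
        return warnings;
    }

    public static void main(String[] args) {
        System.out.println(check(ExampleTool.class, "threshold", 5));   // inside all limits
        System.out.println(check(ExampleTool.class, "threshold", 50));  // soft limit violated
    }
}
```

Returning warnings as a list rather than logging them keeps the sketch free of any logging dependency; a real parser would presumably route them to its own logger.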
Jacob Silterra 0c7ea2d823 Add label and specVersion fields to MongoDBManager.Locator
Add "BLANK" option for DBType

Want to get away from adding extensions to dbname
2013-12-18 17:21:53 -05:00
Eric Banks d32c900018 Merge pull request #453 from broadinstitute/eb_rev_variant_for_validation_bug
Updated the variant jar to grab a bug fix that I made to it
2013-12-18 11:13:11 -08:00
Eric Banks 265cb3eb5b Updated the variant jar to grab a bug fix that I made to it 2013-12-17 11:52:34 -05:00
Valentin Ruano Rubio 5ed627d448 Merge pull request #450 from broadinstitute/vrr_graphLikelihoods_fix250PCRFree
Fixed issue with > 0 log likelihoods using GraphBased likelihood engine reported by Mauricio
2013-12-13 09:22:46 -08:00
Valentin Ruano-Rubio 5db520c6fa Fixed issue with > 0 log likelihoods using GraphBased likelihood engine reported by Mauricio
Added some integration tests to check on the fix
2013-12-13 11:19:57 -05:00
Eric Banks 3e8feff429 Merge pull request #451 from broadinstitute/jt_mongo_migration
Move the SelectVariantsFromMongo helper classes to archive
2013-12-13 07:40:35 -08:00
Joel Thibault 58217a5c4b Move the SelectVariantsFromMongo helper classes to archive 2013-12-12 18:50:10 -05:00
Bertrand d6169a28cd Merge pull request #448 from broadinstitute/eb_add_stuff_to_the_bundle
Eb add stuff to the bundle
2013-12-12 07:31:59 -08:00
Eric Banks 400e7c1404 Fixed bug in the filtering of lifted over variants where a deletion at the end of a contig could cause it to error out.
Added a unit test.
2013-12-11 14:07:18 -05:00
Eric Banks ab33db625f Merge pull request #449 from broadinstitute/eb_move_calc_posteriors_to_protected
Moved CalculatePosteriors from private to protected, in preparation for 3.0
2013-12-07 22:18:46 -08:00
Eric Banks f1970b923e Moved CalculatePosteriors from private to protected, in preparation for 3.0.
Renamed it CalculateGenotypePosteriors.
Also, moved the utility code to a proper utility class instead of where Chris left it.
No actual code modifications made in this commit.
2013-12-08 00:08:34 -05:00
Eric Banks 418fbdfbab Added HC trio calls and NA12878 KB snapshot to resource bundle.
Also, don't touch the current link until the resources are finished being produced.
2013-12-07 22:08:34 -05:00
David Roazen 932cd3ada7 Fix 3rd-party library dependency issues in the HC/PairHMM tests
In general, test classes cannot use 3rd-party libraries that are not
also dependencies of the GATK proper without causing problems when,
at release time, we test that the GATK jar has been packaged correctly
with all required dependencies.

If a test class needs to use a 3rd-party library that is not a GATK
dependency, write wrapper methods in the GATK utils/* classes, and
invoke those wrapper methods from the test class.
2013-12-06 13:16:55 -05:00
Eric Banks 70e2d21e12 Merge remote-tracking branch 'unstable/master' 2013-12-06 11:45:12 -05:00
Eric Banks 7ed5344f8b Merge pull request #447 from broadinstitute/dr_segregate_kb_tests
Separate tests that access the knowledge base from other tests
2013-12-06 08:43:07 -08:00
David Roazen 10dc038a24 Separate tests that access the knowledge base from other tests
The tests that access the knowledge base are interfering with the basic
ability to run the unit/integration test suite to completion -- these
few tests often take hours to complete.

Created a new class of test ("KnowledgeBaseTest") that runs separately
from the unit/integration test suite, with corresponding build target.
A new bamboo plan will be set up to run these tests independently so
that they don't interfere with unit/integration testing.

With this change, plus the recent changes to the parallel test runner,
unit/integration test suite runtime should be back down to ~30 minutes
on average.
2013-12-06 11:31:35 -05:00
Eric Banks 1a0e140ab5 Merge pull request #445 from broadinstitute/dr_rev_picard_for_2.8
Rev picard, sam-jdk, tribble, and variant jars to 1.104.1628
2013-12-05 15:03:27 -08:00
Eric Banks 32cca883fc Merge pull request #444 from broadinstitute/dr_parallel_test_runner_adjustments
Tweak parallel test runner in attempt to decrease spurious failures
2013-12-05 15:02:39 -08:00
David Roazen 47ea3c3b22 Tweak parallel test runner in attempt to decrease spurious failures
-Run with -W 240 to give tests more time to complete and hopefully
 stop jobs from getting killed with TERM_RUNLIMIT

-Switch to /humgen/gsa-hpprojects for test working directories, since
 /broad/hptmp has been unacceptably slow lately

     Time to create test working directory, 12/5/13:
     /broad/hptmp: 19 minutes
     /humgen/gsa-hpprojects: 4 minutes
2013-12-05 13:49:37 -05:00
David Roazen 0e65296efb Rev picard, sam-jdk, tribble, and variant jars to 1.104.1628
-update VariantFiltration to work with new Lazy wrapper around the
 JexlEngine in VariantContextUtils
2013-12-05 12:45:32 -05:00
Eric Banks 6d2fcd2df9 Merge pull request #443 from broadinstitute/eb_better_doc_for_minpruning
Added docs for the minPruning argument in the HC
2013-12-05 08:52:56 -08:00
Eric Banks e022db4690 Added docs for the minPruning argument in the HC 2013-12-05 11:50:56 -05:00
Eric Banks 623aaa0d6f Merge pull request #442 from broadinstitute/gg_fixdoc_deletions
Fixed documentation for -deletions argument in the UAC
2013-12-04 17:37:18 -08:00
Geraldine Van der Auwera 3ab2f4edb2 Fixed documentation for -deletions argument in the UAC 2013-12-04 19:55:24 -05:00
amilev 0d94019bd6 Merge pull request #434 from broadinstitute/mc_dt_gccontent
Add GC Content to DiagnoseTargets
2013-12-04 09:42:26 -08:00
Eric Banks 41a0aecb07 Merge pull request #441 from broadinstitute/jt_gvcf_idx_user_error
Jt gvcf idx user error
2013-12-03 21:54:11 -08:00
Joel Thibault 5fe0531b4d Throw a GVCFIndexException when the user doesn't specify the optimal indexing strategy 2013-12-03 23:12:14 -05:00
Joel Thibault 8571a641bf Add @Advanced to variant_index_type and variant_index_parameter 2013-12-03 23:12:14 -05:00
Mauricio Carneiro 701ede2817 Add GC Content to DiagnoseTargets 2013-12-03 23:04:40 -05:00
droazen 61b50a02b1 Merge pull request #431 from broadinstitute/jt_custom_vcf_idx
Add engine options to override the default VCF/BCF indexing strategy
2013-12-03 19:32:36 -08:00
Joel Thibault fd0a02e52e New VCF engine arguments to specify an alternate IndexCreator
- CatVariants updates to use custom VCF indices
- Scala scripts for VCF index testing
2013-12-03 13:31:02 -05:00
Joel Thibault 42f78bdb3a Add a class-based DataProvider 2013-12-03 13:31:01 -05:00
Joel Thibault cd3ee2ae7e whitespace 2013-12-03 13:31:01 -05:00
Joel Thibault ed6f069191 Rev Picard 1.102.1595 2013-12-03 13:31:01 -05:00
Eric Banks d90b295570 Merge pull request #440 from broadinstitute/eb_selection_should_keep_pls
Eb selection should keep pls
2013-12-03 07:08:01 -08:00
Eric Banks cb2f228f5a Archiving SelectVariantsFromMongo since it has started to diverge from SelectVariants 2013-12-03 09:23:16 -05:00
Eric Banks 6bee6a1b53 Change the behavior of SelectVariants for PL/AD when it encounters a record that has lost one or more alternate alleles.
Previously, we would strip out the PLs and AD values since they were no longer accurate.  However, this is not ideal because
then that information is just lost and 1) users complain on the forum and post it as a bug and 2) it gives us problems in both
the current and future (single sample) calling pipelines because we subset samples/alleles all the time and lose info.

Now the PLs and AD get correctly selected down.

While I was in there I also refactored some related code in subsetDiploidAlleles().  There were no real changes there - I just
broke it out into smaller chunks as per our best practices.

Added unit tests and updated integration tests.
Addressed reviews.
2013-12-03 09:23:03 -05:00
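
What "selecting down" the PLs and AD in commit 6bee6a1b53 might look like is sketched below for the diploid case. This is an illustration rather than the code in subsetDiploidAlleles(): the class and method names are hypothetical. It relies on the VCF specification's genotype ordering, where the PL for genotype j/k (j <= k) sits at index k*(k+1)/2 + j. The final rescaling, which shifts the values so that the most likely remaining genotype has PL 0, follows the usual PL convention; the commit message does not say whether the commit itself rescales.

```java
import java.util.Arrays;

class PlSubsetSketch {

    // Index of the unordered diploid genotype (j, k) in a VCF PL array.
    static int plIndex(int j, int k) {
        int lo = Math.min(j, k);
        int hi = Math.max(j, k);
        return hi * (hi + 1) / 2 + lo;
    }

    // Keeps only the PLs of genotypes built from the kept alleles.
    // keptAlleles lists original allele indices in their new order (0 = reference).
    static int[] subsetPLs(int[] originalPLs, int[] keptAlleles) {
        int n = keptAlleles.length;
        int[] subset = new int[n * (n + 1) / 2];
        for (int k = 0; k < n; k++) {
            for (int j = 0; j <= k; j++) {
                subset[plIndex(j, k)] = originalPLs[plIndex(keptAlleles[j], keptAlleles[k])];
            }
        }
        // Assumption: rescale so that the smallest remaining PL is 0.
        int min = Arrays.stream(subset).min().orElse(0);
        for (int i = 0; i < subset.length; i++) {
            subset[i] -= min;
        }
        return subset;
    }

    // Keeps only the allele depths of the kept alleles.
    static int[] subsetAD(int[] originalAD, int[] keptAlleles) {
        int[] subset = new int[keptAlleles.length];
        for (int i = 0; i < keptAlleles.length; i++) {
            subset[i] = originalAD[keptAlleles[i]];
        }
        return subset;
    }

    public static void main(String[] args) {
        // Alleles REF, ALT1, ALT2; PL order is 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.
        int[] pls = {30, 0, 40, 10, 20, 50};
        int[] ad = {12, 7, 3};
        int[] kept = {0, 2}; // ALT1 has been lost
        System.out.println(Arrays.toString(subsetPLs(pls, kept))); // prints [20, 0, 40]
        System.out.println(Arrays.toString(subsetAD(ad, kept)));   // prints [12, 3]
    }
}
```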
Valentin Ruano Rubio b1073fb17b Merge pull request #439 from broadinstitute/vrr_graphLikelihoods2
Adding Graph-based likelihoods calculation
2013-12-02 18:54:15 -08:00
Valentin Ruano-Rubio 0f99778a59 Adding Graph-based likelihood ratio calculation to HC
To activate this feature add '--likelihoodCalculationEngine GraphBased' to the HC command line.

New HC Options (both Advanced and Hidden):
==========================================

  --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM)

Specifies what engine should be used to generate read vs haplotype likelihoods.

  PairHMM : standard full-PairHMM approach.
  GraphBased : using the assembly graph to accelerate the process.
  Random : generate random likelihoods - used for benchmarking purposes only.

  --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN)

It indicates how to merge haplotypes produced using different kmerSizes.
Only has effect when used in combination with (--likelihoodCalculationEngine GraphBased)

  COMBO_MIN : use the smallest kmerSize with all haplotypes.
  COMBO_MAX : use the largest kmerSize with all haplotypes.
  MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it.
  MAX_ONLY : use the largest kmerSize with haplotypes assembled using it.

Major code changes:
===================

 * Introduce multiple likelihood calculation engines (before there was just one).

 * Assembly results from different kmerSizes are now packed together using the AssemblyResultSet class.

 * Added yet another PairHMM implementation with a different API in order to support
   local PairHMM calculations (e.g. a segment of the read vs a segment of the haplotype).

Major components:
================

 * FastLoglessPairHMM: New PairHMM implementation using some heuristics to speed up partial PairHMM calculations

 * GraphBasedLikelihoodCalculationEngine: delegates the execution of the graph-based likelihood approach
     to GraphBasedLikelihoodCalculationEngineInstance.

 * GraphBasedLikelihoodCalculationEngineInstance: one instance per active-region, implements the graph traversals
     to calculate the likelihoods using the graph as a scaffold.

 * HaplotypeGraph: haplotype threading graph built from the assembly haplotypes. This structure is the one
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

 * ReadAnchoring and KmerSequenceGraphMap: contain information on how a read maps onto the HaplotypeGraph, which is
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

Remove mergeCommonChains from HaplotypeGraph creation

Fixed bamboo issues with HaplotypeGraphUnitTest

Fixed problems with HaplotypeCallerIntegrationTest

Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest

Fixed ReadThreadingLikelihoodCalculationEngine issues

Moved event-block iteration outside GraphBased*EngineInstance

Removed unnecessary parameter from ReadAnchoring constructor.
Fixed test problem

Added a bit more documentation to EventBlockSearchEngine

Fixing some private - protected dependency issues

Further refactoring making GraphBased*Instance and HaplotypeGraph slimmer. Addressed last pull request commit comments

Fixed FastLoglessPairHMM public -> protected dependency

Fixed problem with HaplotypeGraph unit test

2013-12-02 19:37:19 -05:00
Valentin Ruano-Rubio 00116609e4 Archive addition as a result of the work on adding Graph-based likelihood ratio calculation to HC. 2013-12-02 19:33:14 -05:00