13257 Commits (b9b2dcc712c4e3a4d732f449c1baa6598d5ac7da)

Author SHA1 Message Date
Eric Banks b9b2dcc712 Mark had mis-named this input callset to the knowledgebase. It's the pilot2 liftover, not pilot1. 2014-03-17 12:14:29 -04:00
Eric Banks 34c697bf12 Merge pull request #554 from broadinstitute/bh_SOR_new_annotation
Bh sor new annotation
2014-03-17 10:58:13 -04:00
Eric Banks bb4a7bf87e Merge pull request #562 from broadinstitute/ldg_newCGPdocs
Added documentation category for CalculateGenotypePosteriors
2014-03-17 10:39:20 -04:00
Laura Gauthier 40c13d446a Added documentation category for CalculateGenotypePosteriors 2014-03-17 10:36:19 -04:00
Ryan Poplin c1b4390691 Merge pull request #559 from broadinstitute/vrr_assembly_graph_edge_info_revise
Improved criteria to select best haplotypes out from the assembly graph.
2014-03-17 09:27:19 -04:00
Valentin Ruano-Rubio 2e964c59b4 Improved criteria to select best haplotypes out from the assembly graph.
Currently the best haplotypes are those that accumulate the largest ABSOLUTE edge *multiplicity* sum across their path in the assembly graph.

The edge *multiplicity* is equal to the number of reads that extend through that edge, i.e. reads that have a kmer that uniquely maps to some vertex upstream of the edge and whose following base calls extend across that edge to vertices downstream of it.

Although it is obvious that higher multiplicities correlate with haplotype probability, this criterion falls short in some regards, of which the most relevant is:

As it is evaluated in the condensed seq-graph (as opposed to uncompressed read-threading graphs), it is biased toward haplotypes that have more short-sequence vertices
  ( -> ATGC -> CA -> has a worse score than -> A -> T -> G -> C -> C -> A ->). This is partly a result of how we modify the edge multiplicities when we merge vertices from a linear chain.

This pull-request addresses the problem by changing to a new scoring schema based on likelihood estimates:

Each haplotype's likelihood can be calculated as the product of the likelihoods of "taking" its edges in the assembly graph. The likelihood of "taking" an edge in the assembly
graph is calculated as its multiplicity divided by the sum of the multiplicities of the edges that share the same source vertex.

This pull-request addresses the following stories:

https://www.pivotaltracker.com/story/show/66691418
https://www.pivotaltracker.com/story/show/64319760

Change Summary:

1. Change to the new scoring schema.
2. Added graph DOT printing code to KBestHaplotypeFinder in order to diagnose scoring.
3. Graph transformations have been modified so that they generate no 0-multiplicity edges. (Nevertheless, the schema above should work with 0-multiplicity edges, assuming that they are in fact 0.5.)
2014-03-14 18:37:01 -04:00
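The likelihood-based scoring described in commit 2e964c59b4 can be illustrated with a minimal sketch. This is not the code from KBestHaplotypeFinder: the class and method names are hypothetical, the graph is reduced to plain arrays, and summing in log10 space is a choice made here to avoid underflow on long paths. The 0.5 substitute for zero-multiplicity edges follows the note in the change summary.

```java
public class EdgeLikelihoodSketch {

    // A zero-multiplicity edge is treated as 0.5, per the commit message's note.
    static double effectiveMultiplicity(int multiplicity) {
        return multiplicity == 0 ? 0.5 : multiplicity;
    }

    // log10 likelihood of "taking" one edge: its multiplicity divided by the
    // summed multiplicity of all edges leaving the same source vertex.
    // 'outgoing' must include the taken edge itself.
    static double edgeLog10Likelihood(int taken, int[] outgoing) {
        double total = 0.0;
        for (int m : outgoing) {
            total += effectiveMultiplicity(m);
        }
        return Math.log10(effectiveMultiplicity(taken) / total);
    }

    // log10 likelihood of a haplotype path: the product of its edge
    // likelihoods, computed as a sum of logs.
    // taken[i] is the multiplicity of the i-th edge on the path and
    // outgoing[i] lists the multiplicities of all edges leaving its source.
    static double pathLog10Likelihood(int[] taken, int[][] outgoing) {
        double sum = 0.0;
        for (int i = 0; i < taken.length; i++) {
            sum += edgeLog10Likelihood(taken[i], outgoing[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // First vertex branches into edges with multiplicity 30 and 10;
        // the second vertex has a single outgoing edge.
        int[] taken = {30, 5};
        int[][] outgoing = {{30, 10}, {5}};
        System.out.println(Math.pow(10, pathLog10Likelihood(taken, outgoing)));
    }
}
```

An edge that is the only one leaving its source has likelihood 1, so under this scheme a linear chain contributes nothing to the score whether or not its vertices have been merged.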
Bertrand Haas 82108d110f New abstract class StrandBiasTest() with old sub-class FisherStrand() and new sub-class StrandOddsRatio(). The latter is a test based on a symmetric odds ratio, which is more appropriate than Fisher's exact test when the number of samples is large.
https://www.pivotaltracker.com/story/show/66087886
2014-03-14 18:33:21 -04:00
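As a rough illustration of what a symmetric odds ratio over a 2x2 strand table looks like, here is a minimal sketch. It is not the StrandOddsRatio implementation from this commit: the class name, method name, and the pseudocount of 1 added to every cell (to avoid division by zero) are assumptions made for the example.

```java
public class SymmetricOddsRatioSketch {

    // Symmetric odds ratio for a 2x2 table of read counts:
    //              forward   reverse
    //   ref        refFwd    refRev
    //   alt        altFwd    altRev
    // With R = (refFwd * altRev) / (refRev * altFwd), the statistic R + 1/R
    // gives the same value when the two strands are swapped.
    static double symmetricOddsRatio(int refFwd, int refRev, int altFwd, int altRev) {
        // Pseudocount of 1 per cell (an assumption of this sketch).
        double a = refFwd + 1.0;
        double b = refRev + 1.0;
        double c = altFwd + 1.0;
        double d = altRev + 1.0;
        double r = (a * d) / (b * c);
        return r + 1.0 / r;
    }

    public static void main(String[] args) {
        // A balanced table yields the minimum value of the statistic.
        System.out.println(symmetricOddsRatio(10, 10, 10, 10));
        // A strongly strand-biased table yields a much larger value.
        System.out.println(symmetricOddsRatio(20, 0, 0, 20));
    }
}
```

Because R + 1/R has its minimum of 2 at R = 1, larger values indicate stronger strand imbalance in either direction.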
Eric Banks a0c252f084 Merge pull request #560 from broadinstitute/dr_fix_phone_home_packaging_error
Unconditionally include all of commons-httpclient in the GATK/Queue jars
2014-03-14 11:54:30 -04:00
David Roazen 1324120c17 Unconditionally include all of commons-httpclient in the GATK/Queue jars
The maven shade plugin was eliminating a necessary class (IgnoreCookiesSpec)
when packaging the GATK/Queue. Work around this by telling maven to
always package all of commons-httpclient.
2014-03-14 10:50:15 -04:00
Eric Banks 7c7ff90266 Merge pull request #558 from broadinstitute/rp_vqsr_nondeterminism_fix
Fix for non-determinism in the VQSR with very large data sets
2014-03-12 14:35:51 -04:00
Eric Banks d3a4b57491 Merge pull request #556 from broadinstitute/eb_use_iupac_in_FARM
Added new functionality to the FastaAlternateReferenceMaker to have it o...
2014-03-12 14:33:06 -04:00
Eric Banks ffaf92f871 Added new functionality to the FastaAlternateReferenceMaker to have it output IUPAC codes for het sites.
Enable it with the new --useIUPAC argument.
Added both unit and integration tests for the new functionality - and fixed up the
existing tests once I was in there.
2014-03-12 14:31:57 -04:00
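For readers unfamiliar with IUPAC ambiguity codes, the idea behind emitting them at het sites can be sketched as follows. This is not the FastaAlternateReferenceMaker code: the class and method names are hypothetical, and only the standard two-base codes are covered.

```java
public class IupacSketch {

    // Returns the standard IUPAC nucleotide code for a pair of bases.
    // Identical bases return the base itself; only A, C, G and T are handled.
    static char iupacCode(char first, char second) {
        char a = Character.toUpperCase(first);
        char b = Character.toUpperCase(second);
        if (a == b) {
            return a;
        }
        // Order the pair alphabetically so each combination is matched once.
        String pair = a < b ? "" + a + b : "" + b + a;
        switch (pair) {
            case "AG": return 'R'; // purine
            case "CT": return 'Y'; // pyrimidine
            case "CG": return 'S'; // strong
            case "AT": return 'W'; // weak
            case "GT": return 'K'; // keto
            case "AC": return 'M'; // amino
            default:
                throw new IllegalArgumentException("Unsupported bases: " + first + "/" + second);
        }
    }

    public static void main(String[] args) {
        // A heterozygous A/G site would be written as R in the output FASTA.
        System.out.println(iupacCode('A', 'G'));
    }
}
```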
Ryan Poplin 907d1d6160 Fix for non-determinism in the VQSR with very large data sets 2014-03-12 10:25:12 -04:00
ldgauthier 4e74e77e74 Merge pull request #555 from broadinstitute/eb_add_option_to_CGVCFs_for_all_sites_GVCF
Added an option to CombineGVCFs to create basepair resolution gVCFs from...
2014-03-12 10:01:18 -04:00
Eric Banks 1f6b761c7d Merge pull request #557 from broadinstitute/dr_add_warning_for_intel_pairhmm
Emit a warning whenever the VectorLoglessPairHMM is used
2014-03-12 09:59:32 -04:00
David Roazen c67ced5f3b Emit a warning whenever the VectorLoglessPairHMM is used 2014-03-12 09:55:35 -04:00
Eric Banks d697e0144f Added an option to CombineGVCFs to create basepair resolution gVCFs from banded ones.
Use the --convertToBasePairResolution argument to enable this functionality.
2014-03-12 01:32:51 -04:00
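Conceptually, converting a banded gVCF to base-pair resolution means replacing each reference block that spans a range of positions with one record per position. The sketch below shows only that expansion step. It is not the CombineGVCFs code, which works on real variant records: the class name, method name, and the string-based record format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BandExpansionSketch {

    // Expands one banded block covering [start, end] (inclusive, 1-based)
    // into one line per position, each carrying the block's annotation.
    static List<String> expandBand(String contig, int start, int end, String annotation) {
        List<String> records = new ArrayList<>();
        for (int pos = start; pos <= end; pos++) {
            records.add(contig + ":" + pos + "\t" + annotation);
        }
        return records;
    }

    public static void main(String[] args) {
        // A block covering positions 100-103 becomes four single-position records.
        for (String record : expandBand("20", 100, 103, "GQ=99")) {
            System.out.println(record);
        }
    }
}
```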
Ryan Poplin 35aa24ab54 Merge pull request #552 from broadinstitute/rp_HaplotypeCaller_1kg_consensus_mode
Added the consensus mode used for the 1000 Genomes Project to the Haplot...
2014-03-11 11:17:30 -04:00
Ryan Poplin 34d11fe40c Added the consensus mode used for the 1000 Genomes Project to the HaplotypeCaller.
-- All the provided alleles are added to the assembly graph as potential haplotypes but they aren't forcibly genotyped like in GGA mode.
-- Added integration test for this mode
2014-03-11 09:56:35 -04:00
droazen 8b53567dc7 Merge pull request #553 from broadinstitute/dr_rename_pipeline_tests
Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests
2014-03-10 21:36:45 -04:00
David Roazen 78562c14bb Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests
-These tests are really integration tests for Queue rather than generalized
 pipeline tests, so it makes sense to call them QueueTests.

-Rename test classes and maven build targets, and update shell scripts
 to reflect new naming.
2014-03-10 21:24:03 -04:00
droazen d0501d0083 Merge pull request #547 from broadinstitute/intel_pairhmm
Experimental native PairHMM implementation from Intel. Off by default.
2014-03-10 15:27:20 -04:00
David Roazen 7c34f05082 Merge remote-tracking branch 'origin/master' into intel 2014-03-10 14:07:36 -04:00
David Roazen 5a6aa54673 Revert "Update HaplotypeCaller and VariantAnnotator test MD5s"
This reverts commit 7faa44d576b06d7aef29562e82590a7855f216f4.
2014-03-10 14:06:51 -04:00
David Roazen e7d6db033b Revert "Revert "Change default HaplotypeCaller PairHMM implementation back to LOGLESS_CACHING""
This reverts commit c8a34749e631b92214a57bba162c6e0d849425f1.
2014-03-10 14:05:51 -04:00
amilev 90ed6bd4ab Merge pull request #551 from broadinstitute/ami-splitVCF
add an option to randomly (uniformly) split a vcf file/s to more than 2 ...
2014-03-10 11:02:38 -04:00
Ami Levy-Moonshine 2a6f05a8a1 add an option to randomly (uniformly) split a vcf file/s to more than 2 files.
The old code that allows splitting into two files (given in the input) is kept to allow uneven splitting between files.
2014-03-10 10:58:44 -04:00
David Roazen f070583f29 Update HaplotypeCaller and VariantAnnotator test MD5s
There are a few innocuous test failures on this branch --
updating MD5s after reviewing the differences in output
2014-03-07 10:54:27 -05:00
Karthik Gururaj 6e98e9e589 Removed g_haplotype* global variables in native code so that it works
with multi-threading in Java.
Modified VectorLoglessPairHMM.java so that jniInitializeRegion and
jniFinalizeRegion are empty
2014-03-06 22:08:35 -08:00
Karthik Gururaj 3999677c93 Changed to delete[] where applicable 2014-03-06 12:23:08 -08:00
amilev f706bcb1c0 Merge pull request #550 from broadinstitute/ami-updateHC_scipt
update HC scala script to allow the RNA-seq mode parameters of HC
2014-03-06 14:48:22 -05:00
Ami Levy-Moonshine 4de989ebf1 update HC scala script to allow the RNA-seq mode parameters of HC 2014-03-06 14:45:51 -05:00
Karthik Gururaj a29777765d Binary library 2014-03-06 11:14:46 -08:00
Karthik Gururaj 7844d956ac Modified delete to delete[] 2014-03-06 11:13:34 -08:00
Karthik Gururaj 27e640d640 Modified SSE4.1 and 4.2 checks with _may_i_use_cpu_feature() 2014-03-06 08:51:11 -08:00
Karthik Gururaj 6166d08183 Merge branch 'intel_pairhmm' of /data/broad/gsa-unstable into intel_pairhmm 2014-03-06 08:38:12 -08:00
Karthik Gururaj 37f107cb3a Using Mustafa's function _may_i_use_cpu_feature() for AVX check 2014-03-06 08:37:48 -08:00
Eric Banks c0093be06a Merge pull request #549 from broadinstitute/eb_remove_DownsampleReadsQC
Remove this bad, bad walker.  It doesn't even belong in private.
2014-03-06 11:03:20 -05:00
Eric Banks 084750b807 Remove this bad, bad walker. It doesn't even belong in private. 2014-03-06 11:01:41 -05:00
David Roazen 3f3df90412 Revert "Change default HaplotypeCaller PairHMM implementation back to LOGLESS_CACHING"
This reverts commit cef03f089fb3f131f3a77664b71feaec51a74cc8.
2014-03-06 10:15:35 -05:00
David Roazen 7f1973193c Fix linking bug in copy_release.sh script
This script was not correctly updating the "current" symlink
in the release directories due to a missing ln argument.
2014-03-06 07:26:17 -05:00
David Roazen 9df59bd8cc Update pom versions to mark the start of GATK 3.1 development 2014-03-06 00:05:58 -05:00
David Roazen 8fedaf541c Merge remote-tracking branch 'unstable/master' 2014-03-05 23:41:48 -05:00
David Roazen 34edcb8ddf Update pom versions for the 3.0 release 2014-03-05 23:37:21 -05:00
David Roazen a9ddfdb7c0 Remove external-example module from public pom.xml
This module was causing failures during the release
packaging tests. After discussing with Khalid, we've
decided to disable it for now until a fix can be
developed.
2014-03-05 20:25:38 -05:00
David Roazen 53895e15cd Change default HaplotypeCaller PairHMM implementation back to LOGLESS_CACHING 2014-03-05 19:26:37 -05:00
amilev 2defeba445 Merge pull request #548 from broadinstitute/ami-update-FullProcessingPipeline
update the script (for RNA pipeline and comment out RR from it)
2014-03-05 15:49:34 -05:00
Eric Banks d3de6413c9 Move warnings to debug logging status because they will definitely scare users 2014-03-05 15:05:21 -05:00
David Roazen 5b55380bdf Revert changes to archived build.xml 2014-03-05 14:40:57 -05:00
Ami Levy-Moonshine 60ebfbe543 update the script (for RNA pipeline and comment out RR from it) 2014-03-05 14:26:47 -05:00