Commit Graph

13300 Commits (313d20d849c6dee9ced8374eea8d3164fe50ea88)

Author SHA1 Message Date
jmthibault79 313d20d849 Merge pull request #581 from broadinstitute/jt_picard
Rev Picard because it's needed by another branch
2014-03-30 11:05:50 -04:00
Joel Thibault 2049eb1658 Rev Picard 1.110.1763
- SamPairUtils migrated in Picard r1737
- Revert IndelRealigner changes made in commit 4f4b85
-- Those changes were based on Picard revision 1722 to net/sf/picard/sam/SamPairUtil.java
-- Picard revision 1723 reverts these changes, so we also revert to match
2014-03-30 09:33:57 -04:00
MauricioCarneiro 5abb7ea2db Merge pull request #579 from broadinstitute/rp_fix_DP_annotation
Fix for dropping of reference sample depth in the DP annotation.
2014-03-24 15:49:20 -04:00
Ryan Poplin 6566dd6ca9 Fix for dropping of reference sample depth in the DP annotation.
-- In the case of hierarchical merge we can't assume that we have only one genotype.
-- Removed use of deprecated VC annotation access functions.
2014-03-24 14:01:50 -04:00
Ryan Poplin c61a791914 Merge pull request #578 from broadinstitute/eb_trivial_fix_to_IR
Fix for reads that are all insertions (e.g. 50I) and causing the IndelRe...
2014-03-21 15:24:58 -04:00
Ryan Poplin b8581d7d3a Merge pull request #576 from broadinstitute/rp_fix_AssessNA12878_dropping_contigs
Bug fix in AssessNA12878 when working with more than one contig.
2014-03-21 15:23:59 -04:00
Eric Banks 32a96e3ab3 Fix for reads that are all insertions (e.g. 50I) and causing the IndelRealigner to error out. 2014-03-21 15:01:34 -04:00
Ryan Poplin dd1b0a48db Bug fix in AssessNA12878 when working with more than one contig.
-- SmartSiteIterator needs to know to span across Chunks when iterating by polling the Chunk list
-- Added KB test to test for this case
-- Removed the maxSites argument in ExtractConsensusSites because it is counterintuitive and not useful.
2014-03-21 14:52:10 -04:00
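The fix in dd1b0a48db describes an iterator that has to keep polling the Chunk list instead of stopping at the end of the first chunk. Below is a minimal sketch of that pattern. It assumes chunks are plain lists of sites, and `ChunkSpanningIterator` is a hypothetical name, not the actual SmartSiteIterator code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: iterate sites across a queue of chunks, moving on when a chunk runs out. */
public class ChunkSpanningIterator<T> implements Iterator<T> {
    private final Deque<List<T>> chunks;                  // remaining chunks, polled in order
    private Iterator<T> current = Collections.emptyIterator();

    public ChunkSpanningIterator(List<List<T>> chunks) {
        this.chunks = new ArrayDeque<>(chunks);
    }

    @Override
    public boolean hasNext() {
        // Keep polling further chunks until one still has sites (or none remain),
        // rather than reporting "done" at the end of the first chunk.
        while (!current.hasNext() && !chunks.isEmpty()) {
            current = chunks.poll().iterator();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        List<List<String>> chunks = List.of(
                List.of("chr1:100", "chr1:200"), List.of(), List.of("chr2:50"));
        List<String> out = new ArrayList<>();
        new ChunkSpanningIterator<>(chunks).forEachRemaining(out::add);
        System.out.println(out); // [chr1:100, chr1:200, chr2:50]
    }
}
```

Empty chunks are skipped by the same loop, so a chunk with no sites for one contig does not end iteration early.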
Ryan Poplin 69eaf7c82d Merge pull request #577 from broadinstitute/eb_minor_fixes_for_fragment_utils
Fixed docs for method and fixed the edge case optimization to properly u...
2014-03-21 14:01:44 -04:00
Ryan Poplin ce39fcd8a3 Merge pull request #575 from broadinstitute/eb_various_fixes_for_gvcfs
Eb various fixes for gvcfs
2014-03-21 09:47:08 -04:00
Eric Banks 0d82a70633 Fixed docs for method and fixed the edge case optimization to properly use equals() on Integers.
Shouldn't affect actual results at all.
2014-03-20 15:55:09 -04:00
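The Integer pitfall mentioned in 0d82a70633 is standard Java behavior. `==` on boxed `Integer` values compares object references, and autoboxing is only guaranteed to share cached instances for values from -128 to 127. A small illustration follows; the class and method names are hypothetical:

```java
import java.util.Objects;

/** Illustration: compare boxed Integers with equals(), not ==. */
public class IntegerEqualityDemo {
    /** Buggy form: == on Integer objects compares references, not values. */
    static boolean sameByReference(Integer a, Integer b) {
        return a == b;
    }

    /** Fixed form: compares the wrapped int values; Objects.equals also handles nulls. */
    static boolean sameByValue(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Integer small1 = 100, small2 = 100; // -128..127: autoboxing must reuse the cached instance
        Integer big1 = 1000, big2 = 1000;   // outside the guaranteed cache range
        System.out.println(sameByReference(small1, small2)); // true
        System.out.println(sameByValue(big1, big2));         // true
        // sameByReference(big1, big2) is usually false, but it depends on the JVM's
        // Integer cache size, so any logic that relies on == here is fragile.
        System.out.println(sameByReference(big1, big2));
    }
}
```

This also explains why such a bug tends to stay hidden: small values hit the cache and compare equal by reference, so tests that use small inputs still pass.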
Eric Banks 7c8ce3cd6a Several improvements to GenotypeGVCFs: --includeNonVariantSites now actually works and we propagate AD to hom ref samples 2014-03-20 00:35:54 -04:00
Eric Banks 824983af1d Enable CombineGVCFs to process gVCFs that were created with basepair resolution. 2014-03-19 19:23:05 -04:00
Eric Banks 3b1c337401 Have CombineVariants throw a UserError when trying to combine GVCFs from the HaplotypeCaller.
Was previously throwing an IllegalArgumentException (in the wrong place in the code).
Error message tells users to use CombineGVCFs.
2014-03-19 19:11:40 -04:00
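One way to produce the kind of error described in 3b1c337401 is to validate inputs up front, before any merging starts, so the user gets an actionable message instead of an IllegalArgumentException from deep inside the code. The sketch below assumes that HaplotypeCaller gVCF records can be recognized by the symbolic `<NON_REF>` alternate allele. `UserError` and `rejectGvcfRecords` are hypothetical stand-ins, not GATK's actual classes or check:

```java
import java.util.List;

/** Hypothetical sketch: reject gVCF input with a user-facing error before any combining starts. */
public class GvcfInputCheck {
    /** Stand-in for a user-facing error type (hypothetical; not GATK's UserException API). */
    public static class UserError extends RuntimeException {
        public UserError(String message) {
            super(message);
        }
    }

    /** Assumption: HaplotypeCaller gVCF records carry this symbolic alternate allele. */
    static final String NON_REF = "<NON_REF>";

    /** Throws a UserError naming the right tool if a record looks like gVCF output. */
    static void rejectGvcfRecords(String inputName, List<String> altAlleles) {
        if (altAlleles.contains(NON_REF)) {
            throw new UserError("Input " + inputName + " appears to be a gVCF from HaplotypeCaller; "
                    + "use CombineGVCFs to combine gVCFs instead of CombineVariants.");
        }
    }

    public static void main(String[] args) {
        rejectGvcfRecords("calls.vcf", List.of("T")); // ordinary VCF record: accepted
        try {
            rejectGvcfRecords("sample1.g.vcf", List.of("<NON_REF>"));
        } catch (UserError e) {
            System.out.println(e.getMessage());
        }
    }
}
```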
Ryan Poplin 7117bebb5e Merge pull request #572 from broadinstitute/rp_fix_manual_reviews_fix
Forgot to change the padding ref base as well.
2014-03-19 13:58:58 -04:00
Ryan Poplin a3aa68e626 Forgot to change the padding ref base as well. 2014-03-19 13:58:21 -04:00
Ryan Poplin c737b8ed1e Merge pull request #571 from broadinstitute/rp_fix_manual_reviews
Fixing the reference base in one of the manual review files.
2014-03-19 13:49:51 -04:00
Ryan Poplin 523fd40a07 Fixing the reference base in one of the manual review files. 2014-03-19 11:08:06 -04:00
Ryan Poplin 3f326b14be Merge pull request #570 from broadinstitute/rp_assessNA12878_arg_docs_fix
Small argument docs fix in AssessNA12878.
2014-03-19 09:01:23 -04:00
droazen 7b38019199 Merge pull request #569 from broadinstitute/vrr_speedup_integration_test
Reduce runtime of very long integration test
2014-03-18 23:39:16 -04:00
Valentin Ruano-Rubio 905b6066b2 Reduce runtime of very long integration test 2014-03-18 21:48:13 -04:00
Eric Banks 14eb0a8a30 Merge pull request #566 from jsilter/master
Improvements to na12878kb
2014-03-18 15:16:20 -04:00
droazen cec4ff3a2a Merge pull request #568 from broadinstitute/dr_fix_UtilsUnitTest
Fix typo in UtilsUnitTest data provider name
2014-03-18 14:31:14 -04:00
David Roazen e549f4a9d2 Fix typo in UtilsUnitTest data provider name
This is currently my leading suspect for the cause of the
intermittent NoSuchElementException errors on master, since
the maven surefire plugin seems unable to handle errors in
TestNG DataProviders without blowing up.
2014-03-18 11:52:29 -04:00
Ryan Poplin a02383fc6a Small argument docs fix in AssessNA12878. 2014-03-18 10:07:14 -04:00
David Roazen 4ba72d43cf Re-enable GATKRunReportUnitTest
This test is not, as I had initially thought, the cause of the
maven errors. Our master branch is failing intermittently
regardless of whether this test is enabled or disabled.

This reverts commit 45fc9ff515eec8d676b64a04fb34fb357492ff84.
2014-03-18 09:53:41 -04:00
David Roazen 3cd8158bed Merged bug fix from Stable into Unstable 2014-03-18 03:01:28 -04:00
David Roazen cfc45fdc0b Disable GATKRunReportUnitTest
These tests pass individually and as part of complete test suite runs,
but cause an intermittent NoSuchElementException in maven when the
unit tests are run on their own. Disabling these tests until the
cause of this can be identified.
2014-03-18 02:57:19 -04:00
David Roazen afa6abe554 Temporarily disable GATKRunReportUnitTest in unstable while maven issues are worked out
This test passes when run individually, as part of the commit tests, or as
part of the package tests. However, when running the unit tests in isolation
it causes maven/surefire to throw a NoSuchElementException.

This is clearly a maven/surefire bug or configuration issue. I will re-enable
this test on a branch as Khalid and I try to work through it.
2014-03-18 01:28:28 -04:00
David Roazen 2d8653f493 Update pom versions to mark the start of GATK 3.2 development 2014-03-18 01:18:59 -04:00
David Roazen 72492bb875 Merge remote-tracking branch 'unstable/master' 2014-03-18 01:11:23 -04:00
David Roazen a6a41c777c Update pom versions for 3.1 2014-03-18 01:09:29 -04:00
droazen 975b3f321e Merge pull request #567 from broadinstitute/dr_public_GATKRunReport_tests
Move GATKRunReport tests from private to public
2014-03-17 18:33:26 -04:00
David Roazen d5e38ec39b Move GATKRunReport tests from private to public
-Hide AWS downloader credentials in a private properties file
-Remove references to private ActiveRegion walker

Allows phone home functionality to be tested at release time
when we are running tests on the release jar.
2014-03-17 18:29:40 -04:00
droazen 6b3320f067 Merge pull request #561 from broadinstitute/ks_package_classpath
Updated package-tests classpath, and allowing javac -cp <package>.jar.
2014-03-17 17:38:24 -04:00
Jacob Silterra 0715a9fd7d Make sure to follow HTTP redirects when loading dbSpec 2014-03-17 16:31:14 -04:00
Jacob Silterra cfc4f708dc Load dbSpec into string before parsing it as JSON
This is to make the causes of errors more obvious; if there is an error parsing the JSON string, the entire received message is included in the error message.
2014-03-17 16:30:26 -04:00
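The idea in cfc4f708dc, reading the whole response into a string before parsing it so that a parse failure can report exactly what was received, can be sketched as below. The JSON parser is left as a `Function<String, T>` parameter because the commit does not say which library is used. The class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical sketch: read the full body first, then parse, so failures can show what was received. */
public class ParseWithContext {
    /** Reads the entire stream into a UTF-8 string (InputStream.readAllBytes needs Java 9+). */
    static String readAll(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs any parser over the text; on failure, the error message includes the raw text. */
    static <T> T parseWithContext(String text, Function<String, T> parser) {
        try {
            return parser.apply(text);
        } catch (RuntimeException e) {
            throw new IllegalStateException("Could not parse dbSpec as JSON. Received:\n" + text, e);
        }
    }

    public static void main(String[] args) {
        String body = readAll(new ByteArrayInputStream("<html>Moved</html>".getBytes(StandardCharsets.UTF_8)));
        // Toy parser standing in for a real JSON library: accepts only text that starts with '{'.
        Function<String, String> toyParser = s -> {
            if (!s.startsWith("{")) throw new IllegalArgumentException("unexpected character: " + s.charAt(0));
            return s;
        };
        try {
            parseWithContext(body, toyParser);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // shows the full body that failed to parse
        }
    }
}
```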
Eric Banks 2e34ff7692 Merge pull request #563 from broadinstitute/aw_refactor_tribble
GATK changes to conform to Tribble refactoring as part of improving Tabix s...
2014-03-17 13:35:46 -04:00
Eric Banks 2115195d18 Merge pull request #565 from broadinstitute/eb_remove_one_more_reference_to_rr
Remove unused and unnecessary argument
2014-03-17 12:29:26 -04:00
Eric Banks dabdd0a0fd Remove unused and unnecessary argument 2014-03-17 12:28:27 -04:00
Eric Banks 759c4c8c5a Merge pull request #564 from broadinstitute/eb_rename_truth_set
Mark had mis-named this input callset to the knowledgebase.  It's the pi...
2014-03-17 12:15:57 -04:00
Eric Banks b9b2dcc712 Mark had mis-named this input callset to the knowledgebase. It's the pilot2 liftover, not pilot1. 2014-03-17 12:14:29 -04:00
Alec Wysoker 0369f93b24 GATK changes to conform to Tribble refactoring as part of improving Tabix support in Tribble (among other things).
1. Enable on-the-fly indexing for vcf.gz.
2. Handle on-the-fly indexing where file to be indexed is not a regular file, thus index should not be created.
3. Add method setProgressLogger to all SAMFileWriter implementations.
4. Revved picard to 1.109.1722
5. IndelRealigner md5s change because the MC tag is added to records now.

Fixed up and signed off by ebanks.
2014-03-17 11:56:22 -04:00
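Item 2 of 0369f93b24 (skip on-the-fly indexing when the output is not a regular file) might reduce to a check like the one below. This is a sketch under assumptions. The method name is hypothetical, and the check treats a path that does not exist yet as a regular file the writer is about to create. Non-regular outputs arise, for example, when writing to a pipe or a terminal:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch: only build an index on the fly when the output is (or will be) a regular file. */
public class IndexOnTheFlyCheck {
    static boolean shouldCreateIndex(Path output, boolean indexingRequested) {
        if (!indexingRequested) {
            return false;
        }
        // A path that does not exist yet will be created by the writer as a regular file.
        // An existing path that is not a regular file (directory, pipe, device) gets no index.
        return Files.notExists(output) || Files.isRegularFile(output);
    }

    public static void main(String[] args) {
        Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(shouldCreateIndex(tmpDir, true)); // false: a directory is not a regular file
        System.out.println(shouldCreateIndex(tmpDir.resolve("out-" + System.nanoTime() + ".vcf.gz"), true)); // true
    }
}
```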
Eric Banks 34c697bf12 Merge pull request #554 from broadinstitute/bh_SOR_new_annotation
Bh sor new annotation
2014-03-17 10:58:13 -04:00
Eric Banks bb4a7bf87e Merge pull request #562 from broadinstitute/ldg_newCGPdocs
Added documentation category for CalculateGenotypePosteriors
2014-03-17 10:39:20 -04:00
Laura Gauthier 40c13d446a Added documentation category for CalculateGenotypePosteriors 2014-03-17 10:36:19 -04:00
Ryan Poplin c1b4390691 Merge pull request #559 from broadinstitute/vrr_assembly_graph_edge_info_revise
Improved criteria to select best haplotypes out from the assembly graph.
2014-03-17 09:27:19 -04:00
Khalid Shakir 639247ab48 Updated package-tests classpath, and allowing javac -cp <package>.jar.
Package tests now hard-code just the gatk-framework tests jar, to include ONLY BaseTest, until the exclusions can be debugged.
Removing cofoja's annotation service from the package jars, to allow javac -cp <package>.jar.
2014-03-17 05:47:59 -04:00
Valentin Ruano-Rubio 2e964c59b4 Improved criteria to select best haplotypes out from the assembly graph.
Currently the best haplotypes are those that accumulate the largest ABSOLUTE edge *multiplicity* sum across their path in the assembly graph.

The edge *multiplicity* is equal to the number of reads that pass through that edge, i.e. reads that have a kmer that uniquely maps to some vertex upstream of the edge and whose following base calls extend across that edge to vertices downstream of it.

Although higher multiplicities clearly correlate with haplotype probability, this criterion falls short in some regards, the most relevant being:

As it is evaluated on the condensed seq-graph (as opposed to the uncompressed read-threading graph), it is biased toward haplotypes that have more short-sequence vertices
  (-> ATGC -> CA -> has a worse score than -> A -> T -> G -> C -> C -> A ->). This is partly a result of how we modify the edge multiplicities when we merge vertices from a linear chain.

This pull request addresses the problem by switching to a new scoring scheme based on likelihood estimates:

Each haplotype's likelihood can be calculated as the product of the likelihoods of "taking" each of its edges in the assembly graph. The likelihood of "taking" an edge in the assembly graph is calculated as its multiplicity divided by the sum of the multiplicities of all edges that share the same source vertex.
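A minimal sketch of the likelihood-based score described above, assuming a simple edge-list graph model. Working in log10 space is an assumption made here to avoid underflow on long paths. The 0.5 substitution for zero-multiplicity edges follows the note in the change summary. `HaplotypePathScore` and `Edge` are hypothetical names, not KBestHaplotypeFinder internals:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the likelihood-based path score; not the KBestHaplotypeFinder code. */
public class HaplotypePathScore {
    static final class Edge {
        final String source;
        final String target;
        final int multiplicity;

        Edge(String source, String target, int multiplicity) {
            this.source = source;
            this.target = target;
            this.multiplicity = multiplicity;
        }
    }

    /** Zero-multiplicity edges are counted as 0.5, per the note in the change summary. */
    static double effectiveMultiplicity(Edge e) {
        return e.multiplicity == 0 ? 0.5 : e.multiplicity;
    }

    /** log10 P(path) = sum over path edges of log10(multiplicity / total multiplicity leaving its source). */
    static double log10PathLikelihood(List<Edge> graph, List<Edge> path) {
        Map<String, Double> outgoing = new HashMap<>();
        for (Edge e : graph) {
            outgoing.merge(e.source, effectiveMultiplicity(e), Double::sum);
        }
        double log10 = 0.0;
        for (Edge e : path) {
            log10 += Math.log10(effectiveMultiplicity(e) / outgoing.get(e.source));
        }
        return log10;
    }

    public static void main(String[] args) {
        Edge ab = new Edge("A", "B", 8), ac = new Edge("A", "C", 2);
        Edge bd = new Edge("B", "D", 8), cd = new Edge("C", "D", 2);
        List<Edge> graph = List.of(ab, ac, bd, cd);
        System.out.println(Math.pow(10, log10PathLikelihood(graph, List.of(ab, bd)))); // approximately 0.8
        System.out.println(Math.pow(10, log10PathLikelihood(graph, List.of(ac, cd)))); // approximately 0.2
    }
}
```

In the example, A has outgoing multiplicities 8 and 2, so the two paths score 0.8 and 0.2. Splitting B -> D into a linear chain B -> B2 -> D leaves the first score unchanged, because B2 has a single outgoing edge and contributes log10(1) = 0. A raw multiplicity sum, by contrast, changes with the number of edges on the path.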

This pull request addresses the following stories:

https://www.pivotaltracker.com/story/show/66691418
https://www.pivotaltracker.com/story/show/64319760

Change Summary:

1. Changed to the new scoring scheme.
2. Added graph DOT printing code to KBestHaplotypeFinder in order to diagnose scoring.
3. Graph transformations have been modified so that they generate no 0-multiplicity edges. (Nevertheless, the scheme above should also work with 0-multiplicity edges, assuming they are counted as 0.5.)
2014-03-14 18:37:01 -04:00
Bertrand Haas 82108d110f New abstract class StrandBiasTest() with old sub-class FisherStrand() and new sub-class StrandOddsRatio(). The latter is a test based on the symmetric odds ratio, which is more appropriate than Fisher's exact test when the number of samples is large.
https://www.pivotaltracker.com/story/show/66087886
2014-03-14 18:33:21 -04:00
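The class hierarchy in 82108d110f can be sketched as an abstract test over a 2x2 table of read counts (ref/alt by forward/reverse strand) with two concrete statistics. The Fisher branch is a standard two-sided exact test. The symmetric odds ratio used here, R + 1/R with a pseudocount of 1 per cell, is an assumption for illustration and may differ from GATK's actual StrandOddsRatio formula. All class names are hypothetical:

```java
/** Hypothetical sketch of an abstract strand-bias test with two concrete statistics. */
public abstract class StrandBiasTestSketch {
    /** Shared entry point: validates the 2x2 table (ref/alt rows, forward/reverse columns). */
    public final double test(int refFw, int refRv, int altFw, int altRv) {
        if (refFw < 0 || refRv < 0 || altFw < 0 || altRv < 0) {
            throw new IllegalArgumentException("read counts must be non-negative");
        }
        return statistic(refFw, refRv, altFw, altRv);
    }

    protected abstract double statistic(int refFw, int refRv, int altFw, int altRv);

    /** Two-sided Fisher's exact test p-value (small p = strong evidence of strand bias). */
    public static final class FisherStrandSketch extends StrandBiasTestSketch {
        @Override
        protected double statistic(int refFw, int refRv, int altFw, int altRv) {
            int r1 = refFw + refRv, r2 = altFw + altRv, c1 = refFw + altFw;
            double observed = logHypergeometric(refFw, r1, r2, c1);
            double p = 0.0;
            for (int x = Math.max(0, c1 - r2); x <= Math.min(r1, c1); x++) {
                double logP = logHypergeometric(x, r1, r2, c1);
                if (logP <= observed + 1e-7) { // tables at least as extreme, with rounding tolerance
                    p += Math.exp(logP);
                }
            }
            return Math.min(1.0, p);
        }
    }

    /** Assumed symmetric odds ratio: R + 1/R, minimal (2.0) when strands are balanced. */
    public static final class StrandOddsRatioSketch extends StrandBiasTestSketch {
        @Override
        protected double statistic(int refFw, int refRv, int altFw, int altRv) {
            double r = ((refFw + 1.0) * (altRv + 1.0)) / ((refRv + 1.0) * (altFw + 1.0));
            return r + 1.0 / r; // grows with bias in either strand direction
        }
    }

    /** log of the hypergeometric probability of a table with top-left cell x and fixed margins. */
    static double logHypergeometric(int x, int r1, int r2, int c1) {
        return logChoose(r1, x) + logChoose(r2, c1 - x) - logChoose(r1 + r2, c1);
    }

    static double logChoose(int n, int k) {
        return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
    }

    static double logFactorial(int n) {
        double s = 0.0;
        for (int i = 2; i <= n; i++) {
            s += Math.log(i);
        }
        return s;
    }

    public static void main(String[] args) {
        StrandBiasTestSketch fisher = new FisherStrandSketch();
        StrandBiasTestSketch sor = new StrandOddsRatioSketch();
        System.out.println(fisher.test(1, 9, 11, 3));
        System.out.println(sor.test(10, 10, 5, 5)); // 2.0
    }
}
```

For the table (1, 9, 11, 3) the two-sided Fisher p-value is 7462/2704156, roughly 0.0028. The symmetric ratio is unchanged when the forward and reverse strands are swapped, so bias in either direction scores the same.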