13941 Commits (fe0b5e0fbe1f2e9537244c8bc994192164297eac)

Author SHA1 Message Date
Eric Banks fe0b5e0fbe Handle cases where a given sample has multiple spanning deletions.
When a sample has multiple spanning deletions and we are asked to assign
likelihoods to the spanning deletion allele, we currently choose the first
deletion.  Valentin pointed out that this isn't desired behavior.  I
promised Valentin that I would address this issue, so here it is.

I do not believe that the correct thing to do is to sum the likelihoods
over all spanning deletions (I came up with problematic cases where this
breaks down).

So instead I'm using a simple heuristic approach: using the hom alt PLs, find
the most likely spanning deletion for this position and use its likelihoods.

In the 10K-sample VCF from Monkol there were only 2 cases where this problem
popped up.  In both cases the heuristic approach works well.
2015-06-16 12:20:43 -04:00
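The heuristic described in fe0b5e0fbe can be sketched roughly as follows. This is only an illustration, not the GATK code: it assumes each candidate deletion carries biallelic PLs ordered [hom ref, het, hom alt], that a lower Phred-scaled PL means a more likely genotype, and the names `SpanningDeletion` and `most_likely_spanning_deletion` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SpanningDeletion:
    """Hypothetical record for one deletion that spans the current position."""
    name: str
    pls: Sequence[int]  # assumed order: [hom ref, het, hom alt]


def most_likely_spanning_deletion(deletions: List[SpanningDeletion]) -> SpanningDeletion:
    """Return the deletion whose hom alt PL is lowest (assumed to mean most likely)."""
    if not deletions:
        raise ValueError("no spanning deletions for this sample")
    return min(deletions, key=lambda d: d.pls[2])


# Two made-up deletions spanning the same position in one sample.
candidates = [
    SpanningDeletion("del_A", pls=[60, 10, 0]),
    SpanningDeletion("del_B", pls=[0, 30, 90]),
]
chosen = most_likely_spanning_deletion(candidates)
print(chosen.name, list(chosen.pls))
```

Choosing a single deletion and reusing its likelihoods, rather than summing over all of them, matches the commit's stated preference; how ties are broken here (first candidate wins) is an arbitrary choice of this sketch.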
Eric Banks 9522be8762 Merge pull request #1016 from broadinstitute/rhl_allele_rep_span_dels
Add spanning deletions allele
2015-06-13 22:12:23 -04:00
Ron Levine dbed660183 Add spanning deletions allele 2015-06-12 16:43:06 -04:00
Geraldine Van der Auwera 456fefa860 Merge pull request #1001 from broadinstitute/jw_clarify_overlaping_contigs
Changed error message for Contigs Out of Order
2015-06-12 15:03:10 -04:00
Joseph White 398dc7a123 Changed error message for Contigs Out of Order
Changed confusing error message for out of order contigs

Updated Exception message.
2015-06-11 21:46:06 -04:00
Geraldine Van der Auwera 2a7f95eddb Merge pull request #1009 from broadinstitute/gg_patch_depthofcoverage_#1002
User (mnw21cam) patch to fix DoC slowdown in 3.4
2015-06-10 11:16:08 -04:00
kcibul aad89cd653 Merge pull request #1005 from broadinstitute/kc_m2_pon
created panel of normals queue creation script and instructions
2015-06-09 11:10:17 -04:00
droazen 5e3f3d69db Merge pull request #1012 from broadinstitute/rhl_build_vec_pairhmm_lib
Built VectorLoglessPairHMM lib with icc with gcc 4.4.7
2015-06-08 15:25:57 -04:00
Geraldine Van der Auwera 95f2899f05 User (mnw21cam) patch to fix DoC slowdown in 3.4 2015-06-05 21:12:46 -04:00
Louis Bergelson 588d6f1180 Merge pull request #1013 from lbergelson/patch-1
fix typo in queue arguments
2015-06-05 19:27:51 -04:00
Louis Bergelson ebdda72c88 fix typo in queue arguments 2015-06-05 17:06:23 -04:00
Ron Levine 40d8fb99a3 Built VectorLoglessPairHMM lib with icc with gcc 4.4.7 2015-06-05 15:38:25 -04:00
droazen 847c832ef9 Merge pull request #999 from broadinstitute/rhl_load_vector_pair_hmm
Fix loading of VectorLoglessPairHMM by rolling back to Intel's lib version
2015-06-04 12:54:59 -04:00
Kristian Cibulskis 5ceb63cc35 created panel of normals queue creation script and instructions
increased runtime java memory, changed default PON for NN to be new ICE PON

updated FP rates, when using new default PON.  SNPs up by ~3%, INDELs down by 40%

updated git hash reference
2015-06-02 16:23:10 -04:00
Geraldine Van der Auwera 526f7c0d07 Merge pull request #985 from broadinstitute/sa_refactor_cleansing_hack_negative_zeros_973_depends_on_841
removed in-line conditional (hack) that changed the result from 0.0 to -0.0; see issue #841
2015-05-23 00:02:52 -04:00
Eric Banks 27d3bafcbd Merge pull request #997 from broadinstitute/eb_add_foreign_read_filter
Added a new filter that can be used to remove reads that are too smal…
2015-05-22 14:34:28 -04:00
Eric Banks 8c81e7df95 Added a new filter that can be used to remove reads that are too small and overly clipped. 2015-05-22 14:33:35 -04:00
Ron Levine 3b0cb028e6 Fix loading of VectorLoglessPairHMM by rolling back to Intel's lib version 2015-05-22 14:16:00 -04:00
Geraldine Van der Auwera 7f306bc4b6 Merge pull request #980 from broadinstitute/Sheila_QD_Update
Sheila qd update
2015-05-22 12:04:43 -04:00
Sheila Chandran dac0b8ddfc Added QD calculation 2015-05-22 11:59:10 -04:00
Geraldine Van der Auwera e96e52ee9d Merge pull request #986 from broadinstitute/rhl_select_genotype_filter_status
Site-level selection based on genotype filter status
2015-05-22 09:59:00 -04:00
Ron Levine a6ca97ef14 Site-level selection based on genotype filter status 2015-05-21 11:27:20 -04:00
melonistic 8d25b2ba40 removed in-line conditional (hack) that changed the result from 0.0 to -0.0; see issue #841
removed irrelevant -0 comments as specified in issue #841 but committed in #973
2015-05-16 23:12:09 -04:00
kcibul 28a7ea43ec Merge pull request #982 from broadinstitute/kc_fp_analysis
added "artifact detection mode" for PON creation
2015-05-15 07:45:20 -04:00
Kristian Cibulskis 3b1ee17727 added "artifact detection mode" for PON creation
added "str_contraction" artifact filter (improves specificity, especially in exomes)
refactored out VCF constants and added descriptions

added new dream evaluation markdown

added results for SMC 4

fixed up documentation, moved location to /dsde/working/mutect/dream_smc, and checked in scala script

fixed bug which would overwrite germline_risk filter errors
updated "how to" documents and records

fixed license text

thinned down FP regression test from 700 sites to 100.  we have better ways (DREAM, NN) to check accuracy of the method and 100 is good enough to catch regressions

why oh why do the MD5-based unit tests produce different results on different machine architectures?  I hate that :/

Thanks to GG, LDG and DR -- test should now produce the same results regardless of machine architecture

disabled downsampling... hopefully in the final attempt to make this work cross architecture!

enforced LOGLESS_CACHING... hopefully in the final final attempt to make this work cross architecture!
2015-05-15 07:14:33 -04:00
Geraldine Van der Auwera d1a7edd796 Update pom versions to mark the start of GATK 3.5 development 2015-05-15 00:44:54 -04:00
Geraldine Van der Auwera f19618653a Update pom versions for the 3.4 release 2015-05-15 00:40:39 -04:00
Geraldine Van der Auwera 6c195ffcb1 Merge remote-tracking branch 'unstable/master' 2015-05-15 00:38:39 -04:00
Geraldine Van der Auwera 8b20523f5e Merge pull request #979 from broadinstitute/ami-fixASE-bug
solve bug - now also works when the reads do not have a mate
2015-05-14 21:09:52 -04:00
Geraldine Van der Auwera dee66b06fa Merge pull request #978 from broadinstitute/dr_rev_htsjdk_and_picard
Rev htsjdk to version 1.132 and picard to version 1.131, and switch to using the versions in maven central
2015-05-14 21:08:19 -04:00
David Roazen caafe84e74 Rev htsjdk to version 1.132 and picard to version 1.131, and switch to using the versions in maven central
-We now pull htsjdk and picard from maven central.

-Updated the GATK codebase as necessary to adapt to changes in the Feature
 interface.

-Since VCFHeader now requires that all header lines have unique keys, uniquified
 the keys of GVCFBlock header lines by including the min/max GQ in the key.
 Updated MD5s accordingly.

-Other MD5s changed as a result of an htsjdk fix to eliminate "-0" in VCF output.
2015-05-14 15:26:23 -04:00
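One way to picture the key uniquification mentioned in caafe84e74 is sketched below. The key and value formats are assumptions made for illustration (the log only says the min/max GQ is included in the key), and `gvcf_block_header_lines` is a hypothetical helper, not an htsjdk or GATK function.

```python
from typing import Dict, List, Tuple


def gvcf_block_header_lines(gq_bands: List[Tuple[int, int]]) -> Dict[str, str]:
    """Build one header line per GQ band, keyed uniquely by the band's bounds."""
    lines: Dict[str, str] = {}
    for min_gq, max_gq in gq_bands:
        key = f"GVCFBlock{min_gq}-{max_gq}"  # hypothetical key format
        if key in lines:
            # A shared key is exactly what a unique-key header would reject.
            raise ValueError(f"duplicate header key: {key}")
        lines[key] = f"minGQ={min_gq}(inclusive),maxGQ={max_gq}(exclusive)"
    return lines


header = gvcf_block_header_lines([(0, 5), (5, 20), (20, 60)])
for key, value in header.items():
    print(f"##{key}={value}")
```

If every band used the bare key `GVCFBlock`, all but one line would collide; folding the bounds into the key keeps each band's line distinct.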
Geraldine Van der Auwera f6b3d8e862 Merge pull request #947 from broadinstitute/rhl_invert_selection
Added --invert_selection flag for variant selection queries
2015-05-13 13:40:32 -04:00
Eric Banks 5652cc6b5a Merge pull request #981 from broadinstitute/eb_no_lone_dels
Fixed a small feature/bug that I introduced with the spanning deletions genotyping
2015-05-13 13:27:16 -04:00
Eric Banks c752b9bca6 Fixed a small feature/bug that I introduced with the spanning deletions genotyping.
In the case where there's a low quality SNP under a spanning deletion in the gvcfs:
if the SNP is not genotyped by GenotypeGVCFs (because it's just noise) we were still
emitting a record with just the symbolic DEL allele (because that allele is high quality).

We no longer do that.
2015-05-13 11:19:40 -04:00
Ami Levy-Moonshine 536d550794 solve bug - now also works when the reads do not have a mate
reads with no mate will be counted as valid reads
2015-05-12 17:51:01 -04:00
Ron Levine 4a75d54e65 Added invert and exclude flags for variant selection queries 2015-05-12 15:08:28 -04:00
Geraldine Van der Auwera 7a75f4ae79 Merge pull request #974 from broadinstitute/jw_Var2BinPEDSwap
Correct errant array element swap in FAM file output.
2015-05-12 08:49:16 -04:00
Eric Banks 53a34cea4a Merge pull request #938 from broadinstitute/eb_fix_spanning_deletions_in_genotyping
Added a fix for genotyping positions over spanning deletions.
2015-05-11 23:11:47 -04:00
Joseph White abb6bc6f57 Correct errant array element swap in FAM file output.
dad and mom are swapped; paternal first, then maternal

updated MD5 chksums for test files

remove commented lines
2015-05-11 20:45:50 -04:00
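For reference, the column order that abb6bc6f57 restores (paternal ID before maternal ID) can be sketched as below. The six-column layout follows the PLINK .fam convention; the helper name `fam_line`, the sample IDs, and the single-space separator are assumptions of this sketch rather than the tool's actual output code.

```python
def fam_line(family_id: str, individual_id: str, paternal_id: str,
             maternal_id: str, sex: int, phenotype: int) -> str:
    """Format one FAM record: family, individual, father, mother, sex, phenotype."""
    fields = [family_id, individual_id, paternal_id, maternal_id,
              str(sex), str(phenotype)]
    return " ".join(fields)


# Hypothetical trio child: the father's ID must come before the mother's.
line = fam_line("FAM1", "child1", "dad1", "mom1", sex=1, phenotype=-9)
print(line)
```

Swapping the third and fourth columns still yields a syntactically valid file, which is presumably why a swap like this is easy to miss without a test on the output.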
Eric Banks 530e0e5ea6 Added a fix for combining/genotyping positions over spanning deletions.
Previously, if a SNP occurred in sample A at a position that was in the middle of a deletion for sample B,
sample B would be genotyped as homozygous reference there (but it's NOT reference - there's a deletion).
Now, sample B is genotyped as having a symbolic DEL allele.

Minor cleanup added.  Note that I also removed Laura's previous fix for this problem.

Existing integration tests change because I've added a new header line to the VCF being output.
I also added several tests for the new functionality showing:
1. genotyping from separate and already combined gvcfs give the same output
2. genotyping over multiple spanning deletions works
3. combining works too

Existing unit tests also cover this case.
2015-05-11 15:11:16 -04:00
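The behaviour change in 530e0e5ea6 can be illustrated with a small sketch. It assumes VCF-style deletions (1-based POS on the anchor base, REF longer than ALT) and assumes the symbolic allele is spelled `<*:DEL>`; the log itself only says "symbolic DEL allele". `Deletion` and `allele_at` are hypothetical names.

```python
from typing import List, NamedTuple

SPANNING_DEL = "<*:DEL>"  # assumed spelling of the symbolic DEL allele


class Deletion(NamedTuple):
    pos: int  # 1-based VCF POS (anchor base before the deleted bases)
    ref: str
    alt: str

    def spans(self, position: int) -> bool:
        """True if `position` falls on one of the deleted bases."""
        first_deleted = self.pos + len(self.alt)
        last_deleted = self.pos + len(self.ref) - 1
        return first_deleted <= position <= last_deleted


def allele_at(position: int, ref_base: str, deletions: List[Deletion]) -> str:
    """Allele a sample carries at `position` when it has no SNP of its own there."""
    if any(d.spans(position) for d in deletions):
        return SPANNING_DEL  # not reference: the base is deleted in this sample
    return ref_base


# Sample B deletes bases 101-103; sample A has a SNP at position 102.
sample_b = [Deletion(pos=100, ref="ACGT", alt="A")]
print(allele_at(102, "G", sample_b))
print(allele_at(104, "T", sample_b))
```

The old behaviour described in the commit corresponds to always returning `ref_base`, which reports sample B as reference at a position where it actually has no base.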
Geraldine Van der Auwera f77cee5171 Merge pull request #948 from broadinstitute/jw_deprecate_merge_var
Jw deprecate mergeVariantsViaLD argument
2015-05-11 09:30:23 -04:00
Joseph White 5be8bc5dfc Deprecate --mergeVariantsViaLD in HC
New unit test for deprecated mergeVariantsViaLD
Update HaplotypeCallerIntegrationTest.java
Delete duplicate testHaplotypeCallerMergeVariantsViaLDException test.
2015-05-08 17:50:25 -04:00
Geraldine Van der Auwera 8a4a4f3fcf Merge pull request #968 from broadinstitute/gg_unexclude_annotations_#966
Un-exclude SD and TRA from HC annotators
2015-05-03 21:25:55 +02:00
Geraldine Van der Auwera 5d8b9a7c20 Moved MQ0 out of HC exclusion and into StandardUGAnnotation 2015-05-03 01:04:49 +02:00
Geraldine Van der Auwera 071d82d1bf Un-exclude SD and TRA from HC annotators; resolves #966
Exclude MQ0BySample
Move SD and TRA to new StandardUGAnnotation interface
There is now an annotation interface (StandardUGAnnotation) holding annotations that are standard in UG but shouldn't be used as they are now with HC. This allows us to not have to exclude these annotations explicitly in HC, but still be able to use them for development purposes.
2015-05-03 00:45:53 +02:00
Geraldine Van der Auwera e49f6dfd0f Merge pull request #970 from broadinstitute/gg_minor_docfixes
Fairly minor if plentiful fixes to various gatkdocs. Merging this without formal review since all tests pass, the gatkdocs build, and no one really wants to review corrections to grammar, typos and layout for 120+ documents. Review will be done by users in production ;-)
2015-05-03 00:36:12 +02:00
Geraldine Van der Auwera 919c3eaa2e Numerous doc fixes; mostly formatting and clarifications 2015-05-03 00:28:46 +02:00
Geraldine Van der Auwera fddc5331e1 Merge pull request #965 from broadinstitute/gg_nsubtil_clamp_hmm_fix
Clamp the HMM window starting coordinate to 1 instead of 0
2015-05-01 22:18:20 +02:00
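The clamp named in the title of pull request #965 amounts to something like the sketch below. It assumes 1-based inclusive genomic coordinates, so 0 is not a valid start; `clamp_window_start`, `hmm_window`, and the padding scheme are hypothetical and do not reflect the actual HMM code.

```python
from typing import Tuple


def clamp_window_start(requested_start: int) -> int:
    """Never let a window start before position 1 (coordinates assumed 1-based)."""
    return max(1, requested_start)


def hmm_window(center: int, padding: int, contig_length: int) -> Tuple[int, int]:
    """Return an inclusive (start, stop) window around `center`, kept on the contig."""
    start = clamp_window_start(center - padding)
    stop = min(contig_length, center + padding)
    return start, stop


print(hmm_window(center=3, padding=10, contig_length=1000))
print(hmm_window(center=500, padding=10, contig_length=1000))
```

Clamping to 0 instead would produce a window whose first coordinate does not exist in a 1-based system, which only shows up for sites near the start of a contig.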
ldgauthier 4b55118822 Merge pull request #967 from broadinstitute/rhl_va_allele_trim
More allele trimming for VariantAnnotator
2015-04-30 09:48:32 -04:00
Ron Levine 9ff827c83a More allele trimming for VariantAnnotator 2015-04-29 21:11:49 -04:00