gatk-3.8

Author	SHA1	Message	Date
John Wallace	8fc631b7ae	Fix for mis-sorted VCF files in CatVariants. When using CatVariants, VCF files were being sorted solely on the base pair position of the first record, ignoring the chromosome. This can become problematic when merging files from different chromosomes, especially if you have multiple VCFs per chromosome. As an example, assume the following 3 records (chromosome, position) are each in a separate file: (1, 10), (1, 100), (2, 20). The merged VCF from CatVariants (without -assumeSorted) would read: (1, 10), (2, 20), (1, 100). This has the potential to break tools that expect chromosomes to be contiguous within a VCF file. This commit changes the comparator from one of Pair<Integer, File> to one of Pair<VariantContext, File>. We construct a VariantContextComparator from the provided reference, which will sort the first record by chromosome and position properly. Additionally, if -assumeSorted is given, we simply use a null VariantContext as the first record, so all first records compare as equal (since all are null).	2015-07-14 14:12:31 -04:00
ldgauthier	45a1d82305	Merge pull request #1041 from broadinstitute/ldg_ContEst Ported latest (not-yet-public) ContEst into GATK-private	2015-07-10 19:42:03 -04:00
Laura Gauthier	1159cb3aa9	Ported latest (not-yet-public) ContEst into GATK; verified results against Firehose version. Change file paths to put ContEst stuff in cancer directory	2015-07-09 17:06:06 -04:00
Geraldine Van der Auwera	8ea4dcab8d	Merge remote-tracking branch 'unstable/master'	2015-07-09 15:17:03 -04:00
kcibul	00526d4624	Merge pull request #1022 from broadinstitute/kc_m2_pon update results of NA12878 using official ICE PON (same git hash for t…	2015-07-07 16:37:46 -04:00
kcibul	e6dff9cc4e	Merge pull request #1037 from broadinstitute/ldg_M2_contaminationAnalysis Document contamination downsampling analysis	2015-07-07 16:36:52 -04:00
Geraldine Van der Auwera	c109a953f8	Merge pull request #1029 from broadinstitute/rhl_vqslod_definition Make VQSLOD definition accurate	2015-07-06 19:52:15 -04:00
Laura Gauthier	b6da9366a6	Document contamination downsampling analysis. Add Yossi's Queue script to create synthetic contamination data	2015-07-06 12:42:13 -04:00
kcibul	aaf4e33e15	Merge pull request #1028 from broadinstitute/kc_oxog_fixes Fix Foxog NaN output and add read stats for indels	2015-07-04 10:24:21 -04:00
Kristian Cibulskis	fa04024303	fixes NaN output in Foxog (GitHub issue 1025) and also emits read direction stats for indels (issue 1024). Fixed docs.	2015-07-03 08:44:17 -04:00
Eric Banks	d8e5d663fd	Merge pull request #1030 from broadinstitute/rhl_incorrect_rbp Merge if both GT are phased	2015-06-30 21:13:45 -04:00
Ron Levine	1a7e83fa50	Merge if both GT are phased	2015-06-30 13:03:16 -04:00
Eric Banks	5ea2aff379	Merge pull request #1033 from broadinstitute/eb_fix_spanning_dels_with_new_allele Update the allele remapping code to handle the new spanning deletion allele.	2015-06-29 22:57:14 -04:00
Eric Banks	f994220617	Update the allele remapping code to handle the new spanning deletion allele. Now that Ron updated the GATK so that we use star to represent spanning deletions, we need to catch those cases in the code that remaps alleles. Otherwise, we try to pad the stars and that's just bad. Added test from actual failing data.	2015-06-29 17:58:22 -04:00
Ron Levine	09686f4595	Make VQSLOD definition accurate	2015-06-25 16:47:50 -04:00
Geraldine Van der Auwera	719bb15340	Merge pull request #1019 from broadinstitute/rhl_var_index_param_gz Indexing parameters not required if output file has the g.vcf.gz exte…	2015-06-17 14:30:20 -04:00
Eric Banks	a4987310ae	Merge pull request #1014 from broadinstitute/gg_fix_combinevariants_del_allele_1000 Added else clause to handle symbolic alleles	2015-06-17 12:52:18 -04:00
Geraldine Van der Auwera	697c4b0cf1	Added else clause to handle symbolic alleles. Add test for createAlleleMapping	2015-06-17 10:52:56 -04:00
Eric Banks	29ebfc32c3	Merge pull request #1020 from broadinstitute/eb_handle_multiple_spanning_dels Handle cases where a given sample has multiple spanning deletions.	2015-06-16 14:20:46 -04:00
Eric Banks	fe0b5e0fbe	Handle cases where a given sample has multiple spanning deletions. When a sample has multiple spanning deletions and we are asked to assign likelihoods to the spanning deletion allele, we currently choose the first deletion. Valentin pointed out that this isn't desired behavior. I promised Valentin that I would address this issue, so here it is. I do not believe that the correct thing to do is to sum the likelihoods over all spanning deletions (I came up with problematic cases where this breaks down). So instead I'm using a simple heuristic approach: using the hom alt PLs, find the most likely spanning deletion for this position and use its likelihoods. In the 10K-sample VCF from Monkol there were only 2 cases in which this problem popped up. In both cases the heuristic approach works well.	2015-06-16 12:20:43 -04:00
Kristian Cibulskis	7018fd7203	update results of NA12878 using official ICE PON (same git hash for the caller)	2015-06-16 10:09:36 -04:00
kcibul	578d429348	Merge pull request #1017 from broadinstitute/ldg_contaminationDS Enable contamination correction via downsampling (as for HaplotypeCal…	2015-06-15 14:37:10 -04:00
Laura Gauthier	ce5ecf1383	Enable contamination correction via downsampling (as for HaplotypeCaller), added test. Add oxoG read count annotation and add as default annotation. Add ##SAMPLE VCF header line in accordance with TCGA VCF spec, specifying "File" line in sample header with BAM file name and "SampleName" with BAM sample name (don't print sample file path if --no_cmdline_in_header is specified, to help with test consistency). Turn on active region assembly-based physical phasing for M2. Clean up M2-related annotations so UG doesn't crash if M2 annotations are called.	2015-06-15 07:59:15 -04:00
Eric Banks	9522be8762	Merge pull request #1016 from broadinstitute/rhl_allele_rep_span_dels Add spanning deletions allele	2015-06-13 22:12:23 -04:00
Ron Levine	b35085ca28	Indexing parameters not required if output file has the g.vcf.gz extension	2015-06-13 11:46:56 -04:00
Ron Levine	dbed660183	Add spanning deletions allele	2015-06-12 16:43:06 -04:00
Geraldine Van der Auwera	456fefa860	Merge pull request #1001 from broadinstitute/jw_clarify_overlaping_contigs Changed error message for Contigs Out of Order	2015-06-12 15:03:10 -04:00
Joseph White	398dc7a123	Changed error message for Contigs Out of Order. Changed confusing error message for out-of-order contigs. Updated exception message.	2015-06-11 21:46:06 -04:00
Geraldine Van der Auwera	2a7f95eddb	Merge pull request #1009 from broadinstitute/gg_patch_depthofcoverage_#1002 User (mnw21cam) patch to fix DoC slowdown in 3.4	2015-06-10 11:16:08 -04:00
kcibul	aad89cd653	Merge pull request #1005 from broadinstitute/kc_m2_pon created panel of normals queue creation script and instructions	2015-06-09 11:10:17 -04:00
droazen	5e3f3d69db	Merge pull request #1012 from broadinstitute/rhl_build_vec_pairhmm_lib Built VectorLoglessPairHMM lib with icc with gcc 4.4.7	2015-06-08 15:25:57 -04:00
Geraldine Van der Auwera	95f2899f05	User (mnw21cam) patch to fix DoC slowdown in 3.4	2015-06-05 21:12:46 -04:00
Louis Bergelson	588d6f1180	Merge pull request #1013 from lbergelson/patch-1 fix typo in queue arguments	2015-06-05 19:27:51 -04:00
Louis Bergelson	ebdda72c88	fix typo in queue arguments	2015-06-05 17:06:23 -04:00
Ron Levine	40d8fb99a3	Built VectorLoglessPairHMM lib with icc with gcc 4.4.7	2015-06-05 15:38:25 -04:00
droazen	847c832ef9	Merge pull request #999 from broadinstitute/rhl_load_vector_pair_hmm Fix loading of VectorLoglessPairHMM by rolling back to Intel's lib version	2015-06-04 12:54:59 -04:00
Kristian Cibulskis	5ceb63cc35	created panel of normals queue creation script and instructions. Increased runtime Java memory, changed default PON for NN to be new ICE PON. Updated FP rates when using new default PON: SNPs up by ~3%, INDELs down by 40%. Updated git hash reference. Updated git hash reference.	2015-06-02 16:23:10 -04:00
Geraldine Van der Auwera	526f7c0d07	Merge pull request #985 from broadinstitute/sa_refactor_cleansing_hack_negative_zeros_973_depends_on_841 removed in-line conditional (hack) that changed the result from 0.0 to -0.0; see issue #841	2015-05-23 00:02:52 -04:00
Eric Banks	27d3bafcbd	Merge pull request #997 from broadinstitute/eb_add_foreign_read_filter Added a new filter that can be used to remove reads that are too smal…	2015-05-22 14:34:28 -04:00
Eric Banks	8c81e7df95	Added a new filter that can be used to remove reads that are too small and overly clipped.	2015-05-22 14:33:35 -04:00
Ron Levine	3b0cb028e6	Fix loading of VectorLoglessPairHMM by rolling back to Intel's lib version	2015-05-22 14:16:00 -04:00
Geraldine Van der Auwera	7f306bc4b6	Merge pull request #980 from broadinstitute/Sheila_QD_Update Sheila qd update	2015-05-22 12:04:43 -04:00
Sheila Chandran	dac0b8ddfc	Added QD calculation	2015-05-22 11:59:10 -04:00
Geraldine Van der Auwera	e96e52ee9d	Merge pull request #986 from broadinstitute/rhl_select_genotype_filter_status Site-level selection based on genotype filter status	2015-05-22 09:59:00 -04:00
Ron Levine	a6ca97ef14	Site-level selection based on genotype filter status	2015-05-21 11:27:20 -04:00
melonistic	8d25b2ba40	removed in-line conditional (hack) that changed the result from 0.0 to -0.0; see issue #841. Removed irrelevant -0 comments as specified in issue #841 but committed in #973.	2015-05-16 23:12:09 -04:00
kcibul	28a7ea43ec	Merge pull request #982 from broadinstitute/kc_fp_analysis added "artifact detection mode" for PON creation	2015-05-15 07:45:20 -04:00
Kristian Cibulskis	3b1ee17727	added "artifact detection mode" for PON creation. Added "str_contraction" artifact filter (improves specificity, especially in exomes). Refactored out VCF constants and added descriptions. Added "artifact detection mode" for PON creation. Added "str_contraction" artifact filter (improves specificity, especially in exomes). Added new DREAM evaluation markdown. Added results for SMC 4. Fixed up documentation, moved location to /dsde/working/mutect/dream_smc, and checked in Scala script. Added "artifact detection mode" for PON creation. Added "str_contraction" artifact filter (improves specificity, especially in exomes). Fixed bug which would overwrite germline_risk filter errors. Updated "how to" documents and records. Fixed license text. Thinned down FP regression test from 700 sites to 100: we have better ways (DREAM, NN) to check accuracy of the method, and 100 is good enough to catch regressions. Why oh why do the MD5-based unit tests produce different results on different machine architectures? I hate that :/ Thanks to GG, LDG and DR: test should now produce the same results regardless of machine architecture. Disabled downsampling... hopefully in the final attempt to make this work cross-architecture! Enforced LOGLESS_CACHING... hopefully in the final final attempt to make this work cross-architecture! Refactored out VCF constants and added descriptions.	2015-05-15 07:14:33 -04:00
Geraldine Van der Auwera	d1a7edd796	Update pom versions to mark the start of GATK 3.5 development	2015-05-15 00:44:54 -04:00
Geraldine Van der Auwera	f19618653a	Update pom versions for the 3.4 release	2015-05-15 00:40:39 -04:00
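
The ordering change in commit 8fc631b7ae can be illustrated with a minimal Java sketch. It deliberately avoids GATK and htsjdk classes: `FirstRecord`, `CatVariantsOrderSketch`, and the `contigIndex` map (an assumed stand-in for the contig order of the reference's sequence dictionary) are hypothetical names introduced only for illustration. The sketch contrasts ordering input files by the position of their first record alone with ordering by contig and then position.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the first record of one input VCF: only the
// contig name and position are needed to order the files.
final class FirstRecord {
    final String contig;
    final int position;
    final String fileName;

    FirstRecord(String contig, int position, String fileName) {
        this.contig = contig;
        this.position = position;
        this.fileName = fileName;
    }
}

final class CatVariantsOrderSketch {
    // Ordering described as the old behaviour: position of the first record
    // only, so records from different contigs can interleave.
    static Comparator<FirstRecord> byPositionOnly() {
        return Comparator.comparingInt(r -> r.position);
    }

    // Ordering described as the new behaviour: contig first, using the order
    // given by contigIndex (an assumed stand-in for the reference's sequence
    // dictionary), then position within the contig.
    static Comparator<FirstRecord> byContigThenPosition(Map<String, Integer> contigIndex) {
        Comparator<FirstRecord> byContig = Comparator.comparingInt(r -> contigIndex.get(r.contig));
        return byContig.thenComparingInt(r -> r.position);
    }

    // Returns the file names in the order they would be concatenated.
    static List<String> order(List<FirstRecord> records, Comparator<FirstRecord> cmp) {
        List<FirstRecord> sorted = new ArrayList<>(records);
        sorted.sort(cmp); // List.sort is stable: equal elements keep input order
        List<String> names = new ArrayList<>();
        for (FirstRecord r : sorted) {
            names.add(r.fileName);
        }
        return names;
    }
}
```

With the three example records from the commit message placed in files a.vcf (1, 10), b.vcf (1, 100), and c.vcf (2, 20), position-only ordering yields a.vcf, c.vcf, b.vcf, while contig-then-position ordering yields a.vcf, b.vcf, c.vcf. Because `List.sort` is stable, a comparator that treats every element as equal (as the commit describes for the null first records under -assumeSorted) leaves the files in their input order.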
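
Commit fe0b5e0fbe describes a heuristic for a sample with several spanning deletions: use the hom-alt PLs to find the most likely deletion and use its likelihoods. The following is a minimal Java sketch of only the selection step, assuming each deletion is summarized by a single hom-alt PL; `SpanningDeletion` and `SpanningDeletionChooser` are hypothetical names, not GATK classes, and the sketch does not model how the chosen deletion's likelihoods are then assigned.

```java
import java.util.List;

// Hypothetical stand-in for one spanning deletion overlapping the current
// position for a single sample; only its hom-alt PL is used for the choice.
final class SpanningDeletion {
    final String label;  // illustrative identifier, e.g. "delA"
    final int homAltPL;  // phred-scaled likelihood of the hom-alt genotype (0 = most likely)

    SpanningDeletion(String label, int homAltPL) {
        this.label = label;
        this.homAltPL = homAltPL;
    }
}

final class SpanningDeletionChooser {
    // Behaviour the commit describes as the old one: take the first deletion.
    static SpanningDeletion first(List<SpanningDeletion> dels) {
        requireNonEmpty(dels);
        return dels.get(0);
    }

    // Heuristic described in the commit: pick the most likely deletion by its
    // hom-alt PL. Smaller PL means more likely; ties keep the earlier deletion.
    static SpanningDeletion mostLikely(List<SpanningDeletion> dels) {
        requireNonEmpty(dels);
        SpanningDeletion best = dels.get(0);
        for (SpanningDeletion d : dels) {
            if (d.homAltPL < best.homAltPL) {
                best = d;
            }
        }
        return best;
    }

    private static void requireNonEmpty(List<SpanningDeletion> dels) {
        if (dels.isEmpty()) {
            throw new IllegalArgumentException("no spanning deletions for this sample");
        }
    }
}
```

PL values are phred-scaled and normalized so that the most likely genotype has PL 0, which is why "most likely" maps to the smallest hom-alt PL here.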
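
Commit b35085ca28 states that indexing parameters are no longer required when the output file has the g.vcf.gz extension. GATK's actual check is not reproduced here; this minimal Java sketch only illustrates a file-name test that treats both `.g.vcf` and `.g.vcf.gz` as GVCF output. `GvcfOutputSketch` and `isGvcfOutput` are hypothetical names.

```java
// Hypothetical helper: decides from the output file name alone whether the
// output should be treated as a GVCF, accepting both the plain and the
// compressed extension.
final class GvcfOutputSketch {
    static final String GVCF_EXTENSION = ".g.vcf";
    static final String COMPRESSED_GVCF_EXTENSION = ".g.vcf.gz";

    static boolean isGvcfOutput(String path) {
        return path.endsWith(GVCF_EXTENSION) || path.endsWith(COMPRESSED_GVCF_EXTENSION);
    }
}
```

This suffix test is case-sensitive; the commit message does not say whether GATK's own check is.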

13964 Commits (8fc631b7aec16b9b30bcbfd5bf07f6c4c7e5dcf4)
