Commit Graph

1332 Commits (8c18ead5e42aaebb9dd6bd9cd2a97be8c3de5567)

Author SHA1 Message Date
Laura Gauthier 8c18ead5e4 Clarify VCF version for supporting population alleles files
Clarify DeNovoPrior definition on PbyT
2015-07-20 13:42:57 -04:00
vruano 7f74303f2b Removes a very inefficient way to iterate in ReferenceConfidenceModel.isReadInformativeAboutIndelsOfSize(...)
Addresses performance issue #1048.
2015-07-16 12:04:12 -04:00
Geraldine Van der Auwera c109a953f8 Merge pull request #1029 from broadinstitute/rhl_vqslod_definition
Make VQSLOD definition accurate
2015-07-06 19:52:15 -04:00
Ron Levine 1a7e83fa50 Merge if both GT are phased 2015-06-30 13:03:16 -04:00
Eric Banks f994220617 Update the allele remapping code to handle the new spanning deletion allele.
Now that Ron updated the GATK so that we use star to represent spanning
deletions, we need to catch those cases in the code that remaps alleles.
Otherwise, we try to pad the stars and that's just bad.

Added test from actual failing data.
2015-06-29 17:58:22 -04:00
Ron Levine 09686f4595 Make VQSLOD definition accurate 2015-06-25 16:47:50 -04:00
Geraldine Van der Auwera 719bb15340 Merge pull request #1019 from broadinstitute/rhl_var_index_param_gz
Indexing parameters not required if output file has the g.vcf.gz exte…
2015-06-17 14:30:20 -04:00
Geraldine Van der Auwera 697c4b0cf1 Added else clause to handle symbolic alleles
Add test for createAlleleMapping
2015-06-17 10:52:56 -04:00
Eric Banks 29ebfc32c3 Merge pull request #1020 from broadinstitute/eb_handle_multiple_spanning_dels
Handle cases where a given sample has multiple spanning deletions.
2015-06-16 14:20:46 -04:00
Eric Banks fe0b5e0fbe Handle cases where a given sample has multiple spanning deletions.
When a sample has multiple spanning deletions and we are asked to assign
likelihoods to the spanning deletion allele, we currently choose the first
deletion.  Valentin pointed out that this isn't desired behavior.  I
promised Valentin that I would address this issue, so here it is.

I do not believe that the correct thing to do is to sum the likelihoods
over all spanning deletions (I came up with problematic cases where this
breaks down).

So instead I'm using a simple heuristic approach: using the hom alt PLs, find
the most likely spanning deletion for this position and use its likelihoods.

In the 10K-sample VCF from Monkol there were only 2 cases where this problem
popped up.  In both cases the heuristic approach works well.
2015-06-16 12:20:43 -04:00
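
The heuristic described in fe0b5e0fbe (pick the spanning deletion whose hom alt PL marks it as most likely, then reuse its likelihoods) might look roughly like the sketch below. The `SpanningDeletion` container, the function name, and the three-value PL layout are illustrative assumptions, not the GATK implementation.

```python
from typing import List, NamedTuple


class SpanningDeletion(NamedTuple):
    """Hypothetical container for one deletion overlapping the current position."""
    name: str
    pls: List[int]  # phred-scaled likelihoods ordered [hom-ref, het, hom-alt]


def most_likely_spanning_deletion(deletions: List[SpanningDeletion]) -> SpanningDeletion:
    """Return the deletion with the lowest (most likely) hom-alt PL."""
    if not deletions:
        raise ValueError("no spanning deletions to choose from")
    # PLs are phred-scaled, so a smaller value means a more likely genotype.
    # On ties, min() keeps the first candidate encountered.
    return min(deletions, key=lambda d: d.pls[2])


candidates = [
    SpanningDeletion("del_A", [60, 20, 90]),
    SpanningDeletion("del_B", [80, 30, 0]),
]
best = most_likely_spanning_deletion(candidates)
print(best.name, best.pls)
```

The chosen deletion's PLs would then stand in for the spanning deletion allele at this position, rather than a sum over all candidates.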
Laura Gauthier ce5ecf1383 Enable contamination correction via downsampling (as for HaplotypeCaller), added test
Add oxoG read count annotation and add as default annotation
Add ##SAMPLE VCF header line in accordance with TCGA VCF spec, specifying "File" line in sample header with BAM file name and "SampleName" with BAM sample name (Don't print sample file path if --no_cmdline_in_header is specified to help with test consistency)
Turn on active region assembly-based physical phasing for M2
Clean up M2-related annotations so UG doesn't crash if M2 annotations are called
2015-06-15 07:59:15 -04:00
Ron Levine b35085ca28 Indexing parameters not required if output file has the g.vcf.gz extension 2015-06-13 11:46:56 -04:00
Ron Levine dbed660183 Add spanning deletions allele 2015-06-12 16:43:06 -04:00
Geraldine Van der Auwera 526f7c0d07 Merge pull request #985 from broadinstitute/sa_refactor_cleansing_hack_negative_zeros_973_depends_on_841
removed in-line conditional (hack) that changed the result from 0.0 to -0.0; see issue #841
2015-05-23 00:02:52 -04:00
Sheila Chandran dac0b8ddfc Added QD calculation 2015-05-22 11:59:10 -04:00
Ron Levine a6ca97ef14 Site-level selection based on genotype filter status 2015-05-21 11:27:20 -04:00
melonistic 8d25b2ba40 removed in-line conditional (hack) that changed the result from 0.0 to -0.0; see issue #841
removed irrelevant -0 comments as specified in issue #841 but committed in #973
2015-05-16 23:12:09 -04:00
Geraldine Van der Auwera d1a7edd796 Update pom versions to mark the start of GATK 3.5 development 2015-05-15 00:44:54 -04:00
Geraldine Van der Auwera f19618653a Update pom versions for the 3.4 release 2015-05-15 00:40:39 -04:00
David Roazen caafe84e74 Rev htsjdk to version 1.132 and picard to version 1.131, and switch to using the versions in maven central
-We now pull htsjdk and picard from maven central.

-Updated the GATK codebase as necessary to adapt to changes in the Feature
 interface.

-Since VCFHeader now requires that all header lines have unique keys, uniquified
 the keys of GVCFBlock header lines by including the min/max GQ in the key.
 Updated MD5s accordingly.

-Other MD5s changed as a result of an htsjdk fix to eliminate "-0" in VCF output.
2015-05-14 15:26:23 -04:00
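
As a rough illustration of the GVCFBlock change in caafe84e74, header keys can be made unique by folding each band's min/max GQ into the key. The exact key and value strings below are assumptions for the sketch; the commit only states that the min/max GQ is included in the key.

```python
def gvcf_block_header_lines(gq_bounds):
    """Build one (key, value) header pair per GQ band [lo, hi).

    gq_bounds is an ascending list of band boundaries (hypothetical input).
    """
    lines = []
    for lo, hi in zip(gq_bounds, gq_bounds[1:]):
        # Embedding the bounds in the key keeps every header key distinct.
        key = f"GVCFBlock{lo}-{hi}"
        value = f"minGQ={lo}(inclusive),maxGQ={hi}(exclusive)"
        lines.append((key, value))
    return lines


header = gvcf_block_header_lines([0, 20, 60, 100])
for key, value in header:
    print(f"##{key}={value}")
```

With a shared key such as a bare `GVCFBlock`, a header that enforces unique keys could keep only one of these lines.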
Geraldine Van der Auwera f6b3d8e862 Merge pull request #947 from broadinstitute/rhl_invert_selection
Added --invert_selection flag for variant selection queries
2015-05-13 13:40:32 -04:00
Eric Banks c752b9bca6 Fixed a small feature/bug that I introduced with the spanning deletions genotyping.
In the case where there's a low quality SNP under a spanning deletion in the gvcfs:
if the SNP is not genotyped by GenotypeGVCFs (because it's just noise) we were still
emitting a record with just the symbolic DEL allele (because that allele is high quality).

We no longer do that.
2015-05-13 11:19:40 -04:00
Ron Levine 4a75d54e65 Added invert and exclude flags for variant selection queries 2015-05-12 15:08:28 -04:00
Geraldine Van der Auwera 7a75f4ae79 Merge pull request #974 from broadinstitute/jw_Var2BinPEDSwap
Correct errant array element swap in FAM file output.
2015-05-12 08:49:16 -04:00
Eric Banks 53a34cea4a Merge pull request #938 from broadinstitute/eb_fix_spanning_deletions_in_genotyping
Added a fix for genotyping positions over spanning deletions.
2015-05-11 23:11:47 -04:00
Joseph White abb6bc6f57 Correct errant array element swap in FAM file output.
dad and mom are swapped; paternal first, then maternal

updated MD5 chksums for test files

remove commented lines
2015-05-11 20:45:50 -04:00
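
For reference, the column order that abb6bc6f57 restores (paternal ID before maternal ID) matches the PLINK FAM layout: family ID, individual ID, father, mother, sex, phenotype. A minimal sketch, with a hypothetical helper name and made-up sample IDs:

```python
def fam_line(family_id, individual_id, father_id, mother_id, sex, phenotype):
    """Format one FAM record; the father column comes before the mother column."""
    fields = [family_id, individual_id, father_id, mother_id, str(sex), str(phenotype)]
    return " ".join(fields)


line = fam_line("FAM1", "child", "dad", "mom", 1, -9)
print(line)
```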
Eric Banks 530e0e5ea6 Added a fix for combining/genotyping positions over spanning deletions.
Previously, if a SNP occurred in sample A at a position that was in the middle of a deletion for sample B,
sample B would be genotyped as homozygous reference there (but it's NOT reference - there's a deletion).
Now, sample B is genotyped as having a symbolic DEL allele.

Minor cleanup added.  Note that I also removed Laura's previous fix for this problem.

Existing integration tests change because I've added a new header line to the VCF being output.
I also added several tests for the new functionality showing:
1. genotyping from separate and already combined gvcfs give the same output
2. genotyping over multiple spanning deletions works
3. combining works too

Existing unit tests also cover this case.
2015-05-11 15:11:16 -04:00
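
The situation fixed in 530e0e5ea6 can be sketched as an interval check: a sample with a deletion covering the SNP position should not be reported as homozygous reference there. The function names and string labels are hypothetical, and the sketch assumes VCF-style left-anchored deletions where ALT is a prefix of REF; the real code works on likelihoods rather than a simple lookup.

```python
def deleted_span(pos, ref, alt):
    """1-based inclusive range of reference bases removed by a simple deletion."""
    # Assumes alt is a prefix of ref, e.g. POS=100 REF=ACGT ALT=A removes 101..103.
    return pos + len(alt), pos + len(ref) - 1


def call_at_site(site, sample_deletions):
    """Label a site for a sample that has no SNP evidence of its own there."""
    for pos, ref, alt in sample_deletions:
        first, last = deleted_span(pos, ref, alt)
        if first <= site <= last:
            # The site lies inside the sample's deletion: not reference.
            return "spanning-deletion"
    return "hom-ref"


sample_b = [(100, "ACGT", "A")]
for site in (100, 102, 104):
    print(site, call_at_site(site, sample_b))
```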
Joseph White 5be8bc5dfc Deprecate --mergeVariantsViaLD in HC
New unit test for deprecated mergeVariantsViaLD
Update HaplotypeCallerIntegrationTest.java
Delete duplicate testHaplotypeCallerMergeVariantsViaLDException test.
2015-05-08 17:50:25 -04:00
Geraldine Van der Auwera 5d8b9a7c20 Moved MQ0 out of HC exclusion and into StandardUGAnnotation 2015-05-03 01:04:49 +02:00
Geraldine Van der Auwera 071d82d1bf Un-exclude SD and TRA from HC annotators; resolves #966
Exclude MQ0BySample
Move SD and TRA to new StandardUGAnnotation interface
There is now an annotation interface (StandardUGAnnotation) holding annotations that are standard in UG but shouldn't be used as they are now with HC. This allows us to not have to exclude these annotations explicitly in HC, but still be able to use them for development purposes.
2015-05-03 00:45:53 +02:00
Geraldine Van der Auwera e49f6dfd0f Merge pull request #970 from broadinstitute/gg_minor_docfixes
Fairly minor if plentiful fixes to various gatkdocs. Merging this without formal review since all tests pass, the gatkdocs build, and no one really wants to review corrections to grammar, typos and layout for 120+ documents. Review will be done by users in production ;-)
2015-05-03 00:36:12 +02:00
Geraldine Van der Auwera 919c3eaa2e Numerous doc fixes; mostly formatting and clarifications 2015-05-03 00:28:46 +02:00
Ron Levine 9ff827c83a More allele trimming for VariantAnnotator 2015-04-29 21:11:49 -04:00
Laura Gauthier 97caf94807 Fix implementation of allowNonUniqueKmersInRef so that it applies to all kmer sizes 2015-04-23 13:01:47 -04:00
Ron Levine d5f98e99f0 Bypass reads with a bad CIGAR length 2015-04-21 11:55:56 -04:00
Kristian Cibulskis 45610a142c initial refactoring of arguments into individual argument collections
fix blasted license blurbs

updates based on PR comments (abstractify HaplotypeCallerArgumentCollection into AssemblyBasedCallerArgumentCollection)

comments on comments from PR review
2015-04-07 16:55:32 -04:00
Geraldine Van der Auwera 2053afe52a Merge pull request #914 from broadinstitute/ldg_fixDitheringRandomness
Initialize annotations so that --disableDithering actually works
2015-04-06 15:40:30 -04:00
Yossi Farjoun d30a6258bc added the missing file to the error message 2015-04-06 08:21:55 -04:00
Laura Gauthier 9c842df3a3 Initialize annotations so that --disableDithering actually works 2015-04-02 17:34:46 -04:00
Geraldine Van der Auwera d7f7022dce Merge pull request #904 from broadinstitute/pd_orig_dp
Added keepOriginalDP argument to SelectVariants
2015-03-30 09:01:33 -04:00
Laura Gauthier 5a10758e2e Annotation changes for M2:
Build a ReferenceContext in ActiveRegionWalkers to pass in to annotation engine so we can call the TandemRepeatAnnotator from M2
Make TandemRepeatAnnotator default annotation for M2.
Setup (but don't use yet) HC-style contamination downsampling.
New HC integration test with TandemRepeatAnnotator
2015-03-27 18:25:23 -04:00
Ron Levine aef0a83c52 Automatically choose indexing strategy by file extension 2015-03-27 11:10:35 -04:00
Phillip Dexheimer c97c253ec8 Added keepOriginalDP argument to SelectVariants
Fixes #830
2015-03-25 22:45:31 -04:00
Phillip Dexheimer 9e63696315 Remove indel-length normalization of QD for GGVCFs
* Fixes #848
* length normalization is now only applied if the annotation is calculated in UG
2015-03-24 08:22:19 -04:00
Geraldine Van der Auwera 0a45b2d79d Merge pull request #883 from broadinstitute/rhl_hc_mq0
Exclude MappingQualityZero from default annotations
2015-03-23 12:59:08 -04:00
Ami Levy-Moonshine c5fc5c4f8c create 2 new tools:
- ASEReadCounter (public tool) replaces Tuuli's script to produce the input to Manny's tool.
   It counts the number of reads that support the ref allele and the alt allele, filtering out low-quality reads and bases and keeping only properly paired reads
- ASECaller (private tool) takes both RNA and DNA, and produces contingency tables ** still under development **

minor changes in other tools:
- update RNA HC variant calling scala script
- expose FS method pValueForContingencyTable to be able to call it from ASEcaller

In ASEReadCounter:
- allow different options for dealing with overlapping reads from the same fragment
- add option to ignore or include indels in the pileups
- add option to disable DuplicateRead

add ASEReadCounterIntegrationTest.java and files for the test
2015-03-21 16:56:00 -04:00
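
The counting that c5fc5c4f8c describes for ASEReadCounter (ref and alt support after dropping low-quality and improperly paired reads) might be sketched as follows. The pileup tuple layout, the function name, and the default thresholds are assumptions for illustration only, not the tool's actual arguments.

```python
from collections import Counter


def count_allele_support(pileup, ref, alt, min_base_qual=2, min_map_qual=10):
    """Count reads supporting ref and alt at one site.

    pileup: iterable of (base, base_qual, map_qual, is_proper_pair) tuples
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for base, base_qual, map_qual, is_proper_pair in pileup:
        if not is_proper_pair:
            continue  # keep only properly paired reads
        if map_qual < min_map_qual or base_qual < min_base_qual:
            continue  # drop low-quality reads and bases
        if base == ref:
            counts["ref"] += 1
        elif base == alt:
            counts["alt"] += 1
        else:
            counts["other"] += 1
    return counts


pileup = [
    ("A", 30, 60, True),   # supports ref
    ("G", 30, 60, True),   # supports alt
    ("G", 1, 60, True),    # filtered: low base quality
    ("A", 30, 5, True),    # filtered: low mapping quality
    ("G", 30, 60, False),  # filtered: not a proper pair
    ("T", 30, 60, True),   # neither ref nor alt
]
counts = count_allele_support(pileup, ref="A", alt="G")
print(counts["ref"], counts["alt"], counts["other"])
```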
Ron Levine 46668d469a Exclude MappingQualityZero from default annotations 2015-03-17 21:46:18 -04:00
Kristian Cibulskis ab1053e83c It compiles, and produces results!
fixed NPE when normal contains no reads

first integration test (micro) and unit tests, also rename of MuTectHC -> M2

adding in standard GATK license terms

incorporated HOSTILE mode to PCR Error Correction

removed tumor and normal name parameters and cleaned up internal name handling

changes to allow for calling without a matched normal (technically, not true 'tumor-only' calling).  Used for panel-of-normals creation

additional regression tests, based on DREAM data.  Removed accidental addition of TandemRepeatAnnotator to default annotations

updated MD5 based on run from GSA4 to fix bamboo issue

reverted unneeded visibility changes
2015-03-13 18:28:01 -04:00
Geraldine Van der Auwera 39a972f348 Merge pull request #872 from broadinstitute/eb_create_rgq_format_field
Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Fixes #870
2015-03-13 13:59:53 -04:00
Eric Banks 1ff9463285 Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs.
Now, instead of stripping out the GQs for mono sites, we transfer them to the RGQ.
This is extremely useful for people who want to know how confident the hom ref genotype calls are.
Perhaps this is just what CRSP needs for pertinent negatives.

Note that I also changed the tool to no longer use the GenotypeSummaries annotation by default since
it was adding some seemingly unnecessary annotations (like mean GQ now that we keep the GQ around and
number of no-calls).  Let me know if this was a mistake (although Laura gave me a thumbs up).
2015-03-13 10:27:20 -04:00
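
The GQ-to-RGQ transfer in 1ff9463285 can be pictured as a small per-genotype transformation. The dictionary representation and function name below are hypothetical; only the behaviour (move GQ into RGQ at monomorphic sites instead of stripping it) comes from the commit message.

```python
def format_fields_for_output(genotype, site_is_monomorphic):
    """Return the FORMAT fields to emit for one sample's genotype."""
    out = dict(genotype)  # copy so the caller's dict is left untouched
    if site_is_monomorphic and "GQ" in out:
        # Keep the confidence of the hom-ref call under a different name.
        out["RGQ"] = out.pop("GQ")
    return out


genotype = {"GT": "0/0", "DP": 35, "GQ": 99}
mono = format_fields_for_output(genotype, site_is_monomorphic=True)
poly = format_fields_for_output(genotype, site_is_monomorphic=False)
print(mono)
print(poly)
```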