Eric Banks
d2feb5d742
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 15:06:45 -04:00
Eric Banks
4b5e1aed1f
We might completely overhaul the AssignSomaticStatus tool at some point, but for now I've just tweaked it a little so that it runs well with the TALEN data: fixed VCF header bugs and changed it so that any multi-sample VCF can be used and the user specifies which of the samples is the normal and which should be used as the tumor (only 1 T/N pair allowed now).
2012-08-06 15:06:36 -04:00
Mark DePristo
00858f16a6
Deleting empty unit test for AdaptiveContexts
2012-08-06 12:58:13 -04:00
Ryan Poplin
f1c30c3a59
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 12:02:26 -04:00
Ryan Poplin
d85b38e4da
Updating HaplotypeCaller integration tests
2012-08-06 12:02:19 -04:00
Mark DePristo
44f160f29f
indelGOP and indelGCP are now advanced, not hidden arguments
2012-08-06 11:42:55 -04:00
Mark DePristo
2f004665fb
Fixing public -> private dep
2012-08-06 11:42:55 -04:00
Mark DePristo
7bf5ca51ee
Major bugfix for adaptive contexts
...
-- Basically I was treating the context history in the wrong direction, effectively predicting the further bases in the context based on the closer one. Totally backward. Updated the code to build the tree in the right direction.
-- Added a few more useful outputs for analysis (minPenalty and maxPenalty)
-- Misc. cleanup of the code
-- Overall I'm not 100% certain this is even the right way to think about the problem. Clearly this is producing a reasonable output, but the sum of chi2 values over the entire tree is just enormous. Perhaps an MCMC convergence / sampling criterion would be a better way to think about this problem?
2012-08-06 11:42:55 -04:00
Mark DePristo
b4841548f1
Bug fixes and misc. improvements to running the adaptive context tools
...
-- Better output file name defaults
-- Fixed nasty bug where I included non-existent quals in the contexts to process because they showed up in the Cycle covariate
-- Data is processed in qual order now, so it's easier to see progress
-- Logger messages explaining where we are in the process
-- When in UPDATE mode we still write out the information for an equivalent prune by depth for post analysis
2012-08-06 11:42:55 -04:00
Ryan Poplin
b8709d8c67
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 11:41:28 -04:00
Ryan Poplin
afa70a13a9
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 11:40:27 -04:00
Ryan Poplin
973d1d47ed
Merging together the computeDiploidHaplotypeLikelihoods functions in the HaplotypeCaller's LikelihoodEngine so they both benefit from the ReducedRead's RepresentativeCount
2012-08-06 11:40:07 -04:00
Eric Banks
210db5ec27
Update -maxAlleles argument to -maxAltAlleles to make it more accurate. The hidden GSA production -capMaxAllelesForIndels argument also gets updated.
2012-08-06 11:31:18 -04:00
Eric Banks
8f95a03bb6
Prevent NumberFormatExceptions when parsing the VCF POS field
2012-08-06 11:19:54 -04:00
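A minimal sketch of the kind of guard the POS fix in 8f95a03bb6 describes, assuming the goal is to turn a raw `NumberFormatException` into an error that points at the offending record. `VcfPosParser`, `MalformedVcfException`, and `parsePos` are hypothetical names for illustration, not the actual GATK codec code.

```java
// Hypothetical helper for illustration; not the actual GATK VCF codec code.
final class VcfPosParser {

    /** Signals input the user has to fix, as opposed to an internal bug. */
    static final class MalformedVcfException extends RuntimeException {
        MalformedVcfException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Parses the POS column of one VCF record, reporting bad input with its line number. */
    static int parsePos(String posField, int lineNumber) {
        final int pos;
        try {
            pos = Integer.parseInt(posField.trim());
        } catch (NumberFormatException e) {
            // Wrap the low-level exception so the caller sees which record is broken.
            throw new MalformedVcfException(
                    "Line " + lineNumber + ": POS '" + posField + "' is not a valid integer", e);
        }
        if (pos < 0) {
            throw new MalformedVcfException(
                    "Line " + lineNumber + ": POS '" + posField + "' is negative", null);
        }
        return pos;
    }
}
```

Keeping the original exception as the cause preserves the stack trace for debugging while the message stays readable for users.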
Ryan Poplin
b7eec2fd0e
Bug fixes related to the changes in allele padding. If a haplotype started with an insertion it led to array index out of bounds. Haplotype allele insert function is now very simple because all alleles are treated the same way. HaplotypeUnitTest now uses a variant context instead of creating Allele objects directly.
2012-08-05 12:29:10 -04:00
Mark DePristo
e1bba91836
Ready for full-scale evaluation of adaptive BQSR contexts
...
-- VisualizeContextTree now can write out an equivalent BQSR table determined after adaptive context merging of all RG x QUAL x CONTEXT trees
-- Docs, algorithm descriptions, etc so that it makes sense what's going on
-- VisualizeContextTree should really be simplified into a single tool that just visualizes the trees when / if we decide to make adaptive contexts a standard part of BQSR
-- Misc. cleaning, organization of the code (recalibration tests were in private but corresponding actual files were public)
2012-08-03 16:02:53 -04:00
Guillermo del Angel
d2e8eb7b23
Fixed 2 haplotype caller unit tests: a) new interface for addReadLikelihoods() including read counts, b) disabled the test that tests basic DeBruijn graph assembly, which is not ready yet
2012-08-03 14:26:51 -04:00
Ryan Poplin
c3b6e2b143
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-03 13:14:43 -04:00
Ryan Poplin
ff80f17721
Using PathComparatorTotalScore in the assembly graph traversal does a better job of capturing low frequency branches that are inside high frequency haplotypes.
2012-08-03 13:14:37 -04:00
Guillermo del Angel
6f8e7692d4
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-03 12:24:37 -04:00
Guillermo del Angel
9e25b209e0
First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes:
a) Active region: scale PLs by representative count to determine whether region is active.
b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in the read; TBD whether this is good enough to avoid biases in GLs.
c) DeBruijn assembler inserts kmers N times in the graph, where N is the min representative count of the read over the kmer span; TBD again whether this is the best approach.
d) Bug fixes in FragmentUtils: the logic to merge fragments was wrong in cases where there is a discrepancy of overlaps between unclipped/soft clipped bases. This didn't affect things before, but RR makes hard-clipped bases in CIGARs more prevalent, so the bug was exposed.
e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller.
2012-08-03 12:24:23 -04:00
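The counting rules described in 9e25b209e0 can be sketched as follows. This is an illustration under stated assumptions, not the HaplotypeCaller code: `ReducedReadCountsSketch` and its method names are hypothetical, rounding the average to the nearest integer (with a floor of 1) is an assumption, and "scaling" a likelihood is interpreted here as multiplying the log10 likelihood by the count, i.e. treating the read as that many identical independent reads.

```java
// Hypothetical sketch; names, rounding, and the log-space scaling are assumptions.
final class ReducedReadCountsSketch {

    /** Read-level count: the average of the per-base representative counts. */
    static int readRepresentativeCount(int[] perBaseCounts) {
        if (perBaseCounts.length == 0) {
            throw new IllegalArgumentException("read has no bases");
        }
        long sum = 0;
        for (int count : perBaseCounts) {
            sum += count;
        }
        // Assumption: round to nearest and never drop below one read.
        return (int) Math.max(1L, Math.round((double) sum / perBaseCounts.length));
    }

    /** Number of times a kmer is inserted: the minimum count over the kmer span. */
    static int kmerInsertions(int[] perBaseCounts, int start, int kmerLength) {
        if (kmerLength <= 0 || start < 0 || start + kmerLength > perBaseCounts.length) {
            throw new IllegalArgumentException("kmer span is outside the read");
        }
        int min = Integer.MAX_VALUE;
        for (int i = start; i < start + kmerLength; i++) {
            min = Math.min(min, perBaseCounts[i]);
        }
        return min;
    }

    /** Scales a per-read, per-haplotype log10 likelihood by the read's count. */
    static double scaleLog10Likelihood(double log10Likelihood, int representativeCount) {
        return representativeCount * log10Likelihood;
    }
}
```

Using the minimum over the kmer span is the conservative choice: a kmer is never counted more often than its least-supported base.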
Ryan Poplin
8817fc70d1
Merged bug fix from Stable into Unstable
2012-08-03 10:45:01 -04:00
Ryan Poplin
f40d0a0a28
Updating VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller. Integration tests change because of the MNPs in dbSNP.
2012-08-03 10:44:36 -04:00
Joel Thibault
51bd03cc36
Add RemoveProgramRecords annotation to ActiveRegionWalker
2012-08-03 09:54:16 -04:00
Joel Thibault
addbfd6437
Add a RemoveProgramRecords annotation
...
* Add the RemoveProgramRecords annotation to LocusWalker
2012-08-03 09:54:16 -04:00
Joel Thibault
524d7ea306
Choose whether to keep program records based on Walker
...
* Add keepProgramRecords argument
* Make removeProgramRecords / keepProgramRecords override default
2012-08-03 09:54:16 -04:00
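One way to picture how the three RemoveProgramRecords commits above fit together: the walker class supplies a default through an annotation, and the two command-line arguments override it. The sketch below is hypothetical throughout (`RemoveProgramRecordsSketch`, the walker classes, and `ProgramRecordPolicy` are invented for illustration), and rejecting the case where both arguments are set is an assumption, since the commit messages do not say how that conflict is handled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for the RemoveProgramRecords annotation.
@Inherited
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RemoveProgramRecordsSketch {}

// Walker types that drop program records by default carry the annotation.
@RemoveProgramRecordsSketch
class LocusWalkerSketch {}

// Subclasses pick up the default because the annotation is @Inherited.
class MyLocusToolSketch extends LocusWalkerSketch {}

// Unannotated walker types keep program records by default.
class ReadWalkerSketch {}

final class ProgramRecordPolicy {

    /** Resolves the effective setting: explicit arguments win over the walker default. */
    static boolean shouldRemove(Class<?> walkerClass, boolean removeArg, boolean keepArg) {
        if (removeArg && keepArg) {
            // Assumption: contradictory arguments are treated as a user error.
            throw new IllegalArgumentException(
                    "removeProgramRecords and keepProgramRecords cannot both be set");
        }
        if (removeArg) {
            return true;
        }
        if (keepArg) {
            return false;
        }
        return walkerClass.isAnnotationPresent(RemoveProgramRecordsSketch.class);
    }
}
```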
Mark DePristo
e04989f76d
Bugfix for new PASS position in dictionary in BCF2
2012-08-03 09:42:21 -04:00
Mark DePristo
d22b8cf86b
VisualizeContextTree now loops over M, I, and D states generating trees and analyses
2012-08-02 17:30:30 -04:00
Mark DePristo
fb5dabce18
Update BCF2 to include a minor version number so we can rev (and report errors) with BCF2
...
-- We are now likely to fail with an error when reading old BCF files, rather than just giving bad results
-- Added new class BCFVersion that consolidates all of the version management of BCF
2012-08-02 17:30:30 -04:00
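The version check described in fb5dabce18 might look roughly like this. It is a sketch, not the real BCFVersion class: `BcfVersionSketch` and its methods are hypothetical, the header layout (the ASCII bytes `BCF` followed by one major and one minor version byte) is an assumption, and the version numbers in the usage note are placeholders.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch; the header layout is an assumption, not taken from the commit.
final class BcfVersionSketch {
    private static final byte[] MAGIC = {'B', 'C', 'F'};

    final int major;
    final int minor;

    BcfVersionSketch(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Reads the magic bytes plus the major and minor version from the start of a stream. */
    static BcfVersionSketch read(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            byte[] magic = new byte[MAGIC.length];
            data.readFully(magic);
            if (!Arrays.equals(magic, MAGIC)) {
                throw new IllegalArgumentException("Not a BCF stream: bad magic bytes");
            }
            int major = data.readUnsignedByte();
            int minor = data.readUnsignedByte();
            return new BcfVersionSketch(major, minor);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read BCF version header", e);
        }
    }

    /** Fails loudly on a version mismatch instead of silently decoding bad data. */
    void requireCompatibleWith(BcfVersionSketch supported) {
        if (major != supported.major || minor != supported.minor) {
            throw new IllegalStateException("Unsupported BCF version " + major + "." + minor
                    + "; this reader supports " + supported.major + "." + supported.minor);
        }
    }
}
```

A reader would call `BcfVersionSketch.read(stream).requireCompatibleWith(new BcfVersionSketch(2, 1))` before decoding any records, so an old file is rejected up front.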
Eric Banks
b4f4d86c77
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-02 11:49:03 -04:00
Ryan Poplin
3ece4c4993
Merged bug fix from Stable into Unstable
2012-08-02 11:41:36 -04:00
Ryan Poplin
6f7a236cfc
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2012-08-02 11:41:20 -04:00
Ryan Poplin
cb8bc18aeb
Fix for error in HaplotypeCaller. HC has a UG argument collection for the UG engine but some of those arguments aren't appropriate to set.
2012-08-02 11:41:06 -04:00
Eric Banks
e3f89fb054
Missing/malformed GATK report files are user errors
2012-08-02 11:33:21 -04:00
Eric Banks
cc01f844d4
Merged bug fix from Stable into Unstable
2012-08-02 11:25:28 -04:00
Eric Banks
0381fd7c83
Hmm, I thought I used the right md5s last time. Let's try again.
2012-08-02 11:25:10 -04:00
Mark DePristo
2f585b91be
Update ex2.vcf and .bcf test files to new spec
2012-08-01 17:10:35 -04:00
Mark DePristo
c3c3d18611
Update BCF2 to put PASS as offset 0 not at the end
...
-- Unfortunately this commit breaks backward compatibility with all existing BCF2 files...
2012-08-01 17:09:22 -04:00
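To illustrate why moving PASS breaks existing files, here is a minimal sketch of a string dictionary that pins PASS at offset 0. It assumes records refer to strings by their offset in a dictionary built from the header, so any change in ordering shifts every stored offset. `Bcf2DictionarySketch` and `build` are hypothetical names, not the GATK implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a header string dictionary with PASS pinned at offset 0.
final class Bcf2DictionarySketch {
    static final String PASS = "PASS";

    /** Returns the dictionary: PASS first, then header IDs in order of first appearance. */
    static List<String> build(List<String> headerIds) {
        Set<String> ordered = new LinkedHashSet<>();
        ordered.add(PASS);
        // A later PASS in the header does not move it; a LinkedHashSet keeps
        // the position of the first insertion.
        ordered.addAll(headerIds);
        return new ArrayList<>(ordered);
    }
}
```

With PASS appended at the end instead, its offset would depend on how many other strings the header declares, which is one reason a fixed offset of 0 is easier to reason about.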
Mark DePristo
25c773ef33
Adding VE integration test file to private/testdata
2012-08-01 15:45:12 -04:00
Mark DePristo
ccac77d888
Bugfix for incorrect allele counting in IndelSummary
...
-- Previous version would count all alt alleles as present in a sample, even if only 1 were present, because of the way VariantEval subsetted VCs
-- Updated code for subsetting VCs by sample to be clearer about how it handles rederiving alleles
-- Update a few pieces of code to get previous correct behavior
-- Updated a few MD5s as now ref calls at sites in dbSNP are counted as having a comp site, and therefore show up in known sites when Novelty strat is on (which I think is correct)
-- Walkers that used old subsetting function with true are now using clearer version that does rederive alleles by default
2012-08-01 15:45:12 -04:00
Joel Thibault
2b25df3d53
Add removeProgramRecords argument
...
* Add unit test for the removeProgramRecords
2012-08-01 15:33:05 -04:00
Ryan Poplin
d53105668b
Merged bug fix from Stable into Unstable
2012-08-01 14:53:06 -04:00
Ryan Poplin
fabca66d09
Another fix to VQSR docs
2012-08-01 14:52:49 -04:00
Ryan Poplin
2be29ebd22
Merged bug fix from Stable into Unstable
2012-08-01 14:35:30 -04:00
Ryan Poplin
4093909a56
Updating VQSR docs. Removing references to old best practices pages.
2012-08-01 14:30:24 -04:00
Eric Banks
52b93cab62
Merged bug fix from Stable into Unstable
2012-08-01 13:17:36 -04:00
Eric Banks
22bf052828
Fixing BQSR GATK docs
2012-08-01 13:17:16 -04:00
Guillermo del Angel
9ac72dbd4d
Merged bug fix from Stable into Unstable
2012-08-01 10:56:45 -04:00
Guillermo del Angel
84cd23f891
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2012-08-01 10:55:37 -04:00
Guillermo del Angel
01265f78e6
Add sanity check and possible bug fix for forum user: if haplotypes cannot be created from the given alleles when genotyping indels in pool mode (e.g. too close to contig boundary, etc.), return an empty allele list, signifying that the site can't be genotyped
2012-08-01 10:50:00 -04:00