
10159 Commits (d85b38e4da8990925d457d2b7b613c6f2caf4015)

Author SHA1 Message Date
Ryan Poplin d85b38e4da Updating HaplotypeCaller integration tests 2012-08-06 12:02:19 -04:00
Ryan Poplin b8709d8c67 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-06 11:41:28 -04:00
Ryan Poplin afa70a13a9 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-06 11:40:27 -04:00
Ryan Poplin 973d1d47ed Merging together the computeDiploidHaplotypeLikelihoods functions in the HaplotypeCaller's LikelihoodEngine so they both benefit from the ReducedRead's RepresentativeCount 2012-08-06 11:40:07 -04:00
Eric Banks 210db5ec27 Update -maxAlleles argument to -maxAltAlleles to make it more accurate. The hidden GSA production -capMaxAllelesForIndels argument also gets updated. 2012-08-06 11:31:18 -04:00
Eric Banks 8f95a03bb6 Prevent NumberFormatExceptions when parsing the VCF POS field 2012-08-06 11:19:54 -04:00
Ryan Poplin b7eec2fd0e Bug fixes related to the changes in allele padding. If a haplotype started with an insertion it led to array index out of bounds. Haplotype allele insert function is now very simple because all alleles are treated the same way. HaplotypeUnitTest now uses a variant context instead of creating Allele objects directly. 2012-08-05 12:29:10 -04:00
Mark DePristo e1bba91836 Ready for full-scale evaluation of adaptive BQSR contexts
-- VisualizeContextTree now can write out an equivalent BQSR table determined after adaptive context merging of all RG x QUAL x CONTEXT trees
-- Docs, algorithm descriptions, etc so that it makes sense what's going on
-- VisualizeContextTree should really be simplified into a single tool that just visualizes the trees when / if we decide to make adaptive contexts a standard part of BQSR
-- Misc. cleaning, organization of the code (recalibration tests were in private but corresponding actual files were public)
2012-08-03 16:02:53 -04:00
Guillermo del Angel d2e8eb7b23 Fixed 2 haplotype caller unit tests: a) new interface for addReadLikelihoods() including read counts, b) disable test that tests basic DeBruijn graph assembly, not ready yet 2012-08-03 14:26:51 -04:00
Ryan Poplin c3b6e2b143 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-03 13:14:43 -04:00
Ryan Poplin ff80f17721 Using PathComparatorTotalScore in the assembly graph traversal does a better job of capturing low frequency branches that are inside high frequency haplotypes. 2012-08-03 13:14:37 -04:00
Guillermo del Angel 6f8e7692d4 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-03 12:24:37 -04:00
Guillermo del Angel 9e25b209e0 First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes: a) Active region: scale PL's by representative count to determine whether region is active. b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in read, TBD whether this is good enough to avoid biases in GL's. c) DeBruijn assembler inserts kmers N times in graph, where N is min representative count of read over kmer span - TBD again whether this is the best approach. d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is a discrepancy of overlaps between unclipped/soft clipped bases. Didn't affect things before but RR makes hard-clipped bases in CIGARs more prevalent so this was exposed. e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller 2012-08-03 12:24:23 -04:00
Ryan Poplin 8817fc70d1 Merged bug fix from Stable into Unstable 2012-08-03 10:45:01 -04:00
Ryan Poplin f40d0a0a28 Updating VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller. Integration tests change because of the MNPs in dbSNP. 2012-08-03 10:44:36 -04:00
Joel Thibault 51bd03cc36 Add RemoveProgramRecords annotation to ActiveRegionWalker 2012-08-03 09:54:16 -04:00
Joel Thibault addbfd6437 Add a RemoveProgramRecords annotation
* Add the RemoveProgramRecords annotation to LocusWalker
2012-08-03 09:54:16 -04:00
Joel Thibault 524d7ea306 Choose whether to keep program records based on Walker
* Add keepProgramRecords argument
* Make removeProgramRecords / keepProgramRecords override default
2012-08-03 09:54:16 -04:00
Mark DePristo e04989f76d Bugfix for new PASS position in dictionary in BCF2 2012-08-03 09:42:21 -04:00
Mark DePristo d22b8cf86b VisualizeContextTree now loops over M, I, and D states generating trees and analyzes 2012-08-02 17:30:30 -04:00
Mark DePristo fb5dabce18 Update BCF2 to include a minor version number so we can rev (and report errors) with BCF2
-- We are now likely to fail with an error when reading old BCF files, rather than just giving bad results
-- Added new class BCFVersion that consolidates all of the version management of BCF
2012-08-02 17:30:30 -04:00
Eric Banks b4f4d86c77 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-02 11:49:03 -04:00
Ryan Poplin 3ece4c4993 Merged bug fix from Stable into Unstable 2012-08-02 11:41:36 -04:00
Ryan Poplin 6f7a236cfc Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2012-08-02 11:41:20 -04:00
Ryan Poplin cb8bc18aeb Fix for error in HaplotypeCaller. HC has a UG argument collection for the UG engine but some of those arguments aren't appropriate to set. 2012-08-02 11:41:06 -04:00
Eric Banks e3f89fb054 Missing/malformed GATK report files are user errors 2012-08-02 11:33:21 -04:00
Eric Banks cc01f844d4 Merged bug fix from Stable into Unstable 2012-08-02 11:25:28 -04:00
Eric Banks 0381fd7c83 Hmm, I thought I used the right md5s last time. Let's try again. 2012-08-02 11:25:10 -04:00
Mark DePristo 2f585b91be Update ex2.vcf and .bcf test files to new spec 2012-08-01 17:10:35 -04:00
Mark DePristo c3c3d18611 Update BCF2 to put PASS as offset 0 not at the end
-- Unfortunately this commit breaks backward compatibility with all existing BCF2 files...
2012-08-01 17:09:22 -04:00
Mark DePristo 25c773ef33 Adding VE integration test file to private/testdata 2012-08-01 15:45:12 -04:00
Mark DePristo ccac77d888 Bugfix for incorrect allele counting in IndelSummary
-- Previous version would count all alt alleles as present in a sample, even if only 1 were present, because of the way VariantEval subsetted VCs
-- Updated code for subsetting VCs by sample to be clearer about how it handles rederiving alleles
-- Update a few pieces of code to get previous correct behavior
-- Updated a few MD5s as now ref calls at sites in dbSNP are counted as having a comp site, and therefore show up in known sites when Novelty strat is on (which I think is correct)
-- Walkers that used old subsetting function with true are now using clearer version that does rederive alleles by default
2012-08-01 15:45:12 -04:00
Joel Thibault 2b25df3d53 Add removeProgramRecords argument
* Add unit test for the removeProgramRecords
2012-08-01 15:33:05 -04:00
Ryan Poplin d53105668b Merged bug fix from Stable into Unstable 2012-08-01 14:53:06 -04:00
Ryan Poplin fabca66d09 Another fix to VQSR docs 2012-08-01 14:52:49 -04:00
Ryan Poplin 2be29ebd22 Merged bug fix from Stable into Unstable 2012-08-01 14:35:30 -04:00
Ryan Poplin 4093909a56 Updating VQSR docs. Removing references to old best practices pages. 2012-08-01 14:30:24 -04:00
Eric Banks 52b93cab62 Merged bug fix from Stable into Unstable 2012-08-01 13:17:36 -04:00
Eric Banks 22bf052828 Fixing BQSR GATK docs 2012-08-01 13:17:16 -04:00
Guillermo del Angel 9ac72dbd4d Merged bug fix from Stable into Unstable 2012-08-01 10:56:45 -04:00
Guillermo del Angel 84cd23f891 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2012-08-01 10:55:37 -04:00
Guillermo del Angel 01265f78e6 Add sanity check and possible bug fix for forum user: if haplotypes cannot be created from given alleles when genotyping indels (e.g. too close to contig boundary, etc.) in pool mode, empty allele list, signifying site can't be genotyped 2012-08-01 10:50:00 -04:00
Eric Banks 459832ee16 Fixed bug in FastaAlternateReferenceMaker when input VCF has overlapping deletions as reported a while back on GS 2012-08-01 10:45:04 -04:00
Eric Banks a4a41458ef Update docs of FastaAlternateReferenceMaker as promised in older GS thread 2012-08-01 10:33:41 -04:00
Eric Banks 687df2341d Merged bug fix from Stable into Unstable 2012-08-01 10:27:15 -04:00
Eric Banks 05bf6e3726 Updating md5s in pipeline tests so that they finally pass 2012-08-01 10:27:00 -04:00
Eric Banks 38e5419b11 Merged bug fix from Stable into Unstable 2012-08-01 09:50:31 -04:00
Eric Banks 56f8afab97 Requested by Geraldine: adding a utility to register deprecated walkers (and the major version of the first release since they were removed) so that the User Error printed out for e.g. CountCovariates now states: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0. 2012-08-01 09:50:00 -04:00
Eric Banks 7cf4b63d76 Disabling indel quals in BaseRecalibrator as it should be, not PrintReads. 2012-08-01 09:23:04 -04:00
Guillermo del Angel 0528337467 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-31 18:17:50 -04:00