Commit Graph

2500 Commits (4b2e3cec0b8f8aeb9b22b487a8f81e0db4bd6997)

Author SHA1 Message Date
Eric Banks 4b2e3cec0b Quick pass of FindBugs 'inefficient use of keySet iterator instead of entrySet iterator' fixes for core tools. 2012-08-08 14:29:41 -04:00
Guillermo del Angel 3e2752667c Intermediate checkin for ReducedReads with HaplotypeCaller - change min read count over k-mer to average count over k-mer when doing assembly of a reduced read (not optimal, currently trying max and then will decide on best approach), fix merge conflicts 2012-08-08 12:07:33 -04:00
David Roazen a7811d673f Update URL for phone home / GATK key documentation output by the GATK upon error 2012-08-08 09:29:54 -04:00
Mark DePristo cda8d944b7 Bugfixes for BCF with VQSR
-- Old version converted doubles directly from strings.  New version uses VariantContext getAttributeAsDouble() that looks at the values directly to determine how to convert from Object to Double (via Double.valueOf, (Double), or (Double)(Integer)).
-- getAttributeAsDouble() is now smart in converting integers to doubles as needed
-- Removed unnecessary logging info in BCF2Codec
-- Added integration tests to ensure that VQSR works end-to-end with BCF2 using sites version of the file khalid sent to me
-- Added vqsr.bcf_test.snps.unfiltered.bcf file for this integration test
2012-08-07 17:22:39 -04:00
Mark DePristo 80b94a4f9a AdaptiveContexts implement pruning to a given chi2 p value
-- Added bonferroni corrected p-value pruning, so you tell it how significant of a different you are willing to collapse in the tree, and it prunes the tree down to this maximum threshold
-- Penalty is now a phred-scaled p-value not the raw chi2 value
-- Split command line arguments in VisualizeContextTree into separate arguments for each type of pruning
2012-08-07 17:22:39 -04:00
Mark DePristo 982c735c76 VisualizeAdaptiveTree now considers only leaf nodes when computing max/min penalty 2012-08-07 17:22:39 -04:00
Ryan Poplin 15085bf03e The UnifiedGenotyper now makes use of base insertion and base deletion quality scores if they exist in the reads. 2012-08-07 13:58:22 -04:00
Guillermo del Angel 97c5ed4feb Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-06 20:22:31 -04:00
Guillermo del Angel 238d55cb61 Fixes for running HaplotypeCaller with reduced reads: a) minor refactoring, pulled out code to compute mean representative count to ReadUtils, b) Don't use min representative count over kmer when constructing de Bruijn graph - this creates many paths with multiplicity=1 and makes us lose a lot of SNP's at edge of capture targets. Use mean instead 2012-08-06 20:22:12 -04:00
Mark DePristo 00858f16a6 Deleting empty unit test for AdaptiveContexts 2012-08-06 12:58:13 -04:00
Ryan Poplin f1c30c3a59 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-06 12:02:26 -04:00
Mark DePristo 44f160f29f indelGOP and indelGCP are now advanced, not hidden arguments 2012-08-06 11:42:55 -04:00
Mark DePristo 2f004665fb Fixing public -> private dep 2012-08-06 11:42:55 -04:00
Mark DePristo 7bf5ca51ee Major bugfix for adaptive contexts
-- Basically I was treating the context history in the wrong direction, effectively predicting the further bases in the context based on the closer one.  Totally backward.  Updated the code to build the tree in the right direction.
-- Added a few more useful outputs for analysis (minPenalty and maxPenalty)
-- Misc. cleanup of the code
-- Overall I'm not 100% certain this is even the right way to think about the problem.  Clearly this is producing a reasonable output but the sum of chi2 values over the entire tree is just enormous.  Perhaps a MCMC convergence / sampling criterion would be a better way to think about this problem?
2012-08-06 11:42:55 -04:00
Mark DePristo b4841548f1 Bug fixes and misc. improvements to running the adaptive context tools
-- Better output file name defaults
-- Fixed nasty bug where I included non-existant quals in the contexts to process because they showed up in the Cycle covariate
-- Data is processed in qual order now, so it's easier to see progress
-- Logger messages explaining where we are in the process
-- When in UPDATE mode we still write out the information for an equivalent prune by depth for post analysis
2012-08-06 11:42:55 -04:00
Ryan Poplin b8709d8c67 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-06 11:41:28 -04:00
Eric Banks 210db5ec27 Update -maxAlleles argument to -maxAltAlleles to make it more accurate. The hidden GSA production -capMaxAllelesForIndels argument also gets updated. 2012-08-06 11:31:18 -04:00
Eric Banks 8f95a03bb6 Prevent NumberFormatExceptions when parsing the VCF POS field 2012-08-06 11:19:54 -04:00
Ryan Poplin b7eec2fd0e Bug fixes related to the changes in allele padding. If a haplotype started with an insertion it led to array index out of bounds. Haplotype allele insert function is now very simple because all alleles are treated the same way. HaplotypeUnitTest now uses a variant context instead of creating Allele objects directly. 2012-08-05 12:29:10 -04:00
Mark DePristo e1bba91836 Ready for full-scale evaluation adaptive BQSR contexts
-- VisualizeContextTree now can write out an equivalent BQSR table determined after adaptive context merging of all RG x QUAL x CONTEXT trees
-- Docs, algorithm descriptions, etc so that it makes sense what's going on
-- VisualizeContextTree should really be simplified when into a single tool that just visualize the trees when / if we decide to make adaptive contexts standard part of BQSR
 -- Misc. cleaning, organization of the code (recalibation tests were in private but corresponding actual files were public)
2012-08-03 16:02:53 -04:00
Guillermo del Angel 6f8e7692d4 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-03 12:24:37 -04:00
Guillermo del Angel 9e25b209e0 First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes: a) Active region: scale PL's by representative count to determine whether region is active. b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in read, TBD whether this is good enough to avoid biases in GL's. c) DeBruijn assembler inserts kmers N times in graph, where N is min representative count of read over kmer span - TBD again whether this is the best approach. d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is discrepancy of overlaps between unclipped/soft clipped bases. Didn't affect things before but RR makes prevalence of hard-clipped bases in CIGARs more prevalent so this was exposed. e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller 2012-08-03 12:24:23 -04:00
Ryan Poplin 8817fc70d1 Merged bug fix from Stable into Unstable 2012-08-03 10:45:01 -04:00
Ryan Poplin f40d0a0a28 Updating VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller. Integration tests change because of the MNPs in dbSNP. 2012-08-03 10:44:36 -04:00
Joel Thibault 51bd03cc36 Add RemoveProgramRecords annotation to ActiveRegionWalker 2012-08-03 09:54:16 -04:00
Joel Thibault addbfd6437 Add a RemoveProgramRecords annotation
* Add the RemoveProgramRecords annotation to LocusWalker
2012-08-03 09:54:16 -04:00
Joel Thibault 524d7ea306 Choose whether to keep program records based on Walker
* Add keepProgramRecords argument
* Make removeProgramRecords / keepProgramRecords override default
2012-08-03 09:54:16 -04:00
Mark DePristo e04989f76d Bugfix for new PASS position in dictionary in BCF2 2012-08-03 09:42:21 -04:00
Mark DePristo fb5dabce18 Update BCF2 to include a minor version number so we can rev (and report errors) with BCF2
-- We are no likely to fail with an error when reading old BCF files, rather than just giving bad results
-- Added new class BCFVersion that consolidates all of the version management of BCF
2012-08-02 17:30:30 -04:00
Eric Banks e3f89fb054 Missing/malformed GATK report files are user errors 2012-08-02 11:33:21 -04:00
Mark DePristo c3c3d18611 Update BCF2 to put PASS as offset 0 not at the end
-- Unfortunately this commit breaks backward compatibility with all existing BCF2 files...
2012-08-01 17:09:22 -04:00
Mark DePristo ccac77d888 Bugfix for incorrect allele counting in IndelSummary
-- Previous version would count all alt alleles as present in a sample, even if only 1 were present, because of the way VariantEval subsetted VCs
-- Updated code for subsetting VCs by sample to be clearer about how it handles rederiving alleles
-- Update a few pieces of code to get previous correct behavior
-- Updated a few MD5s as now ref calls at sites in dbSNP are counted as having a comp sites, and therefore show up in known sites when Novelty strat is on (which I think is correct)
-- Walkers that used old subsetting function with true are now using clearer version that does rederive alleles by default
2012-08-01 15:45:12 -04:00
Joel Thibault 2b25df3d53 Add removeProgramRecords argument
* Add unit test for the removeProgramRecords
2012-08-01 15:33:05 -04:00
Ryan Poplin d53105668b Merged bug fix from Stable into Unstable 2012-08-01 14:53:06 -04:00
Ryan Poplin fabca66d09 Another fix to VQSR docs 2012-08-01 14:52:49 -04:00
Ryan Poplin 2be29ebd22 Merged bug fix from Stable into Unstable 2012-08-01 14:35:30 -04:00
Ryan Poplin 4093909a56 Updating VQSR docs. Removing references to old best practices pages. 2012-08-01 14:30:24 -04:00
Eric Banks 52b93cab62 Merged bug fix from Stable into Unstable 2012-08-01 13:17:36 -04:00
Eric Banks 22bf052828 Fixing BQSR GATK docs 2012-08-01 13:17:16 -04:00
Eric Banks 459832ee16 Fixed bug in FastaAlternateReferenceMaker when input VCF has overlapping deletions as reported a while back on GS 2012-08-01 10:45:04 -04:00
Eric Banks a4a41458ef Update docs of FastaAlternateReferenceMaker as promised in older GS thread 2012-08-01 10:33:41 -04:00
Eric Banks 38e5419b11 Merged bug fix from Stable into Unstable 2012-08-01 09:50:31 -04:00
Eric Banks 56f8afab97 Requested by Geraldine: adding a utility to register deprecated walkers (and the major version of the first release since they were removed) so that the User Error printed out for e.g. CountCovariates now states: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0. 2012-08-01 09:50:00 -04:00
Guillermo del Angel 0528337467 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-07-31 18:17:50 -04:00
Guillermo del Angel 4a23f3cd11 Simple cleanup of pool caller code - since usage is much more general than just calling pools, AF calculation models and GL calculation models are renamed from Pool -> GeneralPloidy. Also, don't have users specify special arguments for -glm and -pnrm. Instead, when running UG with sample ploidy != 2, the correct general ploidy modules are automatically detected and loaded. -glm now reverts to old [SNP|INDEL|BOTH] usage 2012-07-31 16:34:20 -04:00
Eric Banks 6cb10cef96 Fixed older GS reported bug. Actually, the problem really lies in Picard (can't set max records in RAM without it throwing an exception, reported on their JIRA) so I just masked out the problem by removing this never-used argument from this rarely-used tool. 2012-07-31 16:00:36 -04:00
Eric Banks ab53d73459 Quick fix to user error catching 2012-07-31 15:50:32 -04:00
Eric Banks 10111450aa Fixed AlignmentUtils bug for handling Ns in the CIGAR string. Added a UG integration test that calls a BAM with such reads (provided by a user on GetSatisfaction). 2012-07-31 15:37:22 -04:00
Mark DePristo f7133ffc31 Cleanup syntax errors from BQSR reorganization 2012-07-31 08:11:05 -04:00
Mark DePristo dad9bb1192 Changes order of writing BaseRecalibrator results so that if R blows up you still get a meaningful tree 2012-07-31 08:11:04 -04:00