gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Guillermo del Angel	5be7e0621d	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-09 09:58:34 -04:00
Guillermo del Angel	71ee8d87b3	Rename per-sample ML allelic fractions and counts so that they don't have the same name as the per-site INFO fields, and clarify wording in VCF header	2012-08-09 09:58:20 -04:00
Eric Banks	6230b49a86	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-08 21:44:43 -04:00
Eric Banks	35cec8530c	Make coverage threshold in FindCoveredIntervals a command-line argument	2012-08-08 21:44:24 -04:00
Mauricio Carneiro	250ffd2ad7	Merged bug fix from Stable into Unstable	2012-08-08 15:50:07 -04:00
Mauricio Carneiro	78c1556186	Fixing ReduceReads downsampling bug -- downsampled reads were not being excluded from the read window, causing them to trail back and get caught by the sliding window exception	2012-08-08 15:49:31 -04:00
Eric Banks	0a2a646a52	Other random FindBugs fixes	2012-08-08 14:56:27 -04:00
Eric Banks	f652d7806e	FindBugs found an infinite loop in the code	2012-08-08 14:44:49 -04:00
Eric Banks	4c84cc9486	Quick pass of FindBugs 'should be static inner class' fixes.	2012-08-08 14:42:06 -04:00
Eric Banks	a0196c9f5b	Quick pass of FindBugs 'method invokes inefficient Number constructor' fixes.	2012-08-08 14:34:16 -04:00
Eric Banks	4b2e3cec0b	Quick pass of FindBugs 'inefficient use of keySet iterator instead of entrySet iterator' fixes for core tools.	2012-08-08 14:29:41 -04:00
Guillermo del Angel	3e2752667c	Intermediate checkin for ReducedReads with HaplotypeCaller - change min read count over k-mer to average count over k-mer when doing assembly of a reduced read (not optimal, currently trying max and then will decide on best approach), fix merge conflicts	2012-08-08 12:07:33 -04:00
David Roazen	a7811d673f	Update URL for phone home / GATK key documentation output by the GATK upon error	2012-08-08 09:29:54 -04:00
Mark DePristo	cda8d944b7	Bugfixes for BCF with VQSR -- Old version converted doubles directly from strings. New version uses VariantContext getAttributeAsDouble() that looks at the values directly to determine how to convert from Object to Double (via Double.valueOf, (Double), or (Double)(Integer)). -- getAttributeAsDouble() is now smart in converting integers to doubles as needed -- Removed unnecessary logging info in BCF2Codec -- Added integration tests to ensure that VQSR works end-to-end with BCF2 using sites version of the file khalid sent to me -- Added vqsr.bcf_test.snps.unfiltered.bcf file for this integration test	2012-08-07 17:22:39 -04:00
Mark DePristo	80b94a4f9a	AdaptiveContexts implement pruning to a given chi2 p value -- Added Bonferroni-corrected p-value pruning, so you tell it how significant a difference you are willing to collapse in the tree, and it prunes the tree down to this maximum threshold -- Penalty is now a phred-scaled p-value not the raw chi2 value -- Split command line arguments in VisualizeContextTree into separate arguments for each type of pruning	2012-08-07 17:22:39 -04:00
Mark DePristo	982c735c76	VisualizeAdaptiveTree now considers only leaf nodes when computing max/min penalty	2012-08-07 17:22:39 -04:00
Ryan Poplin	15085bf03e	The UnifiedGenotyper now makes use of base insertion and base deletion quality scores if they exist in the reads.	2012-08-07 13:58:22 -04:00
Eric Banks	2c76f71a03	Update -maxAlleles argument in integration tests	2012-08-06 22:48:04 -04:00
Guillermo del Angel	c66a896b8e	Fix UG integration test broken by new -maxAltAlleles nomenclature	2012-08-06 21:29:21 -04:00
Guillermo del Angel	97c5ed4feb	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-06 20:22:31 -04:00
Guillermo del Angel	238d55cb61	Fixes for running HaplotypeCaller with reduced reads: a) minor refactoring, pulled out code to compute mean representative count to ReadUtils, b) Don't use min representative count over kmer when constructing de Bruijn graph - this creates many paths with multiplicity=1 and makes us lose a lot of SNP's at edge of capture targets. Use mean instead	2012-08-06 20:22:12 -04:00
Eric Banks	d2feb5d742	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-06 15:06:45 -04:00
Eric Banks	4b5e1aed1f	We might completely overhaul the AssignSomaticStatus tool at some point, but for now I've just tweaked it a little so that it runs well with the TALEN data: fixed VCF header bugs and changed it so that any multi-sample VCF can be used and the user specifies which of the samples is the normal and which should be used as the tumor (only 1 T/N pair allowed now).	2012-08-06 15:06:36 -04:00
Mark DePristo	00858f16a6	Deleting empty unit test for AdaptiveContexts	2012-08-06 12:58:13 -04:00
Ryan Poplin	f1c30c3a59	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-06 12:02:26 -04:00
Ryan Poplin	d85b38e4da	Updating HaplotypeCaller integration tests	2012-08-06 12:02:19 -04:00
Mark DePristo	44f160f29f	indelGOP and indelGCP are now advanced, not hidden arguments	2012-08-06 11:42:55 -04:00
Mark DePristo	2f004665fb	Fixing public -> private dep	2012-08-06 11:42:55 -04:00
Mark DePristo	7bf5ca51ee	Major bugfix for adaptive contexts -- Basically I was treating the context history in the wrong direction, effectively predicting the further bases in the context based on the closer one. Totally backward. Updated the code to build the tree in the right direction. -- Added a few more useful outputs for analysis (minPenalty and maxPenalty) -- Misc. cleanup of the code -- Overall I'm not 100% certain this is even the right way to think about the problem. Clearly this is producing a reasonable output but the sum of chi2 values over the entire tree is just enormous. Perhaps a MCMC convergence / sampling criterion would be a better way to think about this problem?	2012-08-06 11:42:55 -04:00
Mark DePristo	b4841548f1	Bug fixes and misc. improvements to running the adaptive context tools -- Better output file name defaults -- Fixed nasty bug where I included non-existent quals in the contexts to process because they showed up in the Cycle covariate -- Data is processed in qual order now, so it's easier to see progress -- Logger messages explaining where we are in the process -- When in UPDATE mode we still write out the information for an equivalent prune by depth for post analysis	2012-08-06 11:42:55 -04:00
Ryan Poplin	b8709d8c67	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-06 11:41:28 -04:00
Ryan Poplin	afa70a13a9	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-06 11:40:27 -04:00
Ryan Poplin	973d1d47ed	Merging together the computeDiploidHaplotypeLikelihoods functions in the HaplotypeCaller's LikelihoodEngine so they both benefit from the ReducedRead's RepresentativeCount	2012-08-06 11:40:07 -04:00
Eric Banks	210db5ec27	Update -maxAlleles argument to -maxAltAlleles to make it more accurate. The hidden GSA production -capMaxAllelesForIndels argument also gets updated.	2012-08-06 11:31:18 -04:00
Eric Banks	8f95a03bb6	Prevent NumberFormatExceptions when parsing the VCF POS field	2012-08-06 11:19:54 -04:00
Ryan Poplin	b7eec2fd0e	Bug fixes related to the changes in allele padding. If a haplotype started with an insertion it led to array index out of bounds. Haplotype allele insert function is now very simple because all alleles are treated the same way. HaplotypeUnitTest now uses a variant context instead of creating Allele objects directly.	2012-08-05 12:29:10 -04:00
Mark DePristo	e1bba91836	Ready for full-scale evaluation of adaptive BQSR contexts -- VisualizeContextTree now can write out an equivalent BQSR table determined after adaptive context merging of all RG x QUAL x CONTEXT trees -- Docs, algorithm descriptions, etc so that it makes sense what's going on -- VisualizeContextTree should really be simplified into a single tool that just visualizes the trees when / if we decide to make adaptive contexts a standard part of BQSR -- Misc. cleaning, organization of the code (recalibration tests were in private but corresponding actual files were public)	2012-08-03 16:02:53 -04:00
Guillermo del Angel	d2e8eb7b23	Fixed 2 haplotype caller unit tests: a) new interface for addReadLikelihoods() including read counts, b) disable test that tests basic DeBruijn graph assembly, not ready yet	2012-08-03 14:26:51 -04:00
Ryan Poplin	c3b6e2b143	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-03 13:14:43 -04:00
Ryan Poplin	ff80f17721	Using PathComparatorTotalScore in the assembly graph traversal does a better job of capturing low frequency branches that are inside high frequency haplotypes.	2012-08-03 13:14:37 -04:00
Guillermo del Angel	6f8e7692d4	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-03 12:24:37 -04:00
Guillermo del Angel	9e25b209e0	First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes: a) Active region: scale PL's by representative count to determine whether region is active. b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in read, TBD whether this is good enough to avoid biases in GL's. c) DeBruijn assembler inserts kmers N times in graph, where N is min representative count of read over kmer span - TBD again whether this is the best approach. d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is discrepancy of overlaps between unclipped/soft clipped bases. Didn't affect things before but RR makes hard-clipped bases in CIGARs more prevalent so this was exposed. e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller	2012-08-03 12:24:23 -04:00
Ryan Poplin	8817fc70d1	Merged bug fix from Stable into Unstable	2012-08-03 10:45:01 -04:00
Ryan Poplin	f40d0a0a28	Updating VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller. Integration tests change because of the MNPs in dbSNP.	2012-08-03 10:44:36 -04:00
Joel Thibault	51bd03cc36	Add RemoveProgramRecords annotation to ActiveRegionWalker	2012-08-03 09:54:16 -04:00
Joel Thibault	addbfd6437	Add a RemoveProgramRecords annotation * Add the RemoveProgramRecords annotation to LocusWalker	2012-08-03 09:54:16 -04:00
Joel Thibault	524d7ea306	Choose whether to keep program records based on Walker * Add keepProgramRecords argument * Make removeProgramRecords / keepProgramRecords override default	2012-08-03 09:54:16 -04:00
Mark DePristo	e04989f76d	Bugfix for new PASS position in dictionary in BCF2	2012-08-03 09:42:21 -04:00
Mark DePristo	d22b8cf86b	VisualizeContextTree now loops over M, I, and D states generating trees and analyzes	2012-08-02 17:30:30 -04:00
Mark DePristo	fb5dabce18	Update BCF2 to include a minor version number so we can rev (and report errors) with BCF2 -- We are now likely to fail with an error when reading old BCF files, rather than just giving bad results -- Added new class BCFVersion that consolidates all of the version management of BCF	2012-08-02 17:30:30 -04:00

10188 Commits (5be7e0621deee477be725aae1084e06be9a387a3)