gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	ee2f12e2ac	Simpler naming convention for AlleleFrequencyCalculation => AFCalc	2012-10-15 07:53:55 -04:00
Mark DePristo	cf3f9d6ee8	Reorganize and cleanup AFCalculations -- Now contained in a package called afcalc -- Extracted standard alone classes from private static classes in ExactAF -- Most fields are now private, with accessors -- Overall cleaner organization now	2012-10-15 07:53:55 -04:00
Mark DePristo	13211231c7	Restructure and cleanup ExactAFCalculations -- Now there's no duplication between exact old and constrained models. The behavior is controlled by an overloaded abstract function -- No more static function to access the linear exact model -- you have to create the surrounding class. Updated code in the system -- Everything passes unit tests	2012-10-15 07:53:54 -04:00
Mark DePristo	f800f3fb88	Optimized diploid exact AF calculation uses maxACs to stop the calculation by maxAC by allele -- Added unit tests to ensure the approximation isn't so far from our reference implementation (DiploidExactAFCalculation)	2012-10-15 07:53:54 -04:00
Mark DePristo	efad215edb	Greedy version of function to compute the max achievable AC for each alt allele -- walks over the genotypes in VC, and computes for each alt allele the maximum AC we need to consider in that alt allele dimension. Does the calculation based on the PLs in each genotype g, choosing to update the max AC for the alt alleles corresponding to that PL. Only takes the first lowest PL, if there are multiple genotype configurations with the same PL value. It takes values in the order of the alt alleles.	2012-10-15 07:53:54 -04:00
Mark DePristo	7666a58773	Function to compute the max achievable AC for each alt allele -- Additional minor cleanup of ExactAFCalculation	2012-10-15 07:53:53 -04:00
Eric Banks	a8efa5451a	Protect against bad bases users have screwy data (or try to use zipped references)	2012-10-12 15:05:03 -04:00
Eric Banks	81532a0529	Missing file are user errors.	2012-10-12 09:48:12 -04:00
Eric Banks	fa77a83783	Update the out of space error to include another permutation	2012-10-12 09:38:12 -04:00
Eric Banks	85525d9e6e	Make Geraldine's life easier: from now on we treat problems where a temp file cannot be found when running the GATK with multiple threads as User Errors (since they are 99.9% of the time). This is an extremely large class of errors in Tableau and on the forums. Helpful error message tells users exactly what we tell them on the forums anyways (Geraldine: feel free to edit).	2012-10-12 09:19:50 -04:00
Eric Banks	ad60300bee	Catch malformed BAM files at the source since this is the largest class of errors in Tableau.	2012-10-12 09:07:57 -04:00
Eric Banks	593c8065d9	Fix docs for BadMateFilter	2012-10-12 08:35:45 -04:00
Christopher Hartl	6b9987cf1b	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2012-10-12 00:48:42 -04:00
David Roazen	3861212dab	Fix inefficiency in FilePointer GenomeLoc validation Validation of GenomeLocs in the FilePointer class was extremely inefficient when the GenomeLocs were added one at a time rather than all at once. Appears to mostly fix GSA-604	2012-10-11 19:55:14 -04:00
Ryan Poplin	08b8ce6903	Fixing merge conflicts related to the comment formatting in the BQSR.	2012-10-10 16:03:58 -04:00
Ryan Poplin	45717349dc	Fixing BQSR bug reported on the forum for reads that begin with insertions.	2012-10-10 16:01:37 -04:00
Ryan Poplin	2a9ee89c19	Turning on allele trimming for the haplotype caller.	2012-10-10 10:47:26 -04:00
Ryan Poplin	b543bddbb7	Fixing merge conflicts related to the comment formatting in the BQSR.	2012-10-08 10:23:08 -04:00
Ryan Poplin	b3cc04976f	Fixing BQSR bug reported on the forum for reads that being with insertions.	2012-10-08 10:18:29 -04:00
Eric Banks	36a26a7da6	md5s failed because I forgot to add --no_cmdline_in_header so it is different depending on where you run from. Fixed.	2012-10-07 08:35:55 -04:00
Eric Banks	a5aaa14aaa	Fix for GSA-601: Indels dropped during liftover. This was a true bug that was an effect of the switch over to the non-null representation of alleles in the VariantContext. Unfortunately, this tool didn't have integration tests - but it does now.	2012-10-07 01:19:52 -04:00
Eric Banks	82e40340c0	Use StringBuilder over StringBuffer	2012-10-07 00:02:15 -04:00
Eric Banks	5d6aad67e2	Fix for bug reported on forums: VariantsToTable does not handle lists and nested arrays correctly. Added an integration test to cover printing of PLs.	2012-10-07 00:01:27 -04:00
Eric Banks	e7798ddd2a	Fix for JIRA GSA-598: AD field not handled properly by CombineVariants. It was also not handled by SelectVariants either. We now strip the AD field out whenever combining/selecting makes it invalid due to a changing of the number of ALT alleles.	2012-10-06 23:02:36 -04:00
Eric Banks	bfc551f612	Fix for GSA-589: SelectVariants with -number gives biased results. The implementation was not good and it's not worth keeping this busted code around given that we have a working implementation of a fractional random sampling already in place, so I removed it.	2012-10-06 22:39:49 -04:00
Eric Banks	e8a6460a33	After merging with Yossi's fix I can confirm that the AD is fixed when going through the HC too. Added similar fixes to DP and FS annotations too.	2012-10-05 16:37:42 -04:00
Eric Banks	52326942cf	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-05 16:15:07 -04:00
Eric Banks	04853252a0	Possible fix for reduced reads coming from the HaplotypeCaller in the AD	2012-10-05 16:15:04 -04:00
Yossi Farjoun	d419a33ed1	* Added an integration test for AD annotation in the Haplotype caller. * Corrected FS Anotation for UG as for AD. * HC still does not annotate ReducedReads correctly (for FS nor AD)	2012-10-05 15:23:59 -04:00
Yossi Farjoun	dc4dcb4140	fixed AD annotation for a ReducedReads BAM file. Added an integration test for this case with a new reduced BAM in private/testdata	2012-10-05 14:20:07 -04:00
Eric Banks	c66ef17cd0	Add a separate max alt alleles argument for indels that defaults to 2 instead of 3. PLEASE TAKE NOTE.	2012-10-04 13:52:14 -04:00
Mark DePristo	b6e20e083a	Copied DiploidExactAFCalc to placeholder OptimizedDiploidExact -- Will be removed. Only commiting now to fix public -> private dependency	2012-10-03 20:16:38 -07:00
Mark DePristo	51cafa73e6	Removing public -> private dependency	2012-10-03 20:05:03 -07:00
Mark DePristo	f6a2ca6e7f	Fixes / TODOs for meaningful results with AFCalculationResult -- Right now the state of the AFCaclulationResult can be corrupt (ie, log10 likelihoods can be -Infinity). Forced me to disable reasonable contracts. Needs to be thought through -- exactCallsLog should be optional -- Update UG integration tests as the calculation of the normalized posteriors is done in a marginally different way so the output is rounded slightly differently.	2012-10-03 19:55:12 -07:00
Mark DePristo	50e4a832ea	Generalize framework for evaluating the performance and scaling of the ExactAF models to tri-allelic variants -- Wow, big performance problems with multi-allelic exact model!	2012-10-03 19:55:11 -07:00
Mark DePristo	3663fe1555	Framework for evaluating the performance and scaling of the ExactAF models	2012-10-03 19:55:11 -07:00
Mark DePristo	17ca543937	More ExactModel cleanup -- UnifiedGenotyperEngine no longer keeps a thread local double[2] array for the normalized posteriors array. This is way heavy-weight compared to just making the array each time. -- Added getNormalizedPosteriorOfAFGTZero and getNormalizedPosteriorOfAFzero to AFResult object. That's the place it should really live -- Add tests for priors, uncovering bugs in the contracts of the tri-allelic priors w.r.t. the AC of the MAP. Added TODOs	2012-10-03 19:55:11 -07:00
Mark DePristo	f8ef4332de	Count the number of evaluations in AFResult; expand unit tests -- AFResult now tracks the number of evaluations (turns through the model calculation) so we can now compute the scaling of exact model itself as a function of n samples -- Added unittests for priors (flat and human) -- Discovered nasty general ploidy bug (enabled with Guillermo_FIXME)	2012-10-03 19:55:11 -07:00
Mark DePristo	33c7841c4d	Add tests for non-informative samples in ExactAFCalculationModel	2012-10-03 19:55:11 -07:00
Mark DePristo	de941ddbbe	Cleanup Exact model, better unit tests -- Added combinatorial unit tests for both Diploid and General (in diploid-case) for 2 and 3 alleles in all combinations of sample types (i.e., AA, AB, BB and equiv. for tri-allelic). More assert statements to ensure quality of the result. -- Added docs (DOCUMENT YOUR CODE!) to AlleleFrequencyCalculationResult, with proper input error handling and contracts. Made mutation functions all protected -- No longer need to call reset on your AlleleFrequencyCalculationResult -- it'd done for you in the calculation function. reset is a protected method now, so it's all cleaner and nicer this way -- TODO still -- need to add edge-case tests for non-informative samples (0,0,0), for the impact of priors, and I need to add some way to test the result of the pNonRef	2012-10-03 19:55:11 -07:00
Mark DePristo	3e01a76590	Clean up AlleleFrequencyCalculation classes -- Added a true base class that only does truly common tasks (like manage call logging) -- This base class provides the only public method (getLog10PNonRef) and calls into a protected compute function that's abstract -- Split ExactAF into superclass ExactAF with common data structures and two subclasses: DiploidExact and GeneralPloidyExact -- Added an abstract reduceScope function that manages the simplification of the input VariantContext in the case where there are too many alleles or other constraints require us to only attempt a smaller computation -- All unit tests pass	2012-10-03 19:55:11 -07:00
Mark DePristo	1c52db4cdd	Add exactCallsLog output file to ExactModel and StandardCallerArgumentCollection -- This allows us to log all of the information about the exact model call (alleles, priors, PLs, result, and runtime) to a file for later debugging / optimization	2012-10-03 19:55:11 -07:00
Christopher Hartl	ca31ddf2a5	Allow VCFs without PLs to be converted to a bed file with genotypes other than no-call (by setting the minimum GQ to <=0). Performance enhancements to GRM suite.	2012-10-03 21:36:35 -04:00
Christopher Hartl	1be8a88909	Changes: 1) GATKArgumentCollection has a command to turn off randomization if setting the seed isn't enough. Right now it's only hooked into RankSumTest. 2) RankSumTest now can be passed a boolean telling it whether to use a dithering or non-randomizing comparator. Unit tested. 3) VariantsToBinaryPed can now output in both individual-major and SNP-major mode. Integration test. 4) Updates to PlinkBed-handling python scripts and utilities. 5) Tool for calculating (LD-corrected) GRMs put under version control. This is analysis for T2D, but I don't want to lose it should something happen to my computer.	2012-10-03 16:02:42 -04:00
David Roazen	118e974731	GATK Engine: special-case "monolithic" FilePointers, and allow them to represent multiple contigs Sometimes the GATK engine creates a single monolithic FilePointer representing all regions in all BAM files. In such cases, the monolithic FilePointer is the only FilePointer emitted by the BAMScheduler, and it's safe to allow it to contain regions and intervals from multiple contigs. This fixes support for reading unindexed BAM files (since an unindexed BAM is one case in which the engine creates a monolithic FilePointer).	2012-10-02 15:30:03 -04:00
David Roazen	a96ed385df	ReadShard.getReadsSpan(): handle case where shard contains only unmapped mates Nasty, nasty bug -- if we were extremely unlucky with shard boundaries, we might end up with a shard containing only unmapped mates of mapped reads. In this case, ReadShard.getReadsSpan() would not behave correctly, since the shard as a whole would be marked "mapped" (since it refers to mapped intervals) yet consist only of unmapped mates of mapped reads located within those intervals.	2012-10-02 13:50:00 -04:00
David Roazen	ac87ed47bb	BQSR: allow logging recal table updates to a file For testing/debugging purposes only	2012-10-01 14:18:34 -04:00
Christopher Hartl	2508b0f5a7	Merged bug fix from Stable into Unstable	2012-09-29 00:57:43 -04:00
Christopher Hartl	365f1d2429	hmk123's error on the forum came from the reference context occasionally lacking bases needed for validating the reference bases in the variant context. (no @Window for VariantsToBinaryPed). This bugfix adresses this and other minor items: 1) ValidateVariants removed in favor of direct validation VariantContexts. Integration test added to test broken contexts. 2) Enabling indel and SV output. Still bi-allelic sites only. Integration tests added for these cases. 3) Found a bug where GQ recalculation (if a genotype has PLs but no GQ) would only happen for flipped encoding. Fixed. Integration test added.	2012-09-29 00:55:31 -04:00
David Roazen	e740977994	GATK Engine: do not merge FilePointers that span multiple contigs This affects both the non-experimental and experimental engine paths, and so may break tests, but this is a necessary change.	2012-09-27 18:02:25 -04:00

1 2 3 4 5 ...

2859 Commits (5a4e2a5fa4d7ee7c6d7773d261eebc8a3ff349f1)