gatk-3.8

Commit Graph

Author	SHA1	Message	Date
David Roazen	fba6a084e4	Testing github auto-mirroring attempt #2 ; please ignore	2012-10-10 15:28:13 -04:00
David Roazen	267d1ff59c	Revert "Testing the new github auto-mirroring; please ignore" This reverts commit bd8b321132167f6f393f234ea0e93edcfd8701ff.	2012-10-10 15:07:48 -04:00
David Roazen	66ee3f230f	Testing the new github auto-mirroring; please ignore	2012-10-10 15:06:50 -04:00
Ryan Poplin	15b405d458	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-10 10:47:40 -04:00
Ryan Poplin	2a9ee89c19	Turning on allele trimming for the haplotype caller.	2012-10-10 10:47:26 -04:00
Khalid Shakir	f66284658d	RetryMemoryLimit now works with Scatter/Gather.	2012-10-09 21:51:03 -04:00
Johan Dahlberg	e9b9e2318c	Fixed SortSam bug, for .done file The *.bai.done file for the .bai file was written in the run directory instead of in the specified output directory. Changing getName() to getAbsolutePath() fixes this. Signed-off-by: Joel Thibault <thibault@broadinstitute.org>	2012-10-09 16:25:18 -04:00
Ryan Poplin	b543bddbb7	Fixing merge conflicts related to the comment formatting in the BQSR.	2012-10-08 10:23:08 -04:00
Ryan Poplin	b3cc04976f	Fixing BQSR bug reported on the forum for reads that being with insertions.	2012-10-08 10:18:29 -04:00
Eric Banks	be9fcba546	Don't allow triggering of polyploid consensus creation in regions where there is more than one het, as it just doesn't work properly. We could probably refactor at some point to make it work, but it's not worth doing that now (especially as it should be rare to have multiple proximal known hets in a single sample exome).	2012-10-07 16:32:48 -04:00
Eric Banks	08ac80c080	RR bug: when the last base in the window around the polyploid consensus is filtered (low quality), the filtered consensus is not flushed and subsequent filtered bases (but importantly not contiguous to this one) are just added to this position. In other words, bases were being added to the wrong genomic positions. Fixed.	2012-10-07 10:52:01 -04:00
Eric Banks	36a26a7da6	md5s failed because I forgot to add --no_cmdline_in_header so it is different depending on where you run from. Fixed.	2012-10-07 08:35:55 -04:00
Eric Banks	a5aaa14aaa	Fix for GSA-601: Indels dropped during liftover. This was a true bug that was an effect of the switch over to the non-null representation of alleles in the VariantContext. Unfortunately, this tool didn't have integration tests - but it does now.	2012-10-07 01:19:52 -04:00
Eric Banks	82e40340c0	Use StringBuilder over StringBuffer	2012-10-07 00:02:15 -04:00
Eric Banks	5d6aad67e2	Fix for bug reported on forums: VariantsToTable does not handle lists and nested arrays correctly. Added an integration test to cover printing of PLs.	2012-10-07 00:01:27 -04:00
Eric Banks	e7798ddd2a	Fix for JIRA GSA-598: AD field not handled properly by CombineVariants. It was also not handled by SelectVariants either. We now strip the AD field out whenever combining/selecting makes it invalid due to a changing of the number of ALT alleles.	2012-10-06 23:02:36 -04:00
Eric Banks	bfc551f612	Fix for GSA-589: SelectVariants with -number gives biased results. The implementation was not good and it's not worth keeping this busted code around given that we have a working implementation of a fractional random sampling already in place, so I removed it.	2012-10-06 22:39:49 -04:00
Eric Banks	e8a6460a33	After merging with Yossi's fix I can confirm that the AD is fixed when going through the HC too. Added similar fixes to DP and FS annotations too.	2012-10-05 16:37:42 -04:00
Eric Banks	b7639d7ceb	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-05 16:21:17 -04:00
Eric Banks	52326942cf	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-05 16:15:07 -04:00
Eric Banks	04853252a0	Possible fix for reduced reads coming from the HaplotypeCaller in the AD	2012-10-05 16:15:04 -04:00
Yossi Farjoun	ef90beb827	- forgot to use git rm to delete a file from git. Now that VCF is deleted. - uncommented a HC test that I missed.	2012-10-05 16:14:51 -04:00
Yossi Farjoun	6874a5ce76	This bam and bai are needed for testing the ADAnnotation tests (both UG and HC) The vcf file was mistakenly added previously, now removed.	2012-10-05 16:10:41 -04:00
Yossi Farjoun	d419a33ed1	* Added an integration test for AD annotation in the Haplotype caller. * Corrected FS Anotation for UG as for AD. * HC still does not annotate ReducedReads correctly (for FS nor AD)	2012-10-05 15:23:59 -04:00
Yossi Farjoun	dc4dcb4140	fixed AD annotation for a ReducedReads BAM file. Added an integration test for this case with a new reduced BAM in private/testdata	2012-10-05 14:20:07 -04:00
Eric Banks	f840d9edbd	HC test should continue using 3 alt alleles for indels	2012-10-05 02:03:34 -04:00
Eric Banks	c66ef17cd0	Add a separate max alt alleles argument for indels that defaults to 2 instead of 3. PLEASE TAKE NOTE.	2012-10-04 13:52:14 -04:00
Eric Banks	e13e61673b	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-04 10:54:23 -04:00
Eric Banks	dfddc4bb0e	Protect against cases where there are counts but no quals	2012-10-04 10:52:30 -04:00
Eric Banks	0c46845c92	Refactored the BaseCounts classes so that they are safer and allow for calculations on the most probable base (which is not necessarily the most common base).	2012-10-04 10:37:11 -04:00
Mark DePristo	b6e20e083a	Copied DiploidExactAFCalc to placeholder OptimizedDiploidExact -- Will be removed. Only commiting now to fix public -> private dependency	2012-10-03 20:16:38 -07:00
Mark DePristo	51cafa73e6	Removing public -> private dependency	2012-10-03 20:05:03 -07:00
Mark DePristo	f6a2ca6e7f	Fixes / TODOs for meaningful results with AFCalculationResult -- Right now the state of the AFCaclulationResult can be corrupt (ie, log10 likelihoods can be -Infinity). Forced me to disable reasonable contracts. Needs to be thought through -- exactCallsLog should be optional -- Update UG integration tests as the calculation of the normalized posteriors is done in a marginally different way so the output is rounded slightly differently.	2012-10-03 19:55:12 -07:00
Mark DePristo	50e4a832ea	Generalize framework for evaluating the performance and scaling of the ExactAF models to tri-allelic variants -- Wow, big performance problems with multi-allelic exact model!	2012-10-03 19:55:11 -07:00
Mark DePristo	3663fe1555	Framework for evaluating the performance and scaling of the ExactAF models	2012-10-03 19:55:11 -07:00
Mark DePristo	17ca543937	More ExactModel cleanup -- UnifiedGenotyperEngine no longer keeps a thread local double[2] array for the normalized posteriors array. This is way heavy-weight compared to just making the array each time. -- Added getNormalizedPosteriorOfAFGTZero and getNormalizedPosteriorOfAFzero to AFResult object. That's the place it should really live -- Add tests for priors, uncovering bugs in the contracts of the tri-allelic priors w.r.t. the AC of the MAP. Added TODOs	2012-10-03 19:55:11 -07:00
Mark DePristo	f8ef4332de	Count the number of evaluations in AFResult; expand unit tests -- AFResult now tracks the number of evaluations (turns through the model calculation) so we can now compute the scaling of exact model itself as a function of n samples -- Added unittests for priors (flat and human) -- Discovered nasty general ploidy bug (enabled with Guillermo_FIXME)	2012-10-03 19:55:11 -07:00
Mark DePristo	33c7841c4d	Add tests for non-informative samples in ExactAFCalculationModel	2012-10-03 19:55:11 -07:00
Mark DePristo	de941ddbbe	Cleanup Exact model, better unit tests -- Added combinatorial unit tests for both Diploid and General (in diploid-case) for 2 and 3 alleles in all combinations of sample types (i.e., AA, AB, BB and equiv. for tri-allelic). More assert statements to ensure quality of the result. -- Added docs (DOCUMENT YOUR CODE!) to AlleleFrequencyCalculationResult, with proper input error handling and contracts. Made mutation functions all protected -- No longer need to call reset on your AlleleFrequencyCalculationResult -- it'd done for you in the calculation function. reset is a protected method now, so it's all cleaner and nicer this way -- TODO still -- need to add edge-case tests for non-informative samples (0,0,0), for the impact of priors, and I need to add some way to test the result of the pNonRef	2012-10-03 19:55:11 -07:00
Mark DePristo	3e01a76590	Clean up AlleleFrequencyCalculation classes -- Added a true base class that only does truly common tasks (like manage call logging) -- This base class provides the only public method (getLog10PNonRef) and calls into a protected compute function that's abstract -- Split ExactAF into superclass ExactAF with common data structures and two subclasses: DiploidExact and GeneralPloidyExact -- Added an abstract reduceScope function that manages the simplification of the input VariantContext in the case where there are too many alleles or other constraints require us to only attempt a smaller computation -- All unit tests pass	2012-10-03 19:55:11 -07:00
Mark DePristo	1c52db4cdd	Add exactCallsLog output file to ExactModel and StandardCallerArgumentCollection -- This allows us to log all of the information about the exact model call (alleles, priors, PLs, result, and runtime) to a file for later debugging / optimization	2012-10-03 19:55:11 -07:00
David Roazen	118e974731	GATK Engine: special-case "monolithic" FilePointers, and allow them to represent multiple contigs Sometimes the GATK engine creates a single monolithic FilePointer representing all regions in all BAM files. In such cases, the monolithic FilePointer is the only FilePointer emitted by the BAMScheduler, and it's safe to allow it to contain regions and intervals from multiple contigs. This fixes support for reading unindexed BAM files (since an unindexed BAM is one case in which the engine creates a monolithic FilePointer).	2012-10-02 15:30:03 -04:00
David Roazen	a96ed385df	ReadShard.getReadsSpan(): handle case where shard contains only unmapped mates Nasty, nasty bug -- if we were extremely unlucky with shard boundaries, we might end up with a shard containing only unmapped mates of mapped reads. In this case, ReadShard.getReadsSpan() would not behave correctly, since the shard as a whole would be marked "mapped" (since it refers to mapped intervals) yet consist only of unmapped mates of mapped reads located within those intervals.	2012-10-02 13:50:00 -04:00
Mauricio Carneiro	9a8f53e76c	Probably the GATK's most seen typo in the world	2012-10-02 13:34:37 -04:00
David Roazen	ac87ed47bb	BQSR: allow logging recal table updates to a file For testing/debugging purposes only	2012-10-01 14:18:34 -04:00
Christopher Hartl	2508b0f5a7	Merged bug fix from Stable into Unstable	2012-09-29 00:57:43 -04:00
Christopher Hartl	365f1d2429	hmk123's error on the forum came from the reference context occasionally lacking bases needed for validating the reference bases in the variant context. (no @Window for VariantsToBinaryPed). This bugfix adresses this and other minor items: 1) ValidateVariants removed in favor of direct validation VariantContexts. Integration test added to test broken contexts. 2) Enabling indel and SV output. Still bi-allelic sites only. Integration tests added for these cases. 3) Found a bug where GQ recalculation (if a genotype has PLs but no GQ) would only happen for flipped encoding. Fixed. Integration test added.	2012-09-29 00:55:31 -04:00
Eric Banks	2df5be702c	Added an argument to RR to allow polyploid consensus creation (by default it is turned off). This will eventually be replaced by the known SNPs track trigger.	2012-09-28 11:44:25 -04:00
David Roazen	e740977994	GATK Engine: do not merge FilePointers that span multiple contigs This affects both the non-experimental and experimental engine paths, and so may break tests, but this is a necessary change.	2012-09-27 18:02:25 -04:00
David Roazen	e82946e5c9	ExperimentalReadShardBalancer: create one monolithic FilePointer per contig Merge all FilePointers for each contig into a single, merged, optimized FilePointer representing all regions to visit in all BAM files for a given contig. This helps us in several ways: -It allows us to create a single, persistent set of iterators for each contig, finally and definitively eliminating all Shard/FilePointer boundary issues for the new experimental ReadWalker downsampling -We no longer need to track low-level file positions in the sharding system (which was no longer possible anyway given the new experimental downsampling system) -We no longer revisit BAM file chunks that we've visited in the past -- all BAM file access is purely sequential -We no longer need to constantly recreate our full chain of read iterators There are also potential dangers: -We hold more BAM index data in memory at once. Given that we merge and optimize the index data during the merge, and only hold one contig's worth of data at a time, this does not appear to be a major issue. TODO: confirm this! -With a huge number of samples and intervals, the FilePointer merge operation might become expensive. With the latest implementation, this does not appear to be an issue even with a huge number of intervals (for one sample, at least), but if it turns out to be a problem for > 1 sample there are things we can do. Still TODO: unit tests for the new FilePointer.union() method	2012-09-27 14:47:54 -04:00


10702 Commits (fba6a084e4fba8a31aca0b9dad4d4f7232902507)