gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	b7639d7ceb	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-05 16:21:17 -04:00
Eric Banks	52326942cf	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-05 16:15:07 -04:00
Eric Banks	04853252a0	Possible fix for reduced reads coming from the HaplotypeCaller in the AD	2012-10-05 16:15:04 -04:00
Yossi Farjoun	ef90beb827	- forgot to use git rm to delete a file from git. Now that VCF is deleted. - uncommented a HC test that I missed.	2012-10-05 16:14:51 -04:00
Yossi Farjoun	6874a5ce76	This bam and bai are needed for testing the ADAnnotation tests (both UG and HC) The vcf file was mistakenly added previously, now removed.	2012-10-05 16:10:41 -04:00
Yossi Farjoun	d419a33ed1	* Added an integration test for AD annotation in the Haplotype caller. * Corrected FS Annotation for UG as for AD. * HC still does not annotate ReducedReads correctly (for FS nor AD)	2012-10-05 15:23:59 -04:00
Yossi Farjoun	dc4dcb4140	fixed AD annotation for a ReducedReads BAM file. Added an integration test for this case with a new reduced BAM in private/testdata	2012-10-05 14:20:07 -04:00
Eric Banks	f840d9edbd	HC test should continue using 3 alt alleles for indels	2012-10-05 02:03:34 -04:00
Eric Banks	c66ef17cd0	Add a separate max alt alleles argument for indels that defaults to 2 instead of 3. PLEASE TAKE NOTE.	2012-10-04 13:52:14 -04:00
Christopher Hartl	beaa1ac07e	Turns out GCTA replaces a missing variant with the mean dosage (2*frequency), but then normalizes the genetic distance by the number of non-missing genotype pairs. An odd thing to do, but with this the GRMs are confluent (up to a small tolerance)	2012-10-04 13:29:38 -04:00
Christopher Hartl	01dcdf2830	Waypoint: GRM is identical with GCTA if no genotypes are missing. Not sure how GCTA is treating these, but it's definitely not strictly excluding them.	2012-10-04 12:53:03 -04:00
Scott Frazer	3ffba77656	Revert "initial cancer pipeline with mutations and partial indel support" This reverts commit 4a2e5b1fcc3ad53dbb26d43eed1220b0257e9901.	2012-10-04 11:37:54 -04:00
Kristian Cibulskis	0afde9906a	initial cancer pipeline with mutations and partial indel support	2012-10-04 11:37:11 -04:00
Eric Banks	e13e61673b	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-04 10:54:23 -04:00
Guillermo del Angel	49db96c8ad	BAM pipeline fixes: a) temp workaround for DEV-9: -nWayOut argument in IndelRealigner is broken, for now things will only really work in single sample mode, b) correct extension of RealignerTargetCreator output, previous extension caused an error	2012-10-04 10:53:13 -04:00
Eric Banks	dfddc4bb0e	Protect against cases where there are counts but no quals	2012-10-04 10:52:30 -04:00
Eric Banks	0c46845c92	Refactored the BaseCounts classes so that they are safer and allow for calculations on the most probable base (which is not necessarily the most common base).	2012-10-04 10:37:11 -04:00
Mark DePristo	b6e20e083a	Copied DiploidExactAFCalc to placeholder OptimizedDiploidExact -- Will be removed. Only committing now to fix public -> private dependency	2012-10-03 20:16:38 -07:00
Mark DePristo	51cafa73e6	Removing public -> private dependency	2012-10-03 20:05:03 -07:00
Mark DePristo	f6a2ca6e7f	Fixes / TODOs for meaningful results with AFCalculationResult -- Right now the state of the AFCalculationResult can be corrupt (ie, log10 likelihoods can be -Infinity). Forced me to disable reasonable contracts. Needs to be thought through -- exactCallsLog should be optional -- Update UG integration tests as the calculation of the normalized posteriors is done in a marginally different way so the output is rounded slightly differently.	2012-10-03 19:55:12 -07:00
Mark DePristo	50e4a832ea	Generalize framework for evaluating the performance and scaling of the ExactAF models to tri-allelic variants -- Wow, big performance problems with multi-allelic exact model!	2012-10-03 19:55:11 -07:00
Mark DePristo	3663fe1555	Framework for evaluating the performance and scaling of the ExactAF models	2012-10-03 19:55:11 -07:00
Mark DePristo	17ca543937	More ExactModel cleanup -- UnifiedGenotyperEngine no longer keeps a thread local double[2] array for the normalized posteriors array. This is way heavy-weight compared to just making the array each time. -- Added getNormalizedPosteriorOfAFGTZero and getNormalizedPosteriorOfAFzero to AFResult object. That's the place it should really live -- Add tests for priors, uncovering bugs in the contracts of the tri-allelic priors w.r.t. the AC of the MAP. Added TODOs	2012-10-03 19:55:11 -07:00
Mark DePristo	f8ef4332de	Count the number of evaluations in AFResult; expand unit tests -- AFResult now tracks the number of evaluations (turns through the model calculation) so we can now compute the scaling of exact model itself as a function of n samples -- Added unittests for priors (flat and human) -- Discovered nasty general ploidy bug (enabled with Guillermo_FIXME)	2012-10-03 19:55:11 -07:00
Mark DePristo	33c7841c4d	Add tests for non-informative samples in ExactAFCalculationModel	2012-10-03 19:55:11 -07:00
Mark DePristo	de941ddbbe	Cleanup Exact model, better unit tests -- Added combinatorial unit tests for both Diploid and General (in diploid-case) for 2 and 3 alleles in all combinations of sample types (i.e., AA, AB, BB and equiv. for tri-allelic). More assert statements to ensure quality of the result. -- Added docs (DOCUMENT YOUR CODE!) to AlleleFrequencyCalculationResult, with proper input error handling and contracts. Made mutation functions all protected -- No longer need to call reset on your AlleleFrequencyCalculationResult -- it's done for you in the calculation function. reset is a protected method now, so it's all cleaner and nicer this way -- TODO still -- need to add edge-case tests for non-informative samples (0,0,0), for the impact of priors, and I need to add some way to test the result of the pNonRef	2012-10-03 19:55:11 -07:00
Mark DePristo	3e01a76590	Clean up AlleleFrequencyCalculation classes -- Added a true base class that only does truly common tasks (like manage call logging) -- This base class provides the only public method (getLog10PNonRef) and calls into a protected compute function that's abstract -- Split ExactAF into superclass ExactAF with common data structures and two subclasses: DiploidExact and GeneralPloidyExact -- Added an abstract reduceScope function that manages the simplification of the input VariantContext in the case where there are too many alleles or other constraints require us to only attempt a smaller computation -- All unit tests pass	2012-10-03 19:55:11 -07:00
Mark DePristo	1c52db4cdd	Add exactCallsLog output file to ExactModel and StandardCallerArgumentCollection -- This allows us to log all of the information about the exact model call (alleles, priors, PLs, result, and runtime) to a file for later debugging / optimization	2012-10-03 19:55:11 -07:00
Christopher Hartl	ca31ddf2a5	Allow VCFs without PLs to be converted to a bed file with genotypes other than no-call (by setting the minimum GQ to <=0). Performance enhancements to GRM suite.	2012-10-03 21:36:35 -04:00
Kristian Cibulskis	dca7c7fa9c	initial cancer pipeline with mutations and partial indel support	2012-10-03 16:25:34 -04:00
Christopher Hartl	1be8a88909	Changes: 1) GATKArgumentCollection has a command to turn off randomization if setting the seed isn't enough. Right now it's only hooked into RankSumTest. 2) RankSumTest now can be passed a boolean telling it whether to use a dithering or non-randomizing comparator. Unit tested. 3) VariantsToBinaryPed can now output in both individual-major and SNP-major mode. Integration test. 4) Updates to PlinkBed-handling python scripts and utilities. 5) Tool for calculating (LD-corrected) GRMs put under version control. This is analysis for T2D, but I don't want to lose it should something happen to my computer.	2012-10-03 16:02:42 -04:00
Guillermo del Angel	9e1592b8ba	Minor tweaks to CMIProcessing Pipeline: a) don't hard-code job mem limit to 4 G since it's too much for most AWS instances, leave it instead as input argument, b) minor doc cleanups	2012-10-03 12:05:57 -04:00
David Roazen	118e974731	GATK Engine: special-case "monolithic" FilePointers, and allow them to represent multiple contigs Sometimes the GATK engine creates a single monolithic FilePointer representing all regions in all BAM files. In such cases, the monolithic FilePointer is the only FilePointer emitted by the BAMScheduler, and it's safe to allow it to contain regions and intervals from multiple contigs. This fixes support for reading unindexed BAM files (since an unindexed BAM is one case in which the engine creates a monolithic FilePointer).	2012-10-02 15:30:03 -04:00
Mauricio Carneiro	7660e9f820	Reimplementation of the BAM processing pipeline using the metadata information file. Pipeline runs end-to-end using example metadata and has been tested only for cases where everything is ideal. Next step is to bring this to the cloud, test all the different scenarios (multiple tumors, single-ended, missing parameters, etc.). Parallel next step is to add QC metrics.	2012-10-02 14:05:34 -04:00
David Roazen	a96ed385df	ReadShard.getReadsSpan(): handle case where shard contains only unmapped mates Nasty, nasty bug -- if we were extremely unlucky with shard boundaries, we might end up with a shard containing only unmapped mates of mapped reads. In this case, ReadShard.getReadsSpan() would not behave correctly, since the shard as a whole would be marked "mapped" (since it refers to mapped intervals) yet consist only of unmapped mates of mapped reads located within those intervals.	2012-10-02 13:50:00 -04:00
Mauricio Carneiro	9a8f53e76c	Probably the GATK's most seen typo in the world	2012-10-02 13:34:37 -04:00
David Roazen	ac87ed47bb	BQSR: allow logging recal table updates to a file For testing/debugging purposes only	2012-10-01 14:18:34 -04:00
Christopher Hartl	2508b0f5a7	Merged bug fix from Stable into Unstable	2012-09-29 00:57:43 -04:00
Christopher Hartl	365f1d2429	hmk123's error on the forum came from the reference context occasionally lacking bases needed for validating the reference bases in the variant context. (no @Window for VariantsToBinaryPed). This bugfix addresses this and other minor items: 1) ValidateVariants removed in favor of direct validation of VariantContexts. Integration test added to test broken contexts. 2) Enabling indel and SV output. Still bi-allelic sites only. Integration tests added for these cases. 3) Found a bug where GQ recalculation (if a genotype has PLs but no GQ) would only happen for flipped encoding. Fixed. Integration test added.	2012-09-29 00:55:31 -04:00
Ami Levy Moonshine	11540da98b	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-09-28 21:01:25 -04:00
Eric Banks	2df5be702c	Added an argument to RR to allow polyploid consensus creation (by default it is turned off). This will eventually be replaced by the known SNPs track trigger.	2012-09-28 11:44:25 -04:00
Ami Levy Moonshine	fb9457d6fe	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-09-27 22:40:39 -04:00
David Roazen	e740977994	GATK Engine: do not merge FilePointers that span multiple contigs This affects both the non-experimental and experimental engine paths, and so may break tests, but this is a necessary change.	2012-09-27 18:02:25 -04:00
David Roazen	e82946e5c9	ExperimentalReadShardBalancer: create one monolithic FilePointer per contig Merge all FilePointers for each contig into a single, merged, optimized FilePointer representing all regions to visit in all BAM files for a given contig. This helps us in several ways: -It allows us to create a single, persistent set of iterators for each contig, finally and definitively eliminating all Shard/FilePointer boundary issues for the new experimental ReadWalker downsampling -We no longer need to track low-level file positions in the sharding system (which was no longer possible anyway given the new experimental downsampling system) -We no longer revisit BAM file chunks that we've visited in the past -- all BAM file access is purely sequential -We no longer need to constantly recreate our full chain of read iterators There are also potential dangers: -We hold more BAM index data in memory at once. Given that we merge and optimize the index data during the merge, and only hold one contig's worth of data at a time, this does not appear to be a major issue. TODO: confirm this! -With a huge number of samples and intervals, the FilePointer merge operation might become expensive. With the latest implementation, this does not appear to be an issue even with a huge number of intervals (for one sample, at least), but if it turns out to be a problem for > 1 sample there are things we can do. Still TODO: unit tests for the new FilePointer.union() method	2012-09-27 14:47:54 -04:00
Mauricio Carneiro	a640afa995	adding some directories to gitignore	2012-09-27 11:09:41 -04:00
Mauricio Carneiro	3e68fee764	Removed the intellij files from the root and made an example package for new users. This allows users to start at the same page and then change it as they see fit without interfering with the repo (thanks guillermo!)	2012-09-27 11:04:56 -04:00
Christopher Hartl	abbe757907	Merged bug fix from Stable into Unstable	2012-09-27 00:15:35 -04:00
Christopher Hartl	55cdf4f9b7	Commit changes in Variants To Binary Ped to the stable repository to be available prior to next release.	2012-09-27 00:13:32 -04:00
Mauricio Carneiro	b9dab068ee	New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning	2012-09-26 16:16:53 -04:00
Mauricio Carneiro	f8b954334e	Revised implementation of the RAWBAM => BAM pipeline stripped out all the FQ pipeline and tumor/normal information.	2012-09-26 13:37:15 -04:00

11158 Commits (4ced2e4ffc7d457cb9a8aad4c4aa2cb3cd3fb705)