gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Guillermo del Angel	d4e7655d14	Added ability to call multiallelic indels, if -multiallelic is included in UG arguments. Simple idea: we genotype all alleles with count >= minIndelCnt. To support this, refactored code that computes consensus alleles. To ease merging of multiple alt alleles, we create a single vc for each alt allele and then use VariantContextUtils.simpleMerge to carry out merging, which takes care of handling all corner conditions already. In order to use this, interface to GenotypeLikelihoodsCalculationModel changed to pass in a GenomeLocParser object (why are these objects so hard to handle??). More testing is required and the feature is turned off by default.	2012-01-06 11:24:38 -05:00
Mark DePristo	dd80ffbbbe	Merged bug fix from Stable into Unstable	2012-01-05 21:51:48 -05:00
Mark DePristo	c96fee477c	Bug fix for VariantSummary -- Call sets with indels > 50 bp in length are tagged as CNVs in the tag (following the 1000 Genomes convention) and were unconditionally checking whether the CNV is already known, by looking at the known cnvs file, which is optional. Fixed. Has the annoying side effect that indels > 50bp in size are not counted as indels, and so are subtracted from both the novel and known counts for indels. C'est la vie -- Added integration test to check for this case, using Mauricio's most recent VCF file for NA12878 which has many large indels. Using this more recent and representative file is probably a good idea for future tests in VE and other tools. File is NA12878.HiSeq.WGS.b37_decoy.indel.recalibrated.vcf in Validation_Data	2012-01-05 21:51:06 -05:00
Eric Banks	f5e10e9879	Merged bug fix from Stable into Unstable	2012-01-05 15:35:09 -05:00
Eric Banks	18ed954741	Compute Ti/Tv only if bi-allelic	2012-01-05 15:33:26 -05:00
Ryan Poplin	a6886a4cc0	Initial commit of the Active Region Traversal. Not ready to be used by anyone yet.	2012-01-04 17:03:21 -05:00
Guillermo del Angel	58d4539304	Enabled banded indel computation by default. Reversed logic in input UG argument so that we can still disable it if required. Minor changes to integration tests due to minor differences in GL's and in annotations	2012-01-04 15:28:26 -05:00
Mauricio Carneiro	9ff8a01da2	Merged bug fix from Stable into Unstable	2012-01-03 18:10:39 -05:00
Mauricio Carneiro	9b55505c03	Fixing PairHMMIndelErrorModel array out of bounds. This error was due to the ReadClipper change of contract. Before, the read utils would return null if a read was entirely clipped; now they return an empty (safe) GATKSAMRecord.	2012-01-03 18:08:46 -05:00
Christopher Hartl	2c3a9ce02f	Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable	2012-01-03 17:25:56 -05:00
David Roazen	621ee2b613	Merged bug fix from Stable into Unstable	2012-01-03 16:56:49 -05:00
Christopher Hartl	9093de1132	Cleanup: remove code to calculate the MLE AC in the UGE.	2012-01-03 15:58:51 -05:00
Christopher Hartl	2d093828a4	Final changes to Junky (been frozen for a while, but uncommitted) and the qscript for it. A first cursory implementation of the trellis-based Exact AC-constrained genotyping algorithm in UGE. Nothing calls into it, so this should be entirely safe (and, no surprise, it passes UG integration tests).	2012-01-03 15:33:04 -05:00
David Roazen	ea6e718cb8	SnpEff 2.0.5 support. Re-enabled SnpEff in the HybridSelectionPipeline. For now, we recommend only running with the GRCh37.64 database.	2012-01-03 15:18:36 -05:00
Christopher Hartl	93e1417b6e	Update to the VSS GATK documentation.	2012-01-03 13:39:31 -05:00
David Roazen	4984ca5e31	Merged bug fix from Stable into Unstable	2012-01-03 11:03:30 -05:00
David Roazen	f3f01da1af	Enforce serial dependencies in RecalibrationWalkersIntegrationTest. Some tests in this class were intermittently not being executed due to being randomly scheduled before tests whose results they depend on. Now the serial dependencies are enforced to avoid problematic orderings.	2012-01-03 10:42:41 -05:00
Eric Banks	ab8d47d9a5	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-03 09:38:49 -05:00
Mauricio Carneiro	3d4bf273de	Added getPileupForReadGroups to ReadBackPileup * returns a pileup for all the read groups provided. * saves us from multiple calls to getPileup (which is very inefficient)	2012-01-03 09:35:11 -05:00
Mauricio Carneiro	4a208c7c06	Refactor of the downsampling machinery to accept different strategies * Implemented Adaptive downsampler * Added integration test * Added option to RRead scala script to choose downsampling strategy	2012-01-03 09:29:47 -05:00
Mauricio Carneiro	21ae3ef5f9	Added downsampling support to ReduceReads * Downsampling is now a parameter to the walker with default value of 0 (no downsampling) * Downsampling selects reads at random at the variant region window and strives to achieve uniform coverage if possible around the desired downsampling value. * Added integration test	2012-01-03 09:29:46 -05:00
Mauricio Carneiro	cd68cc239b	Added knuth-shuffle (KS) and randomSubset using KS to MathUtils * Knuth-shuffle is a simple, yet effective array permutator (hope this is good english). * added a simple randomSubset that returns a random subset without repeats of any given array with the same probability for every permutation. * added unit tests to both functions	2012-01-03 09:29:46 -05:00
Mauricio Carneiro	94791a2a75	Add support for reads starting with insertion * Modified cleanCigarShift to allow insertions in the beginning and end of the read * Allowed cigars starting/ending in insertions in the systematic ReadClipper tests * Updated all ReadClipper unit tests * ReduceReads does not hard clip leading insertions by default anymore * SlidingWindow adjusts start location if read starts with insertion * SlidingWindow creates an empty element with insertions to the right * Fixed all potential divide by zero with totalCount() (from BaseCounts) * Updated all Integration tests * Added new integration test for multiple interval reducing	2012-01-03 09:29:45 -05:00
Mark DePristo	d05f0c2318	GATKPerformanceOverTime script update -- Automatic detection of most recent version of GATK release (just tell the script now to use 1.2, 1.3, and 1.4) -- Uses 1.4 now -- By default we do 9 runs of each non-parallel test -- In PathUtils added convenience utility to find most recent release GATK jar with a specific release number	2012-01-02 09:58:46 -05:00
Mauricio Carneiro	1b6d52817e	fixing adaptor clipping effect on recalibration integration test	2012-01-01 22:20:06 -05:00
Eric Banks	393993e0c7	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-31 20:42:46 -05:00
Mauricio Carneiro	55cfa76cf3	Updated integration tests for the new adaptor clipping fix.	2011-12-30 18:47:14 -05:00
Mauricio Carneiro	c7d0a9ebee	Forgot to test for inter-chromosomal mates in the adaptor clipping * Fixing bug caught by Eric (and Kristian)	2011-12-30 00:19:53 -05:00
Matt Hanna	a259bfefd4	First commit addressing problems running RTC in parallel. Turns out that because the RTC is the first walker to 'correctly' tree reduce according to functional programming standards, the RTC has revealed a few problems with the tree reducer holding on to too much data. This is the first and smaller of two commits to reduce memory consumption. The second commit will likely be pushed after GATK1.4 is released.	2011-12-29 16:22:14 -05:00
Eric Banks	1a45ea5a05	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-29 11:37:15 -05:00
Mauricio Carneiro	f692911903	GATKSAMRecord emptyRead static constructor * Creates an empty GATKSAMRecord with empty (not null) Cigar, bases and quals. Allows empty reads to be probed without breaking. * All ReadClipper utilities now emit empty reads for fully clipped reads	2011-12-27 17:01:17 -05:00
Mauricio Carneiro	8259c748f2	No more Filtered Reads tag. All synthetic reads are marked with the reduced read tag.	2011-12-27 17:01:17 -05:00
Eric Banks	d20a25d681	A much better way of choosing the alternate allele(s) to genotype in the SNP model of UG: instead of looking at the sum of base qualities (which can and did lead to us over-genotyping esp. when allowing multiple alternate alleles), we look at the likelihoods themselves (free since we are already calculating likelihoods for all 10 genotypes). Now, even if the base quals exceed some arbitrary threshold, we only bother genotyping an alternate allele when there's a sample for which it is more likely than ref/ref (I can generate weird edge cases where this falls apart, but none that model truly variable sites that we actually want to call). This leads to a huge efficiency improvement esp. for exomes (and esp. for many samples) where we almost always were trying to genotype all 3 alternate alleles. Integration tests change only because ref calls have slight QUAL differences (because the best alt allele is still chosen arbitrarily, but differently).	2011-12-27 16:50:38 -05:00
Eric Banks	adff40ff58	Minor optimizations to avoid extra processing (esp. for reduced reads)	2011-12-27 13:16:25 -05:00
Mauricio Carneiro	17bfe48d5e	Made all class methods private in the ReadClipper * ReadClipperUnitTest now uses static methods * Haplotype caller now uses static methods * Exon Junction Genotyper now uses static methods	2011-12-27 02:11:32 -05:00
Eric Banks	dd990061f6	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-26 14:45:35 -05:00
Eric Banks	2130b39f33	Found the bug in the engine: RodLocusView was using the wrong seek method so that it would only move to the first locus of a shard (and with multi-locus shards, this meant that we never processed RODs from the other positions). In fact, because the seek(Shard) method is extremely misleading and now no longer used, I think it's safer to delete it and make everyone use the much more transparent seek(GenomeLoc). Note that I have not re-enabled my improvements to the intervals accumulation of ReferenceDataSource because that inefficiency is still present downstream in RodLocusView; need to discuss those changes with Matt.	2011-12-26 14:45:19 -05:00
Mauricio Carneiro	35c41409a1	Better contracts and docs for the ReadClipper * Described the ReadClipper contract in the top of the class * Added contracts where applicable * Added descriptive information to all tools in the read clipper * Organized public members and static methods together with the same javadoc	2011-12-23 19:36:57 -05:00
David Roazen	506c0e9c97	Disabling SnpEff support in the GATK and SnpEff annotation in the HybridSelectionPipeline. SnpEff support will remain disabled until SnpEff 2.0.4 has been officially released and we've verified the quality of its annotations.	2011-12-23 19:12:57 -05:00
Eric Banks	24c84da60d	'Fixing' the changes in ReferenceDataSource so that a shard properly contains a list of GenomeLocs instead of a single merged one. However, that uncovered a probable bug in the engine, so instead of letting this code fester unfixed in the build (affecting everyone in the group) I've decided to revert the previous (slow, but working) version and fix the engine in my own branch.	2011-12-23 15:39:12 -05:00
Eric Banks	8762313a0d	Better TODO message	2011-12-22 20:54:35 -05:00
Eric Banks	a815e875a8	Removing debugging output	2011-12-22 15:49:11 -05:00
Eric Banks	deef542a38	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-22 15:44:58 -05:00
Eric Banks	6d260ec6ae	Start printing traversal stats after 30 seconds. I can't stand waiting 2 minutes.	2011-12-22 15:40:59 -05:00
David Roazen	510c71158c	Merged bug fix from Stable into Unstable	2011-12-22 10:49:52 -05:00
David Roazen	32cdef9682	Rename PerformanceTest test classes to LargeScaleTest. This is in preparation for the installation of the new performance test suite in Bamboo. Note that "ant performancetest" is now "ant largescaletest".	2011-12-22 10:38:49 -05:00
Mauricio Carneiro	731a463415	Updated IntegrationTests with new adaptor clipper phew!	2011-12-20 17:48:52 -05:00
Mauricio Carneiro	cadff40247	getRefCoordSoftUnclippedStart and End refactor. These functions are methods of the read, and supplement getAlignmentStart() and getUnclippedStart() by calculating the unclipped start counting only soft clips. * Removed from ReadUtils * Added to GATKSAMRecord * Changed name to getSoftStart() and getSoftEnd() * Updated third party code accordingly.	2011-12-20 17:48:51 -05:00
Mauricio Carneiro	07128a2ad2	ReadUtils cleanup * Removed all clipping functionality from ReadUtils (it should all be done using the ReadClipper now) * Cleaned up functionality that wasn't being used or had been superseded by other code (in an effort to reduce multiple unsupported implementations) * Made all meaningful functions public and added better comments/explanation to the headers	2011-12-20 17:48:40 -05:00
Mauricio Carneiro	1c4774c475	Static versions of the hard clipping utilities. For simplified access to the hard clipping utilities. No need to create a ReadClipper object if you are not doing multiple complicated clipping operations, just use the static methods. Examples: ReadClipper.hardClipLowQualEnds(2); ReadClipper.hardClipAdaptorSequence();	2011-12-20 17:48:39 -05:00

1430 Commits (90cc17ee2aa3ff9b266d1625da46c264169e4b9f)