gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	ab8d47d9a5	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-03 09:38:49 -05:00
Mauricio Carneiro	ca669ae744	Optimizations to the CoverageByRG walker * outputs only the groups of read groups necessary, avoiding multiple pileup creations on every call to map * now also counts the number of variants associated with a given ROD (dbSNP) that exist in the interval * new column: interval size	2012-01-03 09:36:01 -05:00
Mauricio Carneiro	3d4bf273de	Added getPileupForReadGroups to ReadBackPileup * returns a pileup for all the read groups provided. * saves us from multiple calls to getPileup (which is very inefficient)	2012-01-03 09:35:11 -05:00
Roger Zurawicki	caa5da2fd2	Added parameter to combine RGs in CoverageByRG * -g takes a string of read groups separated by space " " * multiple -g creates multiple sum columns in the table Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2012-01-03 09:35:10 -05:00
Mauricio Carneiro	18f06ad913	Script to calculate gc content of intervals independently * necessary for baits because we don't want the overlapping intervals to be merged by the GATK engine	2012-01-03 09:35:10 -05:00
Mauricio Carneiro	0bdeda6f3f	Added single sample option for the ReduceReads calling script	2012-01-03 09:29:47 -05:00
Mauricio Carneiro	4a208c7c06	Refactor of the downsampling machinery to accept different strategies * Implemented Adaptive downsampler * Added integration test * Added option to RRead scala script to choose downsampling strategy	2012-01-03 09:29:47 -05:00
Mauricio Carneiro	cce8511d29	Some WGS performance upgrades for ReduceReads * Do not try to hard clip to the interval when doing WGS * Do not even add reads that have been completely clipped out in WGS	2012-01-03 09:29:46 -05:00
Mauricio Carneiro	21ae3ef5f9	Added downsampling support to ReduceReads * Downsampling is now a parameter to the walker with default value of 0 (no downsampling) * Downsampling selects reads at random at the variant region window and strives to achieve uniform coverage if possible around the desired downsampling value. * Added integration test	2012-01-03 09:29:46 -05:00
Mauricio Carneiro	cd68cc239b	Added knuth-shuffle (KS) and randomSubset using KS to MathUtils * Knuth-shuffle is a simple, yet effective array permutator (hope this is good english). * added a simple randomSubset that returns a random subset without repeats of any given array with the same probability for every permutation. * added unit tests to both functions	2012-01-03 09:29:46 -05:00
Mauricio Carneiro	94791a2a75	Add support for reads starting with insertion * Modified cleanCigarShift to allow insertions in the beginning and end of the read * Allowed cigars starting/ending in insertions in the systematic ReadClipper tests * Updated all ReadClipper unit tests * ReduceReads does not hard clip leading insertions by default anymore * SlidingWindow adjusts start location if read starts with insertion * SlidingWindow creates an empty element with insertions to the right * Fixed all potential divide by zero with totalCount() (from BaseCounts) * Updated all Integration tests * Added new integration test for multiple interval reducing	2012-01-03 09:29:45 -05:00
Mark DePristo	3ecb9a0bf7	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-02 13:56:55 -05:00
Mark DePristo	b3e613647a	GATKPerformanceOverTime bug fixes -- Don't try to do nt 16, it's just too painful as the threading doesn't work well and it consumes a large chunk of our available slots on gsa4 -- bugfix: only do multi-threaded test for each iteration, not expanding by subiterations, so we no longer try to do 3x3 nt 16 runs	2012-01-02 13:56:44 -05:00
Mark DePristo	188bd48139	runGATKReport only archives and shows errors for the last day's runs	2012-01-02 10:39:05 -05:00
Mark DePristo	d05f0c2318	GATKPerformanceOverTime script update -- Automatic detection of most recent version of GATK release (just tell the script now to use 1.2, 1.3, and 1.4) -- Uses 1.4 now -- By default we do 9 runs of each non-parallel test -- In PathUtils added convenience utility to find most recent release GATK jar with a specific release number	2012-01-02 09:58:46 -05:00
Mauricio Carneiro	a837970ea2	Merged bug fix from Stable into Unstable	2012-01-01 22:20:53 -05:00
Mauricio Carneiro	1b6d52817e	fixing adaptor clipping effect on recalibration integration test	2012-01-01 22:20:06 -05:00
Ryan Poplin	e45ca8bfa2	Protect against too many alternate alleles in the haplotype caller.	2012-01-01 19:12:48 -05:00
Eric Banks	393993e0c7	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-31 20:42:46 -05:00
Eric Banks	b0d68eb0e3	Merge remote-tracking branch 'unstable/master'	2011-12-31 20:26:44 -05:00
Mauricio Carneiro	55cfa76cf3	Updated integration tests for the new adaptor clipping fix.	2011-12-30 18:47:14 -05:00
Mauricio Carneiro	c7d0a9ebee	Forgot to test for inter-chromosomal mates in the adaptor clipping * Fixing bug caught by Eric (and Kristian)	2011-12-30 00:19:53 -05:00
Matt Hanna	a259bfefd4	First commit addressing problems running RTC in parallel. Turns out that because the RTC is the first walker to 'correctly' tree reduce according to functional programming standards, the RTC has revealed a few problems with the tree reducer holding on to too much data. This is the first and smaller of two commits to reduce memory consumption. The second commit will likely be pushed after GATK1.4 is released.	2011-12-29 16:22:14 -05:00
Matt Hanna	e6e80e8d3f	Update Picard to fix a bug Mauricio found in Picard where Picard unnecessarily depends on Snappy during some usages of SortingCollection.	2011-12-29 14:35:02 -05:00
Eric Banks	1a45ea5a05	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-29 11:37:15 -05:00
Roger Zurawicki	efe33a0a1b	BUG FIX: Output is correct The output would put zero coverage because the pileup filtered using the wrong method Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-28 23:05:43 -05:00
Roger Zurawicki	5672688a73	Optimized CoverageByRG and Added GCContent - CoverageByRG now uses a hashmap for its value instead of a list. It runs about 4 times faster. - Cleaned up some of the code - CoverageByRG now calculates GCContent Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-28 15:25:07 -05:00
Roger Zurawicki	0c05998c4c	Added CoverageByRG LocusWalker. Will take any number of input BAMs and intervals. Returns a ReportTable with Average Coverage of each Read Group per Interval Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-28 15:25:07 -05:00
Mauricio Carneiro	f692911903	GATKSAMRecord emptyRead static constructor * Creates an empty GATKSAMRecord with empty (not null) Cigar, bases and quals. Allows empty reads to be probed without breaking. * All ReadClipper utilities now emit empty reads for fully clipped reads	2011-12-27 17:01:17 -05:00
Mauricio Carneiro	8259c748f2	No more Filtered Reads tag. All synthetic reads are marked with the reduced read tag.	2011-12-27 17:01:17 -05:00
Eric Banks	d20a25d681	A much better way of choosing the alternate allele(s) to genotype in the SNP model of UG: instead of looking at the sum of base qualities (which can and did lead to us over-genotyping esp. when allowing multiple alternate alleles), we look at the likelihoods themselves (free since we are already calculating likelihoods for all 10 genotypes). Now, even if the base quals exceed some arbitrary threshold, we only bother genotyping an alternate allele when there's a sample for which it is more likely than ref/ref (I can generate weird edge cases where this falls apart, but none that model truly variable sites that we actually want to call). This leads to a huge efficiency improvement esp. for exomes (and esp. for many samples) where we almost always were trying to genotype all 3 alternate alleles. Integration tests change only because ref calls have slight QUAL differences (because the best alt allele is still chosen arbitrarily, but differently).	2011-12-27 16:50:38 -05:00
Ryan Poplin	ef31b2f0a7	fixing merge conflicts.	2011-12-27 14:26:36 -05:00
Ryan Poplin	4f09a95221	Updating HaplotypeCaller for the new contracts in the adapter clipping.	2011-12-27 14:25:03 -05:00
Eric Banks	adff40ff58	Minor optimizations to avoid extra processing (esp. for reduced reads)	2011-12-27 13:16:25 -05:00
Mauricio Carneiro	17bfe48d5e	Made all class methods private in the ReadClipper * ReadClipperUnitTest now uses static methods * Haplotype caller now uses static methods * Exon Junction Genotyper now uses static methods	2011-12-27 02:11:32 -05:00
Mauricio Carneiro	ce493bf257	Added adaptor clipping to ReduceReads * made all clipping steps optional with arguments.	2011-12-27 01:19:06 -05:00
Mauricio Carneiro	f7a5752025	Let this one slip through my commits.	2011-12-26 21:55:02 -05:00
Mauricio Carneiro	c1eaf7cf81	ReduceReads now allows different context sizes for different events * Rename contextSize to contextSizeMismatches * Indel context size is now different from mismatches context size	2011-12-26 21:17:29 -05:00
Mauricio Carneiro	4633637af6	Moved ReduceReads to static ReadClipper * all clipping done in ReduceReads is done using the static methods of the ReadClipper now.	2011-12-26 21:14:40 -05:00
Mauricio Carneiro	9aa1c0c6e5	Better documentation and contracts for ReduceReads * added javadoc to all methods * added GATKDocs style documentation to the ReduceReadsWalker * revised contracts and made explicit in the documentation	2011-12-26 21:12:23 -05:00
Mauricio Carneiro	3051cdf9c5	fixed reduced reads integration tests	2011-12-26 21:12:22 -05:00
Mauricio Carneiro	256a7d8bd2	fixing the arguments for RRead script	2011-12-26 21:12:22 -05:00
Eric Banks	dd990061f6	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-26 14:45:35 -05:00
Eric Banks	2130b39f33	Found the bug in the engine: RodLocusView was using the wrong seek method so that it would only move to the first locus of a shard (and with multi-locus shards, this meant that we never processed RODs from the other positions). In fact, because the seek(Shard) method is extremely misleading and now no longer used, I think it's safer to delete it and make everyone use the much more transparent seek(GenomeLoc). Note that I have not re-enabled my improvements to the intervals accumulation of ReferenceDataSource because that inefficiency is still present downstream in RodLocusView; need to discuss those changes with Matt.	2011-12-26 14:45:19 -05:00
Mauricio Carneiro	02495a5fd5	renaming script, once more	2011-12-23 20:01:25 -05:00
Mauricio Carneiro	afc58b81b2	changing permissions on the scala script	2011-12-23 19:47:48 -05:00
Mauricio Carneiro	5198f3a287	Making -e optional and renaming script * Expanding intervals should be optional, not mandatory	2011-12-23 19:36:57 -05:00
Mauricio Carneiro	35c41409a1	Better contracts and docs for the ReadClipper * Described the ReadClipper contract in the top of the class * Added contracts where applicable * Added descriptive information to all tools in the read clipper * Organized public members and static methods together with the same javadoc	2011-12-23 19:36:57 -05:00
David Roazen	506c0e9c97	Disabling SnpEff support in the GATK and SnpEff annotation in the HybridSelectionPipeline SnpEff support will remain disabled until SnpEff 2.0.4 has been officially released and we've verified the quality of its annotations.	2011-12-23 19:12:57 -05:00
Eric Banks	24c84da60d	'Fixing' the changes in ReferenceDataSource so that a shard properly contains a list of GenomeLocs instead of a single merged one. However, that uncovered a probable bug in the engine, so instead of letting this code fester unfixed in the build (affecting everyone in the group) I've decided to revert the previous (slow, but working) version and fix the engine in my own branch.	2011-12-23 15:39:12 -05:00
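Commits 3d4bf273de and caa5da2fd2 describe pulling the pileup for a whole set of read groups in one call, instead of calling getPileup once per group, and summing coverage over user-defined groups of read groups where each `-g` value is a space-separated list. A minimal sketch of that idea, assuming a pileup is just a list of (read group, base) records; `PileupElement`, `pileup_for_read_groups` and `group_sums` are hypothetical names, not the GATK pileup API.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Set


class PileupElement(NamedTuple):
    """Hypothetical stand-in for one read's base at a locus."""
    read_group: str
    base: str


def pileup_for_read_groups(pileup: Iterable[PileupElement],
                           read_groups: Set[str]) -> List[PileupElement]:
    """Keep elements from any of `read_groups` in a single pass over the pileup."""
    return [e for e in pileup if e.read_group in read_groups]


def group_sums(pileup: Iterable[PileupElement],
               group_args: List[str]) -> Dict[str, int]:
    """Depth per combined group; each entry mimics one space-separated -g value."""
    # Count depth per read group once, then add up the members of each group.
    per_rg = Counter(e.read_group for e in pileup)
    return {arg: sum(per_rg[rg] for rg in set(arg.split())) for arg in group_args}


reads = [PileupElement("rg1", "A"), PileupElement("rg2", "C"), PileupElement("rg3", "G")]
print(group_sums(reads, ["rg1 rg2", "rg3"]))
```

Counting per read group once and then summing makes the cost of extra `-g` columns independent of pileup depth, which is the same motivation the commits give for avoiding repeated pileup construction.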
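Commit 18f06ad913 adds a script that computes GC content for each interval on its own, because bait intervals overlap and should not be merged by the engine first. The sketch below shows only that idea: every interval is scored independently, so two overlapping intervals each get a value. It assumes 1-based, inclusive coordinates and an in-memory reference string; `gc_content_per_interval` is a hypothetical name, not the original script.

```python
from typing import List, Tuple


def gc_content_per_interval(reference: str,
                            intervals: List[Tuple[int, int]]) -> List[float]:
    """GC fraction of each interval, computed independently.

    Intervals are (start, stop), 1-based and inclusive (an assumption of this
    sketch). Overlapping intervals are NOT merged: each gets its own value.
    """
    fractions = []
    for start, stop in intervals:
        if not 1 <= start <= stop <= len(reference):
            raise ValueError(f"interval {start}-{stop} is outside the reference")
        window = reference[start - 1:stop].upper()
        gc = sum(1 for base in window if base in "GC")
        fractions.append(gc / len(window))
    return fractions


# Two overlapping baits: a merging step would have produced a single value.
print(gc_content_per_interval("AAGGCCTT", [(1, 4), (3, 6)]))
```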
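Commit cd68cc239b adds a Knuth shuffle and a `randomSubset` built on it to `MathUtils`. The sketch below, in Python rather than the GATK's Java, shows the textbook Knuth (Fisher-Yates) shuffle and a subset taken as the first k elements of a shuffled copy; `knuth_shuffle` and `random_subset` are illustrative names, not the `MathUtils` signatures.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def knuth_shuffle(items: List[T], rng: Optional[random.Random] = None) -> List[T]:
    """Permute `items` in place (Knuth / Fisher-Yates) and return the same list."""
    rng = rng or random.Random()
    # Walk from the last index down to 1, swapping position i with a uniformly
    # chosen index j in [0, i]; every permutation then has probability 1/n!.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        items[i], items[j] = items[j], items[i]
    return items


def random_subset(items: Sequence[T], k: int,
                  rng: Optional[random.Random] = None) -> List[T]:
    """Return k elements drawn without repeats; every k-subset is equally likely."""
    if not 0 <= k <= len(items):
        raise ValueError("k must be between 0 and len(items)")
    # Shuffle a copy so the caller's sequence is untouched, then keep a prefix.
    return knuth_shuffle(list(items), rng)[:k]


print(random_subset(["rg1", "rg2", "rg3", "rg4"], 2, random.Random(7)))
```

Taking a prefix of a uniformly random permutation makes every k-element subset equally likely. Stopping after k swaps and keeping only the positions already fixed would give the same distribution with less work.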
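Commit d20a25d681 changes how the UnifiedGenotyper SNP model picks alternate alleles: rather than summing base qualities, it genotypes an alt allele only when some sample finds it more likely than ref/ref, using the likelihoods already computed for all 10 genotypes. A minimal sketch of that rule, assuming "more likely" means the sample's ref/alt or alt/alt log10 likelihood exceeds its ref/ref value; `alleles_to_genotype` and the dict-based likelihood layout are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

BASES = "ACGT"
Genotype = Tuple[str, str]
# The 10 unordered diploid SNP genotypes over A, C, G, T, e.g. ("A", "G").
ALL_GENOTYPES: List[Genotype] = list(combinations_with_replacement(BASES, 2))


def alleles_to_genotype(ref: str, samples: List[Dict[Genotype, float]]) -> List[str]:
    """Return the alt bases worth genotyping at this site.

    `samples` holds one dict per sample mapping each genotype to its log10
    likelihood. An alt allele is kept only if some sample finds ref/alt or
    alt/alt more likely than ref/ref.
    """
    hom_ref = (ref, ref)
    chosen = []
    for alt in BASES:
        if alt == ref:
            continue
        het = tuple(sorted((ref, alt)))  # genotype keys are stored in ACGT order
        hom_var = (alt, alt)
        if any(max(gl[het], gl[hom_var]) > gl[hom_ref] for gl in samples):
            chosen.append(alt)
    return chosen
```

Because the test only compares likelihoods that were already computed, dropping unsupported alleles costs almost nothing, which matches the efficiency argument in the commit message.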

8499 Commits (ab8d47d9a556af4aab703e717715f3d5cfc2470f)