Commit Graph

8505 Commits (621ee2b613bc518480599b8e0939ebfd79f7bbe5)

Author SHA1 Message Date
David Roazen 621ee2b613 Merged bug fix from Stable into Unstable 2012-01-03 16:56:49 -05:00
David Roazen ea6e718cb8 SnpEff 2.0.5 support. Re-enabled SnpEff in the HybridSelectionPipeline.
For now, we recommend only running with the GRCh37.64 database.
2012-01-03 15:18:36 -05:00
Christopher Hartl 93e1417b6e Update to the VSS GATK documentation. 2012-01-03 13:39:31 -05:00
David Roazen 4984ca5e31 Merged bug fix from Stable into Unstable 2012-01-03 11:03:30 -05:00
David Roazen f3f01da1af Enforce serial dependencies in RecalibrationWalkersIntegrationTest
Some tests in this class were intermittently not being executed due
to being randomly scheduled before tests whose results they depend on.
Now the serial dependencies are enforced to avoid problematic orderings.
2012-01-03 10:42:41 -05:00
David Roazen 055364d786 Always use full, three-part version numbers.
Previously, the initial release of a new GATK version had a version
number with only one part (e.g., "1.4"). This could potentially mislead
people into thinking it's the most recent revision of a release, instead
of the least recent.

Now, initial releases will have full, three-part version numbers
(e.g., "1.4-0-g472fc94") like everything else.
2012-01-03 10:25:19 -05:00
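The three-part string "1.4-0-g472fc94" appears to follow the layout of `git describe --long` output: release tag, number of commits since that tag, and an abbreviated commit hash prefixed with `g`. The following is a minimal sketch, assuming that layout, of splitting such a version and recognizing an initial release (zero commits since the tag). `GatkVersion` is a hypothetical helper for illustration, not a GATK class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses "release-commitsSinceTag-gHash", assuming the
// version string has the layout produced by `git describe --long`.
final class GatkVersion {
    private static final Pattern FORMAT =
            Pattern.compile("^(\\d+(?:\\.\\d+)*)-(\\d+)-g([0-9a-f]+)$");

    final String release;       // e.g. "1.4"
    final int commitsSinceTag;  // 0 for the initial release of a version
    final String abbrevHash;    // abbreviated commit SHA1, e.g. "472fc94"

    private GatkVersion(String release, int commitsSinceTag, String abbrevHash) {
        this.release = release;
        this.commitsSinceTag = commitsSinceTag;
        this.abbrevHash = abbrevHash;
    }

    static GatkVersion parse(String version) {
        Matcher m = FORMAT.matcher(version);
        if (!m.matches()) {
            // A bare "1.4" is rejected: it is ambiguous about which revision it is.
            throw new IllegalArgumentException("Not a three-part version: " + version);
        }
        return new GatkVersion(m.group(1), Integer.parseInt(m.group(2)), m.group(3));
    }

    /** True for the least recent revision of a release, i.e. the tagged commit itself. */
    boolean isInitialRelease() {
        return commitsSinceTag == 0;
    }
}
```

Under this reading, the middle field is what distinguishes "1.4-0-g472fc94" (the first revision of 1.4) from later revisions of the same release.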
Eric Banks ab8d47d9a5 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-03 09:38:49 -05:00
Mauricio Carneiro ca669ae744 Optimizations to the CoverageByRG walker
* outputs only the necessary groups of read groups, avoiding the creation of multiple pileups on every call to map
* now also counts the number of variants from a given ROD (dbSNP) that fall in the interval
* new column: interval size
2012-01-03 09:36:01 -05:00
Mauricio Carneiro 3d4bf273de Added getPileupForReadGroups to ReadBackedPileup
* returns a pileup for all the read groups provided
* saves us from multiple calls to getPileup (which is very inefficient)
2012-01-03 09:35:11 -05:00
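The saving described above comes from filtering the pileup once against a set of read groups rather than building one pileup per read group. A minimal sketch of that single-pass filter, using hypothetical `PileupRead` and `PileupFilter` types (not the ReadBackedPileup API):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a read in a pileup; only its read group matters here.
final class PileupRead {
    final String name;
    final String readGroup;

    PileupRead(String name, String readGroup) {
        this.name = name;
        this.readGroup = readGroup;
    }
}

final class PileupFilter {
    /**
     * Returns the reads belonging to any of the given read groups in a single pass,
     * preserving pileup order, instead of building one pileup per read group.
     */
    static List<PileupRead> forReadGroups(List<PileupRead> pileup, Set<String> readGroups) {
        Set<String> wanted = new HashSet<>(readGroups); // constant-time membership checks
        List<PileupRead> result = new ArrayList<>();
        for (PileupRead read : pileup) {
            if (wanted.contains(read.readGroup)) {
                result.add(read);
            }
        }
        return result;
    }
}
```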
Roger Zurawicki caa5da2fd2 Added parameter to combine RGs in CoverageByRG
* -g takes a string of read groups separated by spaces
* multiple -g arguments create multiple sum columns in the table

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-01-03 09:35:10 -05:00
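A minimal sketch of the -g semantics as described: each -g value is split on whitespace into one group of read groups, and each group yields one summed column. `ReadGroupGroups` and the per-read-group coverage map are hypothetical; the real walker's argument parsing and table layout are not shown in the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of combining read groups via repeated -g arguments.
final class ReadGroupGroups {
    /** Each -g value is a space-separated list of read groups; one set per -g. */
    static List<Set<String>> parse(List<String> gArguments) {
        List<Set<String>> groups = new ArrayList<>();
        for (String arg : gArguments) {
            Set<String> group = new LinkedHashSet<>();
            for (String rg : arg.trim().split("\\s+")) {
                if (!rg.isEmpty()) {
                    group.add(rg);
                }
            }
            groups.add(group);
        }
        return groups;
    }

    /** One summed column per group: total coverage of that group's read groups. */
    static long[] sumColumns(List<Set<String>> groups, Map<String, Long> coverageByRg) {
        long[] sums = new long[groups.size()];
        for (int i = 0; i < groups.size(); i++) {
            for (String rg : groups.get(i)) {
                sums[i] += coverageByRg.getOrDefault(rg, 0L); // unknown RGs contribute 0
            }
        }
        return sums;
    }
}
```

For example, `-g "rg1 rg2" -g "rg3"` would produce two sum columns, one for rg1+rg2 and one for rg3.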
Mauricio Carneiro 18f06ad913 Script to calculate gc content of intervals independently
* necessary for baits because we don't want the overlapping intervals to be merged by the GATK engine
2012-01-03 09:35:10 -05:00
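Computing GC content per interval "independently" means each interval is scored on its own, even where intervals overlap, rather than first merging them. A minimal sketch under stated assumptions: intervals are 0-based half-open on a reference string, and non-ACGT symbols (such as N) are left out of the denominator. `IntervalGc` is hypothetical, not the script from this commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: GC fraction per interval, with overlapping
// intervals scored separately instead of merged.
final class IntervalGc {
    /** Half-open interval [start, end) on a 0-based reference string. */
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    static List<Double> gcFractions(String reference, List<Interval> intervals) {
        List<Double> out = new ArrayList<>();
        for (Interval iv : intervals) {
            int gc = 0;
            int called = 0; // A/C/G/T bases only; N and other symbols are skipped
            for (int i = iv.start; i < iv.end; i++) {
                char b = Character.toUpperCase(reference.charAt(i));
                if (b == 'G' || b == 'C') {
                    gc++;
                    called++;
                } else if (b == 'A' || b == 'T') {
                    called++;
                }
            }
            out.add(called == 0 ? 0.0 : (double) gc / called);
        }
        return out;
    }
}
```

With reference `ACGTGGCCAT`, the overlapping intervals [0,4) and [2,8) get separate values (0.5 and 5/6), which is the behavior the commit wants for baits.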
Mauricio Carneiro 0bdeda6f3f Added single-sample option for the ReduceReads calling script 2012-01-03 09:29:47 -05:00
Mauricio Carneiro 4a208c7c06 Refactor of the downsampling machinery to accept different strategies
* Implemented Adaptive downsampler
* Added integration test
* Added option to the RRead Scala script to choose the downsampling strategy
2012-01-03 09:29:47 -05:00
Mauricio Carneiro cce8511d29 Some WGS performance upgrades for ReduceReads
* Do not try to hard clip to the interval when doing WGS
* Do not even add reads that have been completely clipped out in WGS
2012-01-03 09:29:46 -05:00
Mauricio Carneiro 21ae3ef5f9 Added downsampling support to ReduceReads
* Downsampling is now a parameter to the walker, with a default value of 0 (no downsampling)
* Downsampling selects reads at random within the variant region window and, where possible, aims for uniform coverage around the desired downsampling value
* Added integration test
2012-01-03 09:29:46 -05:00
Mauricio Carneiro cd68cc239b Added Knuth shuffle (KS) and randomSubset using KS to MathUtils
* The Knuth shuffle is a simple yet effective array permuter
* added a simple randomSubset that returns a random subset, without repeats, of any given array, with every permutation equally likely
* added unit tests for both functions
2012-01-03 09:29:46 -05:00
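For reference, the Knuth (Fisher-Yates) shuffle swaps each position, from the end backwards, with a uniformly chosen position at or before it, which makes every permutation equally likely; taking the first k elements of a shuffled copy then gives a uniform subset without repeats. This is a sketch of the technique named in the commit, not the MathUtils code; the `ShuffleUtils` class and both signatures are assumptions.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of a Knuth shuffle and a randomSubset built on it.
final class ShuffleUtils {
    /** In-place Knuth (Fisher-Yates) shuffle: every permutation is equally likely. */
    static <T> void knuthShuffle(T[] array, Random rng) {
        for (int i = array.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform in [0, i]
            T tmp = array[i];
            array[i] = array[j];
            array[j] = tmp;
        }
    }

    /** Random subset of size k without repeats: shuffle a copy, keep the first k elements. */
    static <T> T[] randomSubset(T[] array, int k, Random rng) {
        if (k < 0 || k > array.length) {
            throw new IllegalArgumentException("k must be in [0, " + array.length + "]");
        }
        T[] copy = Arrays.copyOf(array, array.length); // leave the caller's array untouched
        knuthShuffle(copy, rng);
        return Arrays.copyOf(copy, k);
    }
}
```

Shuffling the whole copy keeps the sketch short; a partial shuffle that stops after k swaps would give the same distribution with less work when k is much smaller than the array.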
Mauricio Carneiro 94791a2a75 Add support for reads starting with an insertion
* Modified cleanCigarShift to allow insertions at the beginning and end of the read
* Allowed cigars starting/ending in insertions in the systematic ReadClipper tests
* Updated all ReadClipper unit tests
* ReduceReads no longer hard clips leading insertions by default
* SlidingWindow adjusts the start location if the read starts with an insertion
* SlidingWindow creates an empty element with insertions to the right
* Fixed all potential divide-by-zero errors involving totalCount() (from BaseCounts)
* Updated all integration tests
* Added a new integration test for reducing multiple intervals
2012-01-03 09:29:45 -05:00
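Leading insertions need special handling because, per the SAM specification, the CIGAR operator I (like S and H) consumes no reference bases: a read whose CIGAR starts with an insertion has read bases before its first reference-consuming position. A minimal sketch of the two checks involved, with a hypothetical `CigarSketch` class (not ReadClipper or SlidingWindow code); treating clips before the insertion as skippable is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical CIGAR helpers illustrating why leading insertions are special.
final class CigarSketch {
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Splits a CIGAR such as "2I8M" into {length, operator} pairs. */
    static List<int[]> parse(String cigar) {
        List<int[]> out = new ArrayList<>();
        Matcher m = ELEMENT.matcher(cigar);
        int consumed = 0;
        while (m.find()) {
            if (m.start() != consumed) {
                throw new IllegalArgumentException("Bad CIGAR: " + cigar);
            }
            out.add(new int[]{Integer.parseInt(m.group(1)), m.group(2).charAt(0)});
            consumed = m.end();
        }
        if (consumed != cigar.length()) {
            throw new IllegalArgumentException("Bad CIGAR: " + cigar);
        }
        return out;
    }

    /** True if the first operator after any soft/hard clipping is an insertion (assumption). */
    static boolean startsWithInsertion(String cigar) {
        for (int[] e : parse(cigar)) {
            char op = (char) e[1];
            if (op == 'S' || op == 'H') {
                continue;
            }
            return op == 'I';
        }
        return false;
    }

    /** Reference bases spanned: only M, D, N, = and X consume reference. */
    static int referenceLength(String cigar) {
        int len = 0;
        for (int[] e : parse(cigar)) {
            switch ((char) e[1]) {
                case 'M': case 'D': case 'N': case '=': case 'X':
                    len += e[0];
                    break;
                default:
                    break; // I, S, H, P consume no reference
            }
        }
        return len;
    }
}
```

For example, "2I8M" spans only 8 reference bases even though the read has 10 bases, which is why a start position derived from reference-consuming operators alone needs adjusting.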
Mark DePristo 3ecb9a0bf7 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-02 13:56:55 -05:00
Mark DePristo b3e613647a GATKPerformanceOverTime bug fixes
-- Don't try to do nt 16; it's just too painful, as the threading doesn't work well and it consumes a large chunk of our available slots on gsa4
-- bugfix: only do the multi-threaded test once per iteration, without expanding by subiterations, so we no longer try to do 3x3 nt 16 runs
2012-01-02 13:56:44 -05:00
Mark DePristo 188bd48139 runGATKReport only archives and shows errors for the last day's runs 2012-01-02 10:39:05 -05:00
Mark DePristo d05f0c2318 GATKPerformanceOverTime script update
-- Automatic detection of most recent version of GATK release (just tell the script now to use 1.2, 1.3, and 1.4)
-- Uses 1.4 now
-- By default we do 9 runs of each non-parallel test
-- Added a convenience utility to PathUtils to find the most recent GATK release jar with a specific release number
2012-01-02 09:58:46 -05:00
Mauricio Carneiro a837970ea2 Merged bug fix from Stable into Unstable 2012-01-01 22:20:53 -05:00
Mauricio Carneiro 1b6d52817e fixing adaptor clipping effect on recalibration integration test 2012-01-01 22:20:06 -05:00
Ryan Poplin e45ca8bfa2 Protect against too many alternate alleles in the haplotype caller. 2012-01-01 19:12:48 -05:00
Eric Banks 393993e0c7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-31 20:42:46 -05:00
Eric Banks b0d68eb0e3 Merge remote-tracking branch 'unstable/master' 2011-12-31 20:26:44 -05:00
Mauricio Carneiro 55cfa76cf3 Updated integration tests for the new adaptor clipping fix. 2011-12-30 18:47:14 -05:00
Mauricio Carneiro c7d0a9ebee Forgot to test for inter-chromosomal mates in the adaptor clipping
* Fixing bug caught by Eric (and Kristian)
2011-12-30 00:19:53 -05:00
Matt Hanna a259bfefd4 First commit addressing problems running RTC in parallel.
It turns out that because the RTC is the first walker to 'correctly' tree reduce according to functional programming
standards, it has revealed a few problems with the tree reducer holding on to too much data. This is the first
and smaller of two commits to reduce memory consumption. The second commit will likely be pushed after GATK 1.4 is
released.
2011-12-29 16:22:14 -05:00
Matt Hanna e6e80e8d3f Update Picard to fix a bug, found by Mauricio, in which Picard unnecessarily depended on Snappy during some uses of SortingCollection. 2011-12-29 14:35:02 -05:00
Eric Banks 1a45ea5a05 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-29 11:37:15 -05:00
Roger Zurawicki efe33a0a1b BUG FIX: Output is now correct
The output reported zero coverage because the pileup was filtered using the wrong method

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2011-12-28 23:05:43 -05:00
Roger Zurawicki 5672688a73 Optimized CoverageByRG and Added GCContent
- CoverageByRG now uses a HashMap for its values instead of a list. It runs about 4 times faster.
- Cleaned up some of the code
- CoverageByRG now calculates GCContent

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2011-12-28 15:25:07 -05:00
Roger Zurawicki 0c05998c4c Added CoverageByRG LocusWalker
Takes any number of input BAMs and intervals
Returns a ReportTable with the average coverage of each read group per interval

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2011-12-28 15:25:07 -05:00
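The per-interval output described here, combined with the HashMap-based accumulation from the optimization commit above, can be sketched as follows. Assumptions, all hypothetical: each locus's pileup is reduced to the list of read-group IDs covering it, and the average divides by the full interval size, so uncovered positions count as zero. `CoverageByReadGroup` is not the walker's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: average coverage of each read group over one interval.
final class CoverageByReadGroup {
    /**
     * @param pileupsPerLocus one entry per covered locus, holding the read-group IDs
     *                        of the reads at that locus (a simplification of a pileup)
     * @param intervalSize    number of positions in the interval (must be positive)
     */
    static Map<String, Double> averageCoverage(List<List<String>> pileupsPerLocus, int intervalSize) {
        if (intervalSize <= 0) {
            throw new IllegalArgumentException("intervalSize must be positive");
        }
        Map<String, Long> totals = new HashMap<>(); // keyed lookup instead of scanning a list
        for (List<String> locus : pileupsPerLocus) {
            for (String rg : locus) {
                totals.merge(rg, 1L, Long::sum);
            }
        }
        Map<String, Double> averages = new HashMap<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            averages.put(e.getKey(), (double) e.getValue() / intervalSize);
        }
        return averages;
    }
}
```

Read groups with no reads in the interval are simply absent from the result; a report table would need to fill those cells with zero.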
Mauricio Carneiro f692911903 GATKSAMRecord emptyRead static constructor
* Creates an empty GATKSAMRecord with empty (not null) Cigar, bases and quals. Allows empty reads to be probed without breaking.
* All ReadClipper utilities now emit empty reads for fully clipped reads
2011-12-27 17:01:17 -05:00
Mauricio Carneiro 8259c748f2 No more Filtered Reads tag.
All synthetic reads are marked with the reduced read tag.
2011-12-27 17:01:17 -05:00
Eric Banks d20a25d681 A much better way of choosing the alternate allele(s) to genotype in the SNP model of UG: instead of looking at the sum of base qualities (which can and did lead to us over-genotyping esp. when allowing multiple alternate alleles), we look at the likelihoods themselves (free since we are already calculating likelihoods for all 10 genotypes). Now, even if the base quals exceed some arbitrary threshold, we only bother genotyping an alternate allele when there's a sample for which it is more likely than ref/ref (I can generate weird edge cases where this falls apart, but none that model truly variable sites that we actually want to call). This leads to a huge efficiency improvement esp. for exomes (and esp. for many samples) where we almost always were trying to genotype all 3 alternate alleles. Integration tests change only because ref calls have slight QUAL differences (because the best alt allele is still chosen arbitrarily, but differently). 2011-12-27 16:50:38 -05:00
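The selection rule described above (genotype an alternate allele only if some sample finds it more likely than ref/ref) can be sketched as follows, under assumptions that are ours, not the UnifiedGenotyper's: each sample's likelihoods are a symmetric 4x4 matrix of log10 values over A, C, G, T (so its 10 distinct entries are the 10 diploid genotypes), and "more likely than ref/ref" means the ref/alt or alt/alt genotype beats hom-ref. `AltAlleleChooser` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of choosing alternate alleles from genotype likelihoods.
final class AltAlleleChooser {
    static final char[] BASES = {'A', 'C', 'G', 'T'};

    /**
     * logLik[s][i][j] is sample s's log10 likelihood of genotype {BASES[i], BASES[j]}
     * (symmetric, so only 10 entries are distinct). An alt allele is kept if at least
     * one sample has a ref/alt or alt/alt likelihood greater than its ref/ref likelihood.
     */
    static List<Character> chooseAlts(double[][][] logLik, int refIndex) {
        List<Character> alts = new ArrayList<>();
        for (int alt = 0; alt < BASES.length; alt++) {
            if (alt == refIndex) {
                continue;
            }
            for (double[][] sample : logLik) {
                double homRef = sample[refIndex][refIndex];
                double best = Math.max(sample[refIndex][alt], sample[alt][alt]);
                if (best > homRef) {
                    alts.add(BASES[alt]);
                    break; // one supporting sample is enough
                }
            }
        }
        return alts;
    }
}
```

Because the likelihoods are already computed for all 10 genotypes, this check adds only a few comparisons per sample and allele, consistent with the commit's point that the information comes for free.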
Ryan Poplin ef31b2f0a7 fixing merge conflicts. 2011-12-27 14:26:36 -05:00
Ryan Poplin 4f09a95221 Updating HaplotypeCaller for the new contracts in the adapter clipping. 2011-12-27 14:25:03 -05:00
Eric Banks adff40ff58 Minor optimizations to avoid extra processing (esp. for reduced reads) 2011-12-27 13:16:25 -05:00
Mauricio Carneiro 17bfe48d5e Made all class methods private in the ReadClipper
* ReadClipperUnitTest now uses static methods
* Haplotype caller now uses static methods
* Exon Junction Genotyper now uses static methods
2011-12-27 02:11:32 -05:00
Mauricio Carneiro ce493bf257 Added adaptor clipping to ReduceReads
* made all clipping steps optional with arguments.
2011-12-27 01:19:06 -05:00
Mauricio Carneiro f7a5752025 Let this one slip through my commits. 2011-12-26 21:55:02 -05:00
Mauricio Carneiro c1eaf7cf81 ReduceReads now allows different context sizes for different events
* Renamed contextSize to contextSizeMismatches
* Indel context size is now separate from the mismatch context size
2011-12-26 21:17:29 -05:00
Mauricio Carneiro 4633637af6 Moved ReduceReads to static ReadClipper
* all clipping done in ReduceReads is done using the static methods of the ReadClipper now.
2011-12-26 21:14:40 -05:00
Mauricio Carneiro 9aa1c0c6e5 Better documentation and contracts for ReduceReads
* added javadoc to all methods
* added GATKDocs-style documentation to the ReduceReadsWalker
* revised contracts and made them explicit in the documentation
2011-12-26 21:12:23 -05:00
Mauricio Carneiro 3051cdf9c5 fixed reduced reads integration tests 2011-12-26 21:12:22 -05:00
Mauricio Carneiro 256a7d8bd2 fixing the arguments for RRead script 2011-12-26 21:12:22 -05:00
Eric Banks dd990061f6 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-12-26 14:45:35 -05:00
Eric Banks 2130b39f33 Found the bug in the engine: RodLocusView was using the wrong seek method so that it would only move to the first locus of a shard (and with multi-locus shards, this meant that we never processed RODs from the other positions). In fact, because the seek(Shard) method is extremely misleading and now no longer used, I think it's safer to delete it and make everyone use the much more transparent seek(GenomeLoc). Note that I have not re-enabled my improvements to the intervals accumulation of ReferenceDataSource because that inefficiency is still present downstream in RodLocusView; need to discuss those changes with Matt. 2011-12-26 14:45:19 -05:00