Mark DePristo
773af05980
Intermediate commit for proper error handling in the NanoScheduler
...
-- Refactored error handling from HMS into utils.TraversalErrorManager, which is now used by HMS and will be usable by NanoScheduler
-- Generalized EngineFeaturesIntegrationTest to test map / reduce error throwing for nt 1, nt 2 and nct 2 (disabled)
-- Added unit tests for failing input iterator in NanoScheduler (fails)
-- Made ErrorThrowing NanoSchedulable
2012-09-19 17:03:13 -04:00
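The commit above moves error handling into a shared manager that both schedulers can use. As a rough illustration of that idea (not the actual utils.TraversalErrorManager code; the class and method names below are hypothetical), worker threads could record the first failure and the main thread could rethrow it later:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a shared error manager; names are assumptions,
// not the real GATK API.
final class ErrorTrackerSketch {
    // Holds the first error reported by any thread; later errors are ignored.
    private final AtomicReference<Throwable> firstError = new AtomicReference<>();

    /** Called from worker threads when a map or reduce call fails. */
    void notifyOfError(Throwable t) {
        firstError.compareAndSet(null, t);
    }

    boolean hasError() {
        return firstError.get() != null;
    }

    /** Called from the main thread; rethrows the recorded error, if any. */
    void rethrowIfError() {
        Throwable t = firstError.get();
        if (t == null) return;
        if (t instanceof RuntimeException) throw (RuntimeException) t;
        if (t instanceof Error) throw (Error) t;
        throw new RuntimeException(t);
    }
}
```

Keeping only the first error is a design assumption made here for simplicity; a scheduler could equally collect all of them.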
Mark DePristo
eb24dc920a
GATKPerformanceOverTime now includes ideal scaling line by default
2012-09-19 17:03:13 -04:00
Mark DePristo
d2046b67b1
Remove problematic @Ensures from InputProducer.
...
-- We need to figure out why CoFoJa is broken in the NanoScheduler
2012-09-19 17:03:13 -04:00
Mark DePristo
33fabb8180
Final V3 version of NanoScheduler
...
-- Fixed basic bugs in tracking of input -> map -> reduce jobs
-- Simplified classes
-- Expanded unit tests
2012-09-19 17:03:12 -04:00
Mark DePristo
e18bc4e7b1
Adding PrintReads -baq and -bqsr to standard performance testing
2012-09-19 17:03:12 -04:00
Mark DePristo
5734d756b5
Remove problematic @Invariant from EOFMarkedValue
2012-09-19 17:03:12 -04:00
Mark DePristo
aa9a1e8122
Warn GATK user if the number of requested threads > available processors on the machine
2012-09-19 17:03:12 -04:00
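A check like the one described in the commit above can be built on the standard `Runtime.availableProcessors()` call. The sketch below is only an assumption about how such a warning might be produced; the class name, method name, and message text are hypothetical.

```java
// Hypothetical helper; not the GATK's own implementation.
final class ThreadCountCheck {
    /** Returns a warning message, or null when the request fits the machine. */
    static String warningFor(int requestedThreads, int availableProcessors) {
        if (requestedThreads > availableProcessors) {
            return String.format(
                "Requested %d threads but only %d processors are available; "
                + "performance may suffer",
                requestedThreads, availableProcessors);
        }
        return null;
    }

    public static void main(String[] args) {
        int requested = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        // availableProcessors() reports the processors visible to the JVM.
        int available = Runtime.getRuntime().availableProcessors();
        String warning = warningFor(requested, available);
        if (warning != null) System.err.println("WARN: " + warning);
    }
}
```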
Mark DePristo
76027d17e6
Add a few more UnitTests for InputProducer
...
-- Cleaned up function calls for clarity
2012-09-19 17:03:12 -04:00
Mark DePristo
7605c6bcc4
Done GSA-515 Nanoscheduler / GSA-557 V3 nanoScheduler algorithm
...
-- V3 + V4 algorithm for NanoScheduler. The newer version uses 1 dedicated input thread and n - 1 map/reduce threads. These MapReduceJobs perform map and a greedy reduce. The main thread's only job is to shuttle inputs from the input producer thread, enqueueing MapReduce jobs for each one. We manage the number of map jobs now via a Semaphore instead of a BlockingQueue of fixed size.
-- This new algorithm should consume roughly N × 100% CPU for -nct N.
-- Also a cleaner implementation in general
-- Vastly expanded unit tests
-- Deleted FutureValue and ReduceThread
2012-09-19 17:03:12 -04:00
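The scheduling scheme described in the commit above (one input thread, n - 1 map/reduce threads, a Semaphore bounding outstanding jobs, and a greedy reduce) can be sketched as follows. This is a simplified illustration under stated assumptions, not the NanoScheduler source: all names are hypothetical, the permit count of twice the worker count is an arbitrary choice, inputs and map results are assumed non-null, and error propagation is omitted.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Illustrative sketch only; class and method names are hypothetical.
final class NanoSchedulerSketch {
    private static final Object EOF = new Object(); // end-of-input marker

    static <I, M> M run(Iterator<I> inputs, Function<I, M> map,
                        BinaryOperator<M> reduce, M initial, int nThreads) {
        if (nThreads < 2) throw new IllegalArgumentException("need >= 2 threads");
        final int nWorkers = nThreads - 1;

        // One dedicated input thread feeds a queue (unbounded here for brevity).
        final BlockingQueue<Object> inputQueue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            while (inputs.hasNext()) inputQueue.add(inputs.next());
            inputQueue.add(EOF);
        });
        producer.setDaemon(true);
        producer.start();

        final ExecutorService workers = Executors.newFixedThreadPool(nWorkers);
        final Semaphore permits = new Semaphore(2 * nWorkers); // caps in-flight jobs
        final Map<Long, M> mapped = new ConcurrentHashMap<>();
        final AtomicReference<M> acc = new AtomicReference<>(initial);
        final long[] nextToReduce = {0};
        final Object reduceLock = new Object();

        try {
            long jobId = 0;
            // The main thread only shuttles inputs into map/reduce jobs.
            for (Object item = inputQueue.take(); item != EOF; item = inputQueue.take()) {
                @SuppressWarnings("unchecked") final I input = (I) item;
                final long id = jobId++;
                permits.acquire(); // blocks while too many jobs are outstanding
                workers.execute(() -> {
                    try {
                        mapped.put(id, map.apply(input));
                        synchronized (reduceLock) { // greedy, in-order reduce
                            M next;
                            while ((next = mapped.remove(Long.valueOf(nextToReduce[0]))) != null) {
                                acc.set(reduce.apply(acc.get(), next));
                                nextToReduce[0]++;
                            }
                        }
                    } finally {
                        permits.release();
                    }
                });
            }
            workers.shutdown();
            workers.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            workers.shutdownNow();
            throw new IllegalStateException("interrupted while scheduling", e);
        }
        return acc.get();
    }
}
```

Each job stores its map result before taking the reduce lock, so whichever job finishes last in a consecutive run reduces everything that is ready, and results are combined in input order.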
Mark DePristo
69e418c3f5
Intermediate commit for v3 NanoScheduling algorithm
...
-- This version works but it blocks much more than I'd expect on input. Merging v2 and v3 to make v4 now
2012-09-19 17:03:12 -04:00
Joel Thibault
c72db70416
Update downsample_to_coverage to 60
2012-09-19 16:23:58 -04:00
Mauricio Carneiro
ee31a54a03
Merged bug fix from Stable into Unstable
2012-09-19 16:09:45 -04:00
Mauricio Carneiro
7cf9911924
Fixed ReduceReads bug where variant regions were missing.
...
This affected variant regions with more than 100 and fewer than 250 reads. Only BAMs reduced with GATK v2 and 2.1 were affected.
2012-09-19 16:09:08 -04:00
Ryan Poplin
26e35e5ee2
updating BQSR integration tests
2012-09-19 14:10:34 -04:00
Ryan Poplin
b99099f05c
The BaseRecalibrator and DelocalizedBaseRecalibrator have gotten out of sync. Fixing.
2012-09-19 12:30:26 -04:00
Ryan Poplin
7a7103a757
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-19 10:39:18 -04:00
Ryan Poplin
0ea543e1fd
Removing testing scaffolding from delocalized BQSR. The output recal table reports the data as doubles instead of integers. This changes the mapping-based BQSR integration tests. Final intermediate push before delocalized BQSR replaces previous BQSR.
2012-09-19 10:39:06 -04:00
Guillermo del Angel
bebd5c14b8
Update general ploidy md5s: the previous commit contained a bad merge of md5s, and the new shortened interval definition for EMIT_ALL_CONFIDENT_SITES was buggy
2012-09-18 20:12:15 -04:00
Ami Levy Moonshine
ccc3f4ff8d
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-17 09:58:27 -04:00
Ami Levy Moonshine
ebf609f757
new R script for summary tables of the pipeline
2012-09-17 09:57:10 -04:00
Ami Levy Moonshine
ee0b17d98f
typo in VE
2012-09-17 09:51:51 -04:00
Guillermo del Angel
ca010160a9
Merge fix
2012-09-14 14:05:21 -04:00
Guillermo del Angel
6b37350bc0
Two hairy bugs in pool caller
...
-- a) Site error model wasn't counting errors in insertions correctly: the Alleles passed in had a padded ref byte, but the event base in PileupElement doesn't have it. As a result, the mismatch rate was grossly overestimated with insertions and we missed several calls we should have made. Integration test reflects changes.
-- b) Adding a ref GL to the exact model is correct mathematically, but AFResult wasn't filled properly. As a result, QUAL was junk in pure ref sites, and in all other sites the last ref GL introduced wasn't properly updating Pr(AF>0).
-- c) Added integration test that covers -out_mode EMIT_ALL_CONFIDENT_SITES. Not fully sure if the math is 100% correct (for both diploid and generalized case), but at least now diploid and non-diploid cases behave similarly. The md5 of this new test will fail since it's taking me a long time to run, so I'll update from Bamboo output shortly.
2012-09-14 13:13:22 -04:00
Ryan Poplin
f4ac92e95c
Add clipping of the adaptor sequence to the delocalized BQSR.
2012-09-14 11:51:54 -04:00
Ryan Poplin
3585f5375e
Bug fix so that the delocalized BAQ GOP parameter is actually used by the BQSR.
2012-09-14 11:02:14 -04:00
Eric Banks
86be50f18d
Add note to docs that the --list argument requires full command-line
2012-09-14 10:58:44 -04:00
Menachem Fromer
182344ad89
Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 23:56:44 -04:00
Menachem Fromer
3d3578b1de
Deal with empty Seq
2012-09-12 23:54:41 -04:00
Ryan Poplin
d380ef9956
revert 82b0bab5fbc4e57e0db30b0ec3d4676fccef40ba, bad idea
2012-09-12 15:42:29 -04:00
Ryan Poplin
e7200f1a40
adding verbose debug statements in BQSR
2012-09-12 15:40:07 -04:00
Eric Banks
0206e09a6a
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 15:18:27 -04:00
Eric Banks
d94d0d15c2
Complete overhaul of previous commits to make it all work with scatter-gather. Now tracks output files correctly and can print to stdout.
2012-09-12 15:15:40 -04:00
Ryan Poplin
699a7801b6
Force the in-walker BAQ calculation to use the new BAQGOP parameter.
2012-09-12 14:59:31 -04:00
Ryan Poplin
c9111bb23e
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 14:46:50 -04:00
Ryan Poplin
fafecf4ffd
Adding BAQGOP parameter to the delocalized BQSR.
2012-09-12 14:46:18 -04:00
Ryan Poplin
bc1e03a6d8
Adding HC integration test for _structural_ insertions and deletions.
2012-09-12 12:25:39 -04:00
Ryan Poplin
faad2972d6
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 12:23:24 -04:00
Ryan Poplin
849a2b8839
Adding HC integration test for _structural_ insertions and deletions.
2012-09-12 12:23:00 -04:00
Eric Banks
4bb7a99f08
Given that all classes implementing output stubs already have getters for the underlying OutputStream and File, it makes sense to unify that functionality into the Stub interface. Now it is possible to have an Engine utility method that iterates over all registered stubs to find the one representing a given OutputStream and return the File associated with it.
2012-09-12 11:51:44 -04:00
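The unification described in the commit above can be pictured roughly as below. This is a sketch under assumptions: the interface and class names are hypothetical stand-ins for the Stub interface and the Engine utility method, and matching streams by identity is a choice made here for illustration.

```java
import java.io.File;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the unified Stub interface.
interface OutputStubSketch {
    OutputStream getOutputStream();
    File getOutputFile(); // may be null, e.g. when writing to stdout
}

// Hypothetical stand-in for the Engine-side lookup utility.
final class StubRegistrySketch {
    private final List<OutputStubSketch> stubs = new ArrayList<>();

    void register(OutputStubSketch stub) {
        stubs.add(stub);
    }

    /** Returns the File behind the given stream, or null if no stub matches. */
    File fileFor(OutputStream stream) {
        for (OutputStubSketch stub : stubs) {
            // Identity comparison: we want the stub wrapping this exact stream.
            if (stub.getOutputStream() == stream) return stub.getOutputFile();
        }
        return null;
    }
}
```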
Eric Banks
994a4ff387
Track all outputs from BQSR (.table, .csv, and .pdf) as @Output arguments. Updated integration tests because we no longer have command-line options not to generate plots (now just don't provide a pdf) or to keep the intermediate csv (now, just provide a filename on the command-line). This is currently busted because we can't access the original filenames from the Engine's storage/stub system and therefore cannot call out to the Rscript with the executor (which requires filename strings).
2012-09-12 11:24:53 -04:00
Christopher Hartl
96be1cbea9
My own integration test isn't passing with a clean checkout. This fix to the walker ought to do it.
2012-09-12 10:11:06 -04:00
Christopher Hartl
546586b70e
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 10:09:42 -04:00
Mark DePristo
bfbf1686cd
Fixed nasty bug with defaulting to diploid no-call genotypes
...
-- For the pooled caller we were writing diploid no-calls even when other samples were haploid. Changed maxPloidy function to return a defaultPloidy, rather than 0, in the case where all samples are missing.
-- VCF/BCF Writers now create missing genotypes with the ploidy of other samples, or 2 if none are available at all.
-- Updating integration tests for general ploidy, as previously we wrote ./. even when other calls were 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1/1/1/1/1, but now we write ./././././././././././././././././././././././. (ugly but correct)
2012-09-12 07:08:03 -04:00
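The ploidy behaviour described in the commit above can be illustrated with a small sketch. It assumes missing samples are represented as null entries and genotypes as plain strings; the class and method names are hypothetical, not the VCF/BCF writer code.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the actual writer implementation.
final class PloidySketch {
    /** Largest ploidy among called samples; null entries are missing samples. */
    static int maxPloidy(List<Integer> ploidies, int defaultPloidy) {
        int max = 0;
        for (Integer p : ploidies) {
            if (p != null && p > max) max = p;
        }
        // When every sample is missing, fall back to the default rather than 0.
        return max == 0 ? defaultPloidy : max;
    }

    /** Builds a no-call genotype string with one "." per chromosome copy. */
    static String noCall(int ploidy) {
        return String.join("/", Collections.nCopies(ploidy, "."));
    }

    /** No-call for a missing sample: ploidy of the other samples, else 2. */
    static String missingGenotype(List<Integer> ploidies) {
        return noCall(maxPloidy(ploidies, 2));
    }
}
```

With one called sample of ploidy 24, a missing sample is written as 24 dots separated by slashes; with no called samples at all, it falls back to the diploid `./.`.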
Mark DePristo
d1ba17df5d
Fixed nasty bug in BCF2 writer for case where all genotypes are missing
...
-- Previous code was looking for a -1 result from maxPloidy() but the result was actually 0, so instead of writing a diploid no call we were actually writing "unavailable" genotypes, and failing the BCF == VCF test in integration tests. Fixed.
2012-09-12 06:46:27 -04:00
Mark DePristo
91f3204534
VCF/BCF writers once again automatically write out no-call genotypes for samples in the VCFHeader but not in the VC itself
...
-- Turns out this was consuming 30% of the UG runtime, and causing problems elsewhere.
-- Removed addMissingSamples from VariantContextUtils, and calls to it
-- Updated VCF / BCF writers to automatically write out a diploid no call for missing samples
-- Added unit tests for this behavior in VariantContextWritersUnitTest
2012-09-12 06:46:26 -04:00
Menachem Fromer
d3bdb9c67e
Choose queue based on assumed run time expectation
2012-09-12 03:36:57 -04:00
Menachem Fromer
5764f1037c
Added control of memory for matrix merging
2012-09-12 03:01:01 -04:00
Menachem Fromer
625fb25eca
Updated import
2012-09-12 02:17:24 -04:00
Menachem Fromer
2ea28499e2
Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 01:58:53 -04:00
Menachem Fromer
5cb08fd17c
Added XHMM option to outputTargetsBySamples
2012-09-12 01:58:04 -04:00