gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mauricio Carneiro	aa1d2f3a5b	Not every consensus is well aligned. Need to check more, but starting position has been fixed.	2012-09-21 10:00:45 -04:00
Mauricio Carneiro	97874b92d1	Program runs, but the consensus reads are all out of place and need more tags	2012-09-21 10:00:44 -04:00
Mauricio Carneiro	3494a52ddc	another intermediate commit to update changes from stable	2012-09-21 10:00:43 -04:00
Mauricio Carneiro	a89ff7b5dd	Intermediate commit to resolve conflicts coming from stable	2012-09-21 10:00:41 -04:00
Mark DePristo	5d758bf97f	Better run a shorter test -- should take 3 minutes total	2012-09-20 18:54:14 -04:00
Mark DePristo	d29218825d	Fix grouping for display of GATKPerformanceOverTime -- God I hate R	2012-09-20 18:45:16 -04:00
Mark DePristo	b5fa848255	Fix GSA-515 Nanoscheduler GSA-573 -nt and -nct interact badly w.r.t. output -- See https://jira.broadinstitute.org/browse/GSA-573 -- Uses InheritedThreadLocal storage so that children threads created by the NanoScheduler see the parent stubs in the main thread. -- Added explicit integration test that checks that -nt 1, 2 and -nct 1, 2 give the same results for GLM BOTH with the UG over 1 MB.	2012-09-20 18:45:16 -04:00
Mark DePristo	90b7df46cf	Add invocation count and shorter timeout to NanoSchedulerUnitTest	2012-09-20 18:45:16 -04:00
Mark DePristo	ba9e95a8fe	Revert "Reorganized NanoScheduler so that main thread does the reduces" Doesn't actually fix the problem, and adds an unnecessary delay in closing down NanoScheduler, so reverting. This reverts commit 66b820bf94ae755a8a0c71ea16f4cae56fd3e852.	2012-09-20 18:45:15 -04:00
Mark DePristo	7425ab9637	Reorganized NanoScheduler so that main thread does the reduces -- Enables us to run -nt 2 -nct 2 and get meaningful output -- Uses a sleep / poll mechanism. Not ideal -- will look into wait / notify instead.	2012-09-20 18:45:15 -04:00
Eric Banks	747694f7c2	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-09-20 14:14:58 -04:00
Eric Banks	1316b579f0	Bad news folks: BQSR scatter-gather was totally busted; you absolutely cannot trust any BQSR table that was a product of SG (for any version of BQSR). I fixed BQSR-gathering, rewrote (and enabled) the unit test, and confirmed that outputs are now identical whether or not SG is used to create the table.	2012-09-20 14:14:34 -04:00
Christopher Hartl	c492185be6	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2012-09-20 12:56:07 -04:00
Christopher Hartl	d25579deeb	A couple of minor things. 1) Better documentation on the meta data file for VariantsToBinaryPed with examples of each file type 2) MannWhitneyU can now take an argument on creation to turn off dithering. This pertains to JIRA-GSA-571 but does not fix it, as it isn't hooked up to the command line. Next step is to add an argument to the command line where it's accessible to the annotation classes (e.g. from either UG or the VariantAnnotator). 3) Added some dumb python scripts to deal with Plink files, and a script to convert plink binaries to VCF to help sanity check. Basically if you want to do an analysis on genotype data stored in plink binary format, your choices are: 1) Add a new module to Plink [difficulty rating: Impossible -- code obfuscation] 2) Steal plink parsing code from software (Plink/PlinkSeq/GCTA/Emacks/etc) that reads the files [difficulty rating: Oppressive -- code not modularized at all] 3) Write your own dumb stuff [difficulty rating: Annoying] What's been added is the result of 3. It's a library so nobody else has to do this, so long as they're comfortable with python.	2012-09-20 12:48:13 -04:00
Eric Banks	2e6f533996	Adding both unit and integration tests to cover the previous edge case of mismatched PLs	2012-09-20 11:55:28 -04:00
Eric Banks	4b7edc72d1	Fixing edge case bug in the Exact model (both standard and generalized) where we could abort prematurely in the special case of multiple polymorphic alleles and samples with widely different depths of coverage (e.g. exome and low-pass). In these cases it was possible to call the site bi-allelic when in fact it was multi-allelic (but it wouldn't cause it to create a monomorphic call).	2012-09-20 10:59:42 -04:00
Ryan Poplin	ccb65a03e8	sorry, non-ASCII characters annoy some computers.	2012-09-20 10:14:48 -04:00
Mauricio Carneiro	1ef6fa7eed	QD and FS are doubles and select variants is more picky than variant filtration on that	2012-09-20 08:21:42 -04:00
Mauricio Carneiro	4e160a267d	quality control script for ReduceReads Takes in a full bam and a reduced bam, makes calls over a given interval, selects only the high quality	2012-09-20 00:11:32 -04:00
Mark DePristo	087247f1f0	Allow longs and doubles in recalibration report to allow some backward compatibility	2012-09-19 19:23:44 -04:00
Mark DePristo	2267b722b2	Proper error handling in NanoScheduler -- Renamed TraversalErrorManager to the more general MultiThreadedErrorTracker -- ErrorTracker is now used throughout the NanoScheduler. In order to properly handle errors, the work previously done by main thread (submit jobs, block on reduce) is now handled in a separate thread. The main thread simply wakes up periodically and checks whether the reduce result is available or if an error has occurred, and handles each appropriately. -- EngineFeaturesIntegrationTest checks that -nt and -nct properly throw errors in Walkers -- Added NanoSchedulerUnitTest for input errors -- ThreadEfficiencyMonitoring is now disabled by default, and can be enabled with a GATK command line option. This is because the monitoring doesn't differentiate between threads that are supposed to do work, and those that are supposed to wait, and therefore gives misleading results. -- Build.xml no longer copies the unittest results verbosely	2012-09-19 17:03:13 -04:00
Mark DePristo	773af05980	Intermediate commit for proper error handling in the NanoScheduler -- Refactored error handling from HMS into utils.TraversalErrorManager, which is now used by HMS and will be usable by NanoScheduler -- Generalized EngineFeaturesIntegrationTest to test map / reduce error throwing for nt 1, nt 2 and nct 2 (disabled) -- Added unit tests for failing input iterator in NanoScheduler (fails) -- Made ErrorThrowing NanoScheduable	2012-09-19 17:03:13 -04:00
Mark DePristo	eb24dc920a	GATKPerformanceOverTime now includes ideal scaling line by default	2012-09-19 17:03:13 -04:00
Mark DePristo	d2046b67b1	Remove problematic @Ensures from InputProducer. -- We need to figure out why CoFoJa is broken in the NanoScheduler	2012-09-19 17:03:13 -04:00
Mark DePristo	33fabb8180	Final V3 version of NanoScheduler -- Fixed basic bugs in tracking of input -> map -> reduce jobs -- Simplified classes -- Expanded unit tests	2012-09-19 17:03:12 -04:00
Mark DePristo	e18bc4e7b1	Adding PrintReads -baq and -bqsr to standard performance testing	2012-09-19 17:03:12 -04:00
Mark DePristo	5734d756b5	Remove problematic @Invariant from EOFMarkedValue	2012-09-19 17:03:12 -04:00
Mark DePristo	aa9a1e8122	Warn GATK user if the number of requested threads > available processors on the machine	2012-09-19 17:03:12 -04:00
Mark DePristo	76027d17e6	Add a few more UnitTests for InputProducer -- Cleaned up function calls for clarity	2012-09-19 17:03:12 -04:00
Mark DePristo	7605c6bcc4	Done GSA-515 Nanoscheduler / GSA-557 V3 nanoScheduler algorithm -- V3 + V4 algorithm for NanoScheduler. The newer version uses 1 dedicated input thread and n - 1 map/reduce threads. These MapReduceJobs perform map and a greedy reduce. The main thread's only job is to shuttle inputs from the input producer thread, enqueueing MapReduce jobs for each one. We manage the number of map jobs now via a Semaphore instead of a BlockingQueue of fixed size. -- This new algorithm should consume N00% CPU power for -nct N value. -- Also a cleaner implementation in general -- Vastly expanded unit tests -- Deleted FutureValue and ReduceThread	2012-09-19 17:03:12 -04:00
Mark DePristo	69e418c3f5	Intermediate commit for v3 NanoScheduling algorithm -- This version works but it blocks much more than I'd expect on input. Merging v2 and v3 to make v4 now	2012-09-19 17:03:12 -04:00
Joel Thibault	c72db70416	Update downsample_to_coverage to 60	2012-09-19 16:23:58 -04:00
Mauricio Carneiro	ee31a54a03	Merged bug fix from Stable into Unstable	2012-09-19 16:09:45 -04:00
Mauricio Carneiro	7cf9911924	Fixed ReduceReads bug where variant regions were missing. This affected variant regions with more than 100 reads and less than 250 reads. Only bams reduced with GATK v2 and 2.1 were affected.	2012-09-19 16:09:08 -04:00
Ryan Poplin	26e35e5ee2	updating BQSR integration tests	2012-09-19 14:10:34 -04:00
Ryan Poplin	b99099f05c	The BaseRecalibrator and DelocalizedBaseRecalibrator have gotten out of sync. Fixing.	2012-09-19 12:30:26 -04:00
Ryan Poplin	7a7103a757	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-09-19 10:39:18 -04:00
Ryan Poplin	0ea543e1fd	Removing testing scaffolding from delocalized BQSR. The output recal table reports the data as doubles instead of integers. This changes the mapping-based BQSR integration tests. Final intermediate push before delocalized BQSR replaces previous BQSR.	2012-09-19 10:39:06 -04:00
Guillermo del Angel	bebd5c14b8	Update general ploidy md5's due to bad merge of md5's in previous commit, and new shortened interval definition for EMIT_ALL_CONFIDENT_SITES was buggy	2012-09-18 20:12:15 -04:00
Ami Levy Moonshine	ccc3f4ff8d	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-09-17 09:58:27 -04:00
Ami Levy Moonshine	ebf609f757	new R script for summary tables of the pipeline	2012-09-17 09:57:10 -04:00
Ami Levy Moonshine	ee0b17d98f	typo in VE	2012-09-17 09:51:51 -04:00
Guillermo del Angel	ca010160a9	Merge fix	2012-09-14 14:05:21 -04:00
Guillermo del Angel	6b37350bc0	Two hairy bugs in pool caller: a) Site error model wasn't counting errors in insertions correctly - Alleles passed in had padded ref byte, but event base in PileupElement doesn't have it. As a result, mismatch rate was grossly overestimated with insertions and we missed several calls we should have made. Integration test reflects changes. b) Adding a ref GL to the exact model is correct mathematically but AFResult wasn't filled properly. As a result, QUAL was junk in pure ref sites, and in all other sites the last ref GL introduced wasn't properly updating Pr(AF>0). c) Added integration test that covers -out_mode EMIT_ALL_CONFIDENT_SITES. Not fully sure if the math is 100% correct (for both diploid and generalized case) but at least now diploid and non-diploid cases behave similarly. md5 of this new test will fail since it's taking me a long time to run so I'll update from Bamboo output shortly	2012-09-14 13:13:22 -04:00
Ryan Poplin	f4ac92e95c	Add clipping of the adaptor sequence to the delocalized BQSR.	2012-09-14 11:51:54 -04:00
Ryan Poplin	3585f5375e	Bug fix so that the delocalized BAQ GOP parameter is actually used by the BQSR.	2012-09-14 11:02:14 -04:00
Eric Banks	86be50f18d	Add note to docs that the --list argument requires full command-line	2012-09-14 10:58:44 -04:00
Menachem Fromer	182344ad89	Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-09-12 23:56:44 -04:00
Menachem Fromer	3d3578b1de	Deal with empty Seq	2012-09-12 23:54:41 -04:00
Ryan Poplin	d380ef9956	revert 82b0bab5fbc4e57e0db30b0ec3d4676fccef40ba, bad idea	2012-09-12 15:42:29 -04:00

10615 Commits (aa1d2f3a5b47412a8aa75e34e60e5f5c683a1780)