gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	bd7ed0d028	Enable efficient parallel output of BCF2 -- Previous IO stub was hardcoded to write VCF. So when you ran -nt 2 -o my.bcf you actually created intermediate VCF files that were then encoded single threaded as BCF. Now we emit natively per thread BCF, and use the fast mergeInfo code to read BCF -> write BCF. Upcoming optimizations to avoid decoding genotype data unnecessarily will enable us to really quickly process BCF2 in parallel -- VariantContextWriterStub forces BCF output for intermediate files -- Nicer debug log message in BCF2Codec -- Turn off debug logging of BCF2LazyGenotypesDecoder -- BCF2FieldWriterManager now uses .debug not .info, so you won't see all of that field manager debugging info with BCF2 any longer -- VariantContextWriterFactory.isBCFOutput now has version that accepts just a file path, not path + options	2012-08-15 21:13:15 -04:00
Mark DePristo	290fd33f3b	GATKPerformanceOfTime.R updated to new script outputs	2012-08-15 21:13:15 -04:00
Mark DePristo	9459e6203a	Clean, documented implementation of ThreadFactory that monitors running / blocking / waiting time of threads it creates -- Expanded unit tests -- Support for clean logging of results to logger -- Refactored MyTime into AutoFormattingTime in Utils, out of TraversalEngine, for cleanliness and reuse -- Added docs and contracts to StateMonitoringThreadFactory	2012-08-15 21:13:15 -04:00
Mark DePristo	be3230a1fd	Initial implementation of ThreadFactory that monitors running / blocking / waiting time of threads it creates -- Created makeCombinations utility function (very useful!). Moved template from VariantContextTestProvider -- UnitTests for basic functionality	2012-08-15 21:13:15 -04:00
Mark DePristo	fc1bd82011	GATK performance of time lives -- Cut out CountCovariates and TableRecalibrator (no longer in GATK2) -- Parallelism tests go up to 32 cores by default now -- Only tests 1.6 and 2.0 now -- Useful -justUG option to just run all of the UG performance tests	2012-08-15 21:13:15 -04:00
Mark DePristo	f277d7c09e	Removing parallelism bottleneck in the GATK -- GenomeLocParser cache was a major performance bottleneck in parallel GATK performance. With 10 threads, more than 50% of each thread's time was spent blocking on the MasterSequencingDictionary object. Made this a thread local variable. -- Now we can run the GATK with 48 threads efficiently on GSA4! -- Running -nt 1 => 75 minutes (didn't let it run all of the way through so likely would take longer) -- Running -nt 24 => 3.81 minutes	2012-08-15 21:13:15 -04:00
Mauricio Carneiro	cbf290ada0	Revving CoverageByRG to the new GATKReport API	2012-08-15 16:12:40 -04:00
David Roazen	9b84fa20bf	Fix an issue with the classpath for external builds in build.xml Use "path" instead of "pathconvert" to construct the external.gatk.classpath. This allows the path to evolve as the build progresses, instead of being fixed early on to a value that (in some cases) could be incorrect.	2012-08-15 16:08:21 -04:00
Guillermo del Angel	db92671b7f	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-15 13:32:07 -04:00
Guillermo del Angel	7599f94296	Updating large-scale validation script to new UG syntax	2012-08-15 13:31:45 -04:00
Eric Banks	87e41c83c5	In AlleleCount stratification, check to make sure the AC (or MLEAC) is valid (i.e. not higher than number of chromosomes) and throw a User Error if it isn't. Added a test for bad AC.	2012-08-14 15:02:30 -04:00
Eric Banks	8e3774fb0e	Fixing behavior of the --regenotype argument in SelectVariants to properly run in GenotypeGivenAlleles mode. Added integration tests to cover recent SV changes.	2012-08-14 14:21:42 -04:00
Eric Banks	34b62fa092	Two changes to SelectVariants: 1) don't add DP INFO annotation if DP wasn't used in the input VCF (it was adding DP=0 previously). 2) If MLEAC or MLEAF is present in the original VCF and the number of samples decreases, remove those annotations from the VC.	2012-08-14 12:54:31 -04:00
Eric Banks	cfb994abd2	Trivial removal of unused variable (mentioned in resolved JIRA entry)	2012-08-13 22:55:02 -04:00
Khalid Shakir	f809f24afb	Removed SelectHeader's --include_reference_name option since the reference is always included. In SelectHeaders instead of including the path to the file, only include the name of the reference since dbGaP does not like paths in headers.	2012-08-13 16:49:27 -04:00
Khalid Shakir	22b4466cf5	Added setupRetry() to modify jobs when Queue is run with '-retry' and jobs are about to restart after an error. Implemented a mixin called "RetryMemoryLimit" which will by default double the memory. GridEngine memory request parameter can be selected on the command line via '-resMemReqParam mem_free' or '-resMemReqParam virtual_free'. Java optimizations now enabled by default: - Only 4 GC threads instead of each job using java's default O(number of cores) GC threads. Previously on a machine with N cores if you have N jobs running and java allocates N GC threads by default, then the machines are using up to N^2 threads if all jobs are in heavy GC (thanks elauzier). - Exit if GC spends more than 50% of time in GC (thanks ktibbett). - Exit if GC reclaims less than 10% of max heap (thanks ktibbett). Added a -noGCOpt command line option to disable new java optimizations.	2012-08-13 15:43:05 -04:00
Mark DePristo	6ad75d2f5c	Reverting changes to BCF2 ranges -- The previously expanded ones are actually the missing values in the range. The previous ranges were correct. Removed the TODO to confirm them, as they are now officially confirmed	2012-08-13 15:06:28 -04:00
Mark DePristo	4d3fad38e9	Increase allowable range for BCF2 by -1 on low-end	2012-08-13 14:20:26 -04:00
Mark DePristo	4cbd11faf5	Fixed spelling error in BQSR.R	2012-08-13 10:01:33 -04:00
Mark DePristo	aab417c94d	Fix missing argument in unittest	2012-08-12 13:58:14 -04:00
Mark DePristo	f032e0aba4	A bit better output for ContextCovariate context size logging	2012-08-12 13:45:52 -04:00
Mark DePristo	243af0adb1	Expanded the BQSR reporting script -- Includes header page -- Table of arguments (Arguments) -- Summary of counts (RecalData0) -- Summary of counts by qual (RecalData1) -- Fixed bug in output that resulted in covariates list always being null (updated md5s accordingly) -- BQSR.R loads all relevant libraries now, including gplots, grid, and gsalib to run correctly	2012-08-12 13:45:14 -04:00
Mark DePristo	458bbdee8f	Add useful logger.info telling us the mismatch and indel context sizes	2012-08-12 10:27:05 -04:00
Ami Levy Moonshine	6fefdaf428	"update integration tests in CombineVariantsIntegrationTest" Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-10 17:00:35 -04:00
Ami Levy Moonshine	4968daf0a5	update integration tests at CombineVariantsIntegrationTest	2012-08-10 16:58:05 -04:00
Eric Banks	1a87f67258	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-10 15:11:12 -04:00
Eric Banks	40f0320a1c	When adding a unit test to LIBS for X and = CIGAR operators, I uncovered a bug with the implementation of the ReadBackedPileup.depthOfCoverage() method.	2012-08-10 14:58:29 -04:00
Eric Banks	eca9613356	Adding support of X and = CIGAR operators to the GATK	2012-08-10 14:54:07 -04:00
Joel Thibault	949ed207ca	capMaxAllelesForIndels -> capMaxAltAllelesForIndels	2012-08-10 14:25:13 -04:00
Joel Thibault	b17edaad66	Change memoryValues to List[Double]	2012-08-10 14:25:12 -04:00
Joel Thibault	32a66b5ae4	Add -nt parameter	2012-08-10 14:25:12 -04:00
Ryan Poplin	2a113977a9	Resolving merge conflicts with the new MD5s	2012-08-10 11:47:00 -04:00
Ryan Poplin	5f82ffd5d8	Adding LowQual filter to the output of the HaplotypeCaller.	2012-08-10 11:25:14 -04:00
David Roazen	d7d7ccf789	Revert unintentional license change	2012-08-09 17:10:47 -04:00
David Roazen	d56a4631dc	Update cofoja version in build.xml	2012-08-09 17:08:43 -04:00
Ami Levy Moonshine	68fb04b8f7	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable into testing	2012-08-09 16:48:22 -04:00
Mark DePristo	3362584014	Updating cofoja to the latest version	2012-08-09 16:36:18 -04:00
Mark DePristo	06258c8a01	BCF2 optimizations -- Added Write method to BCF2 types that directly converts int value to byte stream. Deleted writeRawBytes(int) -- encodeTypeDescriptor semi-inlined into encodeType so that the tests for overflow are done in just one place -- Faster implementation of determineIntegerType for int[] values	2012-08-09 16:36:18 -04:00
Mark DePristo	c6bd9b15ff	BCF2 optimizations -- BCF2Type enum has an overloaded method to read the type as an int from an input stream. This gets rid of a case statement and replaces it with just minimum tiny methods that should be better optimized. A side effect of this optimization is an overall cleaner code organization	2012-08-09 16:36:18 -04:00
Mark DePristo	9a0dda71d4	BCF2 optimizations -- All low-level reads throw IOException instead of catching it directly. This allows us to not try/catch in readByte, improving performance by 5% or so -- Optimize encodeTypeDescriptor with final variables. Avoid using Math.min instead do inline comparison -- Inlined willOverflow directly in its single use	2012-08-09 16:36:18 -04:00
Ryan Poplin	9887bc4410	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-09 16:31:06 -04:00
Ryan Poplin	f4c72a26d5	A few quick, minor findbugs fixes.	2012-08-09 16:30:58 -04:00
Ryan Poplin	c7f22e410f	A few quick, minor findbugs fixes.	2012-08-09 16:22:08 -04:00
Mauricio Carneiro	abb168e1ba	Merged bug fix from Stable into Unstable	2012-08-09 16:09:58 -04:00
Mauricio Carneiro	67d4148b32	Fixing bug reported by Thomas in the forum where reads were soft-clipped beyond the limits of the contig and ReduceReads was failing with a NoSuchElement exception. Now we hard clip anything that goes beyond the boundaries of the contig.	2012-08-09 15:58:18 -04:00
Mauricio Carneiro	58420098ac	Merged bug fix from Stable into Unstable	2012-08-09 13:02:23 -04:00
Mauricio Carneiro	c6132ebe26	Fixed divide by zero bug when downsampler goes over regions where reads are all filtered out. Added Guillermo's bug report as an integration test	2012-08-09 13:02:11 -04:00
Eric Banks	def077c4e5	There's actually a subtle but important difference between foo++ and ++foo	2012-08-09 12:42:50 -04:00
Ryan Poplin	e48727dae3	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-09 10:31:10 -04:00
Guillermo del Angel	5be7e0621d	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-09 09:58:34 -04:00
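
The f277d7c09e entry above describes replacing a shared cache that threads were blocking on with a thread-local one. A minimal Java sketch of that general pattern follows. It is not the GATK implementation: the class name PerThreadContigCache, its methods, and the contig-name-to-index dictionary are hypothetical, chosen only to illustrate the idea.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not GATK code): a lookup cache kept per thread,
 * so that cache reads and writes never contend on a shared lock.
 */
final class PerThreadContigCache {
    // Shared dictionary: immutable after construction, so safe to read from any thread.
    private final Map<String, Integer> dictionary;

    // Each thread lazily receives its own private HashMap on first access.
    private final ThreadLocal<Map<String, Integer>> cache =
            ThreadLocal.withInitial(HashMap::new);

    PerThreadContigCache(Map<String, Integer> dictionary) {
        this.dictionary = Collections.unmodifiableMap(new HashMap<>(dictionary));
    }

    /** Returns the index of a contig, consulting the calling thread's cache first. */
    int getContigIndex(String contig) {
        Map<String, Integer> local = cache.get();
        Integer index = local.get(contig);
        if (index == null) {
            index = dictionary.get(contig);
            if (index == null) {
                throw new IllegalArgumentException("Unknown contig: " + contig);
            }
            local.put(contig, index); // visible only to the calling thread
        }
        return index;
    }

    /** Number of entries cached by the calling thread (for inspection only). */
    int localCacheSize() {
        return cache.get().size();
    }
}
```

The trade-off in this sketch is memory for contention: every thread holds its own copy of the cached entries, but no thread ever waits on another to read or populate the cache.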

10391 Commits (bedcdbdc5f1fbfa1d2c99ea3afa2a902c60cab50)