gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	6a2862e8bc	GSA-483: Bug in GATKdocs for Enums -- Fixed to no longer show constants in enums as constant values in the gatkdocs	2012-08-16 16:24:17 -04:00
Eric Banks	3253fc216b	FindBugs 'Maintainability' fixes	2012-08-16 15:53:06 -04:00
Eric Banks	05cbf1c8c0	FindBugs 'Efficiency' fixes	2012-08-16 15:40:52 -04:00
Mark DePristo	d8071c66ed	Removing SlowGenotype object from GATK	2012-08-16 15:23:06 -04:00
Eric Banks	a22e7a5358	Should've run 'ant clean' instead of just 'ant'. In any event, these are 2 cases where we are setting a class's internal static variable directly. Very dangerous.	2012-08-16 15:07:32 -04:00
Eric Banks	47b4f7b7e5	One final FindBugs related fix. I think it's safe to consider these changes 'fixes' that are allowed to go in during a code freeze.	2012-08-16 14:59:05 -04:00
Eric Banks	ded0e11b45	Killing off some FindBugs 'Reliability' issues	2012-08-16 14:00:48 -04:00
Eric Banks	dac3958461	Killing off some FindBugs 'Usability' issues	2012-08-16 13:32:44 -04:00
Eric Banks	611d9b61e2	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-16 13:05:36 -04:00
Eric Banks	2df04dc48a	Fix for performance problem in GGA mode related to previous --regenotype commit. Instead of trying to hack around the determination of the calculation model when it's not needed, just simply overload the calculateGenotypes() method to add one that does simple genotyping. Re-enabling the Pool Caller integration tests.	2012-08-16 13:05:17 -04:00
Mark DePristo	132cdfd9c1	GSA-488: MLEAC > AN error when running variant eval fixed	2012-08-16 13:03:14 -04:00
Mark DePristo	4e42988c66	GSA-485: Remove repairVCFHeader from GATK codebase -- Removed half-a*ssed attempt to automatically repair VCF files with bad headers, which allowed users to provide a replacement header overwriting the file's actual header on the fly. Not a good idea, really. Eric has promised to create a utility that walks through a VCF file and creates a meaningful header field based on the file's contents (if this ever becomes a priority)	2012-08-16 13:03:13 -04:00
Mark DePristo	52bfe8db8a	Make sure the storage writer is closed before running mergeInfo in multi-threaded output management -- It's not clear this is cause of GSA-484 but it will help confirm that it's not the cause	2012-08-16 13:03:13 -04:00
Mark DePristo	7a247df922	Added -bcf argument to VCFWriter output to force BCF regardless of file extension -- Now possible to do -o /dev/stdout -bcf -l DEBUG > tmp.bcf and create a valid BCF2 file -- Cleanup code to make sure extensions easier by moving to a setX model in VariantContextWriterStub	2012-08-16 13:03:13 -04:00
Mark DePristo	28c8e3e6d7	Cleanup BCF2Codec -- Remove FORBID_SYMBOLIC global that is no longer necessary -- all error handling goes via error() function	2012-08-16 13:03:13 -04:00
Mark DePristo	9dc694b2e9	Meaningful error message and keeping tmp file when mergeInfo fails -- BCF2 is failing for some reason when merging tmp. files with parallel combine variants. ThreadLocalOutputTracker no longer sets deleteOnExit on the tmp file, as this prevents debugging. And it's unnecessary because each mergeInto was deleting files as appropriate -- MergeInfo in VariantContextWriterStorage only deletes the intermediate output if an error occurs	2012-08-16 13:03:13 -04:00
Mark DePristo	a9a1c499fd	Update md5 in VariantRecalibrationWalkers test for BCF2 -- only encoding differences	2012-08-16 13:03:13 -04:00
Eric Banks	04be0c92bf	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-15 23:13:32 -04:00
Eric Banks	9035b554fb	Adding tests for the --solid_nocall_strategy argument	2012-08-15 23:13:24 -04:00
David Roazen	fa7605c643	Convert external.build.dir and external.dist.dir back to paths. The previous push fixed the external classpath issue but broke external builds in a new way by changing the above from paths to properties. This was a mistake, since external builds require absolute, not relative, paths. Thanks to akiezun for the bug report and patch.	2012-08-15 23:04:10 -04:00
Eric Banks	f368e568db	Implementing support in BaseRecalibrator for SOLiD no call strategies other than throwing an exception. For some reason we never transferred these capabilities into BQSRv2 earlier.	2012-08-15 22:52:56 -04:00
Eric Banks	9d09230c26	Better docs for verbose output of Pileup	2012-08-15 21:55:08 -04:00
Mark DePristo	3556c36668	Disable general ploidy integration tests because they are running forever	2012-08-15 21:13:16 -04:00
Mark DePristo	c0a31b2e5b	CombineVariants parallel integration tests -- All tests but one (using old bad VCF3 input) run unmodified with parallel code. -- Disabled UNSAFE_VCF_PROCESSING for all but that test, which changes md5s because the output files have fixed headers -- Minor optimizations to simpleMerge	2012-08-15 21:13:16 -04:00
Mark DePristo	669c43031a	BCF2 optimizations; parallel CombineVariants -- BCF2 now determines whether it can safely write out raw genotype blocks, which is true in the case where the VCF header of the input is a complete, ordered subset of the output header. Added utilities to determine this and extensive unit tests (headerLinesAreOrderedConsistently) -- Cleanup collapseStringList and exploreStringList for new unit tests of BCF2Utils. Fixed bug in edge case that never occurred in practice -- VCFContigHeaderLine now provides its own key (VCFHeader.CONTIG_KEY) directly instead of requiring the user to provide it (and hoping it's right) -- More ways to access the data in VCFHeader -- BCF2Writer uses a cache to avoid recomputing unnecessarily whether raw genotype blocks can be emitted directly into the output -- Optimization of fullyDecodeAttributes -- attributes.size() is expensive and unnecessary. We just guess that on average we need ~10 elements for the attribute map -- CombineVariants optimization -- filters are online HashSet but are sorted at the end by creating a TreeSet -- makeCombinations is now makePermutations, and you can request to create the permutations with or without replacement	2012-08-15 21:13:16 -04:00
Mark DePristo	dafa7e3885	Temporarily disable StateMonitoringThreadTests while I get them reliably working across platforms	2012-08-15 21:13:16 -04:00
Mark DePristo	2da82e27ac	Autoscaling of memory requirements in GATKPerformanceOverTime	2012-08-15 21:13:15 -04:00
Mark DePristo	d70fd18900	Minor increase in tolerance to sum of states in UnitTest for StateMonitoringThreadFactory	2012-08-15 21:13:15 -04:00
Mark DePristo	ae4d4482ac	Parallel combine variants! -- CombineVariants is now TreeReducible! -- Integration tests running in parallel all pass except one (will fix) due to incorrect use of db=0 flag on input from old VCF format	2012-08-15 21:13:15 -04:00
Mark DePristo	bd7ed0d028	Enable efficient parallel output of BCF2 -- Previous IO stub was hardcoded to write VCF. So when you ran -nt 2 -o my.bcf you actually created intermediate VCF files that were then encoded single threaded as BCF. Now we emit natively per thread BCF, and use the fast mergeInfo code to read BCF -> write BCF. Upcoming optimizations to avoid decoding genotype data unnecessarily will enable us to really quickly process BCF2 in parallel -- VariantContextWriterStub forces BCF output for intermediate files -- Nicer debug log message in BCF2Codec -- Turn off debug logging of BCF2LazyGenotypesDecoder -- BCF2FieldWriterManager now uses .debug not .info, so you won't see all of that field manager debugging info with BCF2 any longer -- VariantContextWriterFactory.isBCFOutput now has version that accepts just a file path, not path + options	2012-08-15 21:13:15 -04:00
Mark DePristo	290fd33f3b	GATKPerformanceOfTime.R updated to new script outputs	2012-08-15 21:13:15 -04:00
Mark DePristo	9459e6203a	Clean, documented implementation of ThreadFactory that monitors running / blocking / waiting time of threads it creates -- Expanded unit tests -- Support for clean logging of results to logger -- Refactored MyTime into AutoFormattingTime in Utils, out of TraversalEngine, for cleanliness and reuse -- Added docs and contracts to StateMonitoringThreadFactory	2012-08-15 21:13:15 -04:00
Mark DePristo	be3230a1fd	Initial implementation of ThreadFactory that monitors running / blocking / waiting time of threads it creates -- Created makeCombinations utility function (very useful!). Moved template from VariantContextTestProvider -- UnitTests for basic functionality	2012-08-15 21:13:15 -04:00
Mark DePristo	fc1bd82011	GATK performance of time lives -- Cut out CountCovariates and TableRecalibrator (no longer in GATK2) -- Parallelism tests go up to 32 cores by default now -- Only tests 1.6 and 2.0 now -- Useful -justUG option to just run all of the UG performance tests	2012-08-15 21:13:15 -04:00
Mark DePristo	f277d7c09e	Removing parallelism bottleneck in the GATK -- GenomeLocParser cache was a major performance bottleneck in parallel GATK performance. With 10 threads, > 50% of each thread's time was spent blocking on the MasterSequencingDictionary object. Made this a thread local variable. -- Now we can run the GATK with 48 threads efficiently on GSA4! -- Running -nt 1 => 75 minutes (didn't let it run all of the way through, so it likely would take longer) -- Running -nt 24 => 3.81 minutes	2012-08-15 21:13:15 -04:00
Mauricio Carneiro	cbf290ada0	Revving CoverageByRG to the new GATKReport API	2012-08-15 16:12:40 -04:00
David Roazen	9b84fa20bf	Fix an issue with the classpath for external builds in build.xml. Use "path" instead of "pathconvert" to construct the external.gatk.classpath. This allows the path to evolve as the build progresses, instead of being fixed early on to a value that (in some cases) could be incorrect.	2012-08-15 16:08:21 -04:00
Guillermo del Angel	db92671b7f	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-15 13:32:07 -04:00
Guillermo del Angel	7599f94296	Updating large-scale validation script to new UG syntax	2012-08-15 13:31:45 -04:00
Eric Banks	87e41c83c5	In AlleleCount stratification, check to make sure the AC (or MLEAC) is valid (i.e. not higher than number of chromosomes) and throw a User Error if it isn't. Added a test for bad AC.	2012-08-14 15:02:30 -04:00
Eric Banks	8e3774fb0e	Fixing behavior of the --regenotype argument in SelectVariants to properly run in GenotypeGivenAlleles mode. Added integration tests to cover recent SV changes.	2012-08-14 14:21:42 -04:00
Eric Banks	34b62fa092	Two changes to SelectVariants: 1) don't add DP INFO annotation if DP wasn't used in the input VCF (it was adding DP=0 previously). 2) If MLEAC or MLEAF is present in the original VCF and the number of samples decreases, remove those annotations from the VC.	2012-08-14 12:54:31 -04:00
Eric Banks	cfb994abd2	Trivial removal of unused variable (mentioned in resolved JIRA entry)	2012-08-13 22:55:02 -04:00
Khalid Shakir	f809f24afb	Removed SelectHeader's --include_reference_name option since the reference is always included. In SelectHeaders instead of including the path to the file, only include the name of the reference since dbGaP does not like paths in headers.	2012-08-13 16:49:27 -04:00
Khalid Shakir	22b4466cf5	Added setupRetry() to modify jobs when Queue is run with '-retry' and jobs are about to restart after an error. Implemented a mixin called "RetryMemoryLimit" which will by default double the memory. GridEngine memory request parameter can be selected on the command line via '-resMemReqParam mem_free' or '-resMemReqParam virtual_free'. Java optimizations now enabled by default: - Only 4 GC threads instead of each job using Java's default O(number of cores) GC threads. Previously on a machine with N cores if you have N jobs running and Java allocates N GC threads by default, then the machines are using up to N^2 threads if all jobs are in heavy GC (thanks elauzier). - Exit if GC spends more than 50% of time in GC (thanks ktibbett). - Exit if GC reclaims less than 10% of max heap (thanks ktibbett). Added a -noGCOpt command line option to disable new Java optimizations.	2012-08-13 15:43:05 -04:00
Mark DePristo	6ad75d2f5c	Reverting changes to BCF2 ranges -- The previously expanded ones are actually the missing values in the range. The previous ranges were correct. Removed the TODO to confirm them, as they are now officially confirmed	2012-08-13 15:06:28 -04:00
Mark DePristo	4d3fad38e9	Increase allowable range for BCF2 by -1 on low-end	2012-08-13 14:20:26 -04:00
Mark DePristo	4cbd11faf5	Fixed spelling error in BQSR.R	2012-08-13 10:01:33 -04:00
Mark DePristo	aab417c94d	Fix missing argument in unittest	2012-08-12 13:58:14 -04:00
Mark DePristo	f032e0aba4	A bit better output for ContextCovariate context size logging	2012-08-12 13:45:52 -04:00

10269 Commits (6a2862e8bcfca67ed4c1169d1aac0ffab6dfdc86)
