gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	bedcdbdc5f	Fixing merge conflict	2012-08-27 12:16:51 -04:00
Eric Banks	3d476487c6	LIBS is totally busted for deletions. Putting a check in AD for bad pileup event bases so that we don't produce busted alleles. We must fix LIBS ASAP.	2012-08-27 12:13:12 -04:00
Mark DePristo	63a9ae817a	Ensure thread-safety of CachingIndexedFastaSequenceFile -- Cosmetic cleanup of ReadReferenceView -- TraverseReadsNano provides the reference context, since it's thread-safe -- Cleanup CachingIndexedFastaSequenceFile. Add docs, remove unnecessary setters -- Expand CachingIndexedFastaSequenceFileUnitTest to test explicitly multi-threaded safety.	2012-08-27 12:11:54 -04:00
Mark DePristo	e5b1f1c7f4	Add simple main function to unit test so we can run the nano scheduler test from the command line	2012-08-27 12:11:54 -04:00
Khalid Shakir	2d1ea7124b	One less Queue command line requirement: -tempDir now defaults to .queue/tmp. Also moved queueScatterGather to .queue/scatterGather.	2012-08-27 12:04:50 -04:00
Mark DePristo	68c5142d2d	numThreads > 1 any time you have -nt > 1 silly	2012-08-26 14:36:13 -04:00
Mark DePristo	faacacd6c0	Increase runtime of nano scheduler tests to 1 min	2012-08-26 08:42:58 -04:00
Mark DePristo	846e0c11bc	Add TimeOuts to new threading tests, in case there's an underlying deadlock	2012-08-26 08:18:43 -04:00
Mark DePristo	fde9824765	Optimizations for parallel read walkers -- TraversalReadsNano only creates the NanoScheduler once, and shuts it down onTraversalDone -- Nicer debugging output in NanoScheduler -- ReadShard has a getBufferSize() method now	2012-08-25 17:21:12 -04:00
Mark DePristo	5066b14335	Parallel FlagStat	2012-08-25 17:21:12 -04:00
Mark DePristo	af540888f1	Limited version of parallel read walkers -- Currently doesn't support accessing reference or ROD data -- Parallel versions of PrintReads and CountReads	2012-08-25 17:21:12 -04:00
Mark DePristo	e060b148e2	Minor cleanup of TraverseReads	2012-08-25 17:21:11 -04:00
Mark DePristo	275a5e5439	More tests for NanoScheduler -- Add more contracts -- Test in the UnitTest that the reduce is being called in the correct order	2012-08-25 17:21:11 -04:00
Christopher Hartl	6db0988898	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-25 15:40:32 -04:00
Christopher Hartl	db2e88c7cb	Fix for badIndelLength() throwing NPE at non-indel sites. Added integration test.	2012-08-25 12:38:23 -07:00
Mark DePristo	59b5913b54	Merged bug fix from Stable into Unstable	2012-08-25 14:53:22 -04:00
Mark DePristo	dcc972a557	Usability cleanup for BQSR -- I'm seeing a lot of people trying to use BinaryTagCovariate in the community. They really shouldn't do this, so I moved it to private. -- Throw an exception if its required bintag argument is missing -- Check explicitly if user is requesting DinucCovariate and tell them that it's been retired in favor of ContextCovariate -- Show the type (Required, Experimental, Standard) of the covariates when running --list	2012-08-25 14:53:00 -04:00
Mark DePristo	1044ddbc26	Minor improvement in naming of BQSR tests	2012-08-25 14:06:55 -04:00
Mark DePristo	58ca3f61df	Fix horrible bug in the classification of runs as successes, sting-exceptions, or user-exceptions in analyzeRunReports -- Old logic was just busted.	2012-08-25 14:06:55 -04:00
Christopher Hartl	b59948709f	Code improvements re: JIRA GSA-510. Trio class migrated into the Samples package - because the trio structure is so ubiquitously used, it makes sense, I think, to have a class which imposes the structure on the samples. Existing functions which slightly duplicated the getTrios() method look like they have bugs. These functions are now deprecated. A number of functions in the sampleDB looked to be assuming that samples could not share IDs (e.g. sample IDs are unique, so a sample present in two families could not be represented by multiple Sample objects). Added an assertion in the SampleDBBuilder to document/test this assumption. MVLikelihoodRatio now uses the trio methods from SampleDB.	2012-08-25 08:48:27 -07:00
Mark DePristo	0996bbd548	Comments for Chris on cleanup	2012-08-24 16:04:58 -04:00
Mark DePristo	649b82ce85	Merge branch 'nanoScheduler' Conflicts: private/scala/qscript/org/broadinstitute/sting/queue/qscripts/performance/GATKPerformanceOverTime.scala	2012-08-24 15:59:36 -04:00
Mark DePristo	801b910b9e	GATKPerformanceOverTime is finalized (mark II) -- Make BQSR run longer -- Use Dinuc not context covariates for BQSR v1	2012-08-24 15:57:48 -04:00
Mark DePristo	62aa0ac77e	GATKPerformanceOverTime is finalized -- Update BQSR to run v1 and v2. Use new single read group extracted BAM -- Bug fixes	2012-08-24 15:57:48 -04:00
Mark DePristo	3bbdccb0ae	Refactor and cleanup GATKPerformanceOverTime -- Use single read group BAM file for BQSR -- Implement terrible (but clever) hack to support BQSR v1 and v2 in a single Scala class.	2012-08-24 15:57:48 -04:00
Mark DePristo	9f0eff4c4c	MySQLdb required to run analyzeRunReports, despite my best efforts	2012-08-24 15:57:48 -04:00
Mark DePristo	9de8077eeb	Working (efficient?) implementation of NanoScheduler -- Groups inputs for each thread so that we don't have one thread execution per map() call -- Added shutdown function -- Documentation everywhere -- Code cleanup -- Extensive unittests -- At this point I'm ready to integrate it into the engine for CPU parallel read walkers	2012-08-24 15:34:23 -04:00
Christopher Hartl	752f44c332	Code cleanup in MVLR and SelectVariants. Should fix JIRA GSA-509 and GSA-510	2012-08-24 12:25:11 -07:00
Mark DePristo	d6e6b30caf	Initial implementation of GSA-515: Nanoscheduler – Write general NanoScheduler framework in utils.threading. Test with reading via iterator from list of integers, map is int * 2, reduce is sum. Should be efficient in using resources to do sum of 2 * (sum(1 - X)). Done! CPU parallelism is nano threads. Pfor across read / map / reduce. Use work queue to implement. Create general read map reduce framework in utils. Test parallelism independently before hooking up to Locus iterator. Represent explicitly the dependency graph. Scheduler should choose the work units that are ready for computation, that are marked as "completing a computation", and then finally that maximize the number of subsequent available work units. May be worth measuring expected cost for read / map / reduce unit and use it to balance the compute. As input is single threaded just need one thread to populate inputs, which runs as fast as possible in parallel pushing data to fixed size queue. Each push creates map job and links to upcoming reduce job. Note that there's at most one thread for IO tasks, and all of the threads can contribute to CPU tasks	2012-08-24 14:07:44 -04:00
Eric Banks	0545664f91	Fix ClassCastException seen in Tableau errors	2012-08-24 13:45:48 -04:00
Mark DePristo	b3fd74f0c4	HaplotypeCaller forbids BAQ	2012-08-24 13:25:05 -04:00
Eric Banks	740520c23b	Fix BQSR docs	2012-08-24 13:20:10 -04:00
Mark DePristo	c689d6dcac	GATKPerformanceOverTime is finalized -- Update BQSR to run v1 and v2. Use new single read group extracted BAM -- Bug fixes	2012-08-24 09:20:32 -04:00
Mark DePristo	8371362f3c	Refactor and cleanup GATKPerformanceOverTime -- Use single read group BAM file for BQSR -- Implement terrible (but clever) hack to support BQSR v1 and v2 in a single Scala class.	2012-08-23 21:11:15 -04:00
Mark DePristo	b6cc615890	MySQLdb required to run analyzeRunReports, despite my best efforts	2012-08-23 21:08:32 -04:00
Mark DePristo	1999b95754	Work around for GSA-513: ClassCastException in VariantEval	2012-08-23 18:14:49 -04:00
Christopher Hartl	f1166d6d00	Spotted a potential bug where sample IDs passed in from the meta data were only checked against the sample IDs in the VCF header if the input file happened to be a meta data file rather than a fam file. Added a check for fam files as well, and added an integration test to cover each case.	2012-08-23 11:43:19 -07:00
Mark DePristo	2ae5ec5596	Update for GSA-506: Add nt and efficiency information to GATKRunReport -- Python log upload now includes efficiency information in GATKLogs DB	2012-08-23 12:53:22 -04:00
Mark DePristo	857b11b26f	Done with GSA-506: Add nt and efficiency information to GATKRunReport -- GATKRunReports contain itemized information about the numThreads used to execute the GATK, as well as the efficiency of the use of those threads to get real work done, including time spent running, waiting, blocking, and waiting for IO -- See https://jira.broadinstitute.org/browse/GSA-506 for more details	2012-08-23 09:59:53 -04:00
Mark DePristo	0b735884db	Cleanup code in VariantContext	2012-08-23 09:59:53 -04:00
Mark DePristo	d973863039	GATKPerformanceOverTime includes longer running tests for select variants and variant eval	2012-08-23 09:59:53 -04:00
Eric Banks	e5df91aa23	Looks like the @WalkerName annotation doesn't work with the GATK docs, so I'm renaming the walkers.	2012-08-22 20:17:39 -04:00
Mark DePristo	95a1337285	Merge branch 'threadMonitors' Conflicts: private/scala/qscript/org/broadinstitute/sting/queue/qscripts/performance/GATKPerformanceOverTime.scala	2012-08-22 16:54:47 -04:00
Mark DePristo	63af0cbcba	Cleanup GATK efficiency monitor classes -- Invert logic in GATKArgumentCollection to disable monitoring, not enable. That means monitoring is on by default -- Fix testing error in unit tests -- Rename variables in ThreadAllocation to be clearer	2012-08-22 16:48:02 -04:00
Mark DePristo	1d47d2b573	Fix GATKPerformanceOverTime for BQSR file path error	2012-08-22 16:48:02 -04:00
Mark DePristo	e1293f0ef2	GSA-507: Thread monitoring refactored so it can work without a thread factory -- Old version StateMonitoringThreadFactory refactored into base class ThreadEfficiencyMonitor and subclass EfficiencyMonitoringThreadFactory. -- Base class is used by LinearMicroScheduler to monitor performance of GATK in single threaded mode -- MicroScheduler now handles management of the efficiency monitor. Includes master thread in monitor, meaning that reduce is now included for both schedulers	2012-08-22 16:48:01 -04:00
Mark DePristo	f876c51277	Separately track time spent doing user and system CPU work -- Allows us to ID (by proxy) time spent doing IO -- Refactor StateMonitoringThreadFactory to use its own enum, not Thread.State -- Reliable unit tests across mac and unix	2012-08-22 16:48:01 -04:00
Mark DePristo	18060f237b	Add thread efficiency monitoring to GATK HMS -- See https://jira.broadinstitute.org/browse/GSA-502 -- New command line argument -mt enables thread monitoring -- If enabled, HMS uses StateMonitoringThreadFactory to create monitored threads, and prints out an efficiency report when HMS exits, telling the user information like: for BQSR – known to be inefficient locking INFO 17:10:33,195 StateMonitoringThreadFactory - Number of activeThreads used: 8 INFO 17:10:33,196 StateMonitoringThreadFactory - Total runtime 90.3 m INFO 17:10:33,196 StateMonitoringThreadFactory - Fraction of time spent blocked is 0.72 ( 64.8 m) INFO 17:10:33,197 StateMonitoringThreadFactory - Fraction of time spent running is 0.26 ( 23.7 m) INFO 17:10:33,197 StateMonitoringThreadFactory - Fraction of time spent waiting is 0.02 ( 112.8 s) INFO 17:10:33,197 StateMonitoringThreadFactory - Efficiency of multi-threading: 26.19% of time spent doing productive work for CountLoci INFO 17:06:12,777 StateMonitoringThreadFactory - Number of activeThreads used: 8 INFO 17:06:12,777 StateMonitoringThreadFactory - Total runtime 43.5 m INFO 17:06:12,778 StateMonitoringThreadFactory - Fraction of time spent blocked is 0.00 ( 4.2 s) INFO 17:06:12,778 StateMonitoringThreadFactory - Fraction of time spent running is 1.00 ( 43.3 m) INFO 17:06:12,779 StateMonitoringThreadFactory - Fraction of time spent waiting is 0.00 ( 6.0 s) INFO 17:06:12,779 StateMonitoringThreadFactory - Efficiency of multi-threading: 99.61% of time spent doing productive work	2012-08-22 16:48:01 -04:00
Mark DePristo	27842ba448	run_performance_tests use bsub and gsa -- Confirmed that running on gsa queue is fine with sufficient iterations (3)	2012-08-22 16:48:01 -04:00
Mark DePristo	d7a6cd99cd	Expand intervals processed for many GATKPerformanceOverTime commands -- For the high NT tests the total runtime may be too short to really assess nt efficiency vs. start up costs. Reworked underlying test data and intervals so that most tests run in 10-20 hrs for -nt 1.	2012-08-22 16:48:01 -04:00
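
The NanoScheduler commits above (d6e6b30caf, 9de8077eeb, 275a5e5439) describe a read / map / reduce scheme: inputs are read by a single thread from an iterator, grouped so that each thread maps a batch rather than a single element, and reduced in input order. The test they mention maps each integer to int * 2 and reduces with a sum. The sketch below illustrates that idea in Python; it is not the GATK Java implementation, and every name in it (chunked, nano_map_reduce, n_threads, buffer_size) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(iterator, size):
    """Yield lists of up to `size` items, read sequentially from one iterator."""
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def nano_map_reduce(inputs, map_fn, reduce_fn, initial, n_threads=4, buffer_size=100):
    """Map groups of inputs on a thread pool, then reduce results in input order.

    Hypothetical illustration only; names and defaults are assumptions.
    """
    total = initial
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Input is read by the calling thread only, one buffer at a time.
        for buffer in chunked(iter(inputs), buffer_size):
            # Group the buffer so each task maps a batch, not a single element.
            group_size = max(1, -(-len(buffer) // n_threads))  # ceiling division
            groups = [buffer[i:i + group_size]
                      for i in range(0, len(buffer), group_size)]
            # Executor.map yields results in submission order, so the reduce
            # below always sees values in the original input order.
            for mapped in pool.map(lambda g: [map_fn(x) for x in g], groups):
                for value in mapped:
                    total = reduce_fn(total, value)
    return total


if __name__ == "__main__":
    # map is int * 2, reduce is sum, as in the test described in the commit.
    print(nano_map_reduce(range(1, 1001), lambda x: 2 * x, lambda a, b: a + b, 0))
```

Reducing on the calling thread keeps the ordering guarantee trivial to verify, at the cost of not overlapping the reduce with the next buffer's map step.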


10391 Commits (bedcdbdc5f1fbfa1d2c99ea3afa2a902c60cab50)