-- Underlying system now uses long nano times to be more consistent with standard java practice
-- Updated a few places in the code that were converting from nanoseconds to double seconds to use the new nanoseconds interface directly
-- Bringing us to 100% test coverage with clover with AutoFormattingTimeUnitTest
-Switch back to the old implementation, if needed, with --use_legacy_downsampler
-LocusIteratorByStateExperimental becomes the new LocusIteratorByState, and
the original LocusIteratorByState becomes LegacyLocusIteratorByState
-Similarly, the ExperimentalReadShardBalancer becomes the new ReadShardBalancer,
with the old one renamed to LegacyReadShardBalancer
-Performance improvements: locus traversals used to be 20% slower in the new
downsampling implementation, now they are roughly the same speed.
-Tests show a very high level of concordance with UG calls from the previous
implementation, with some new calls and edge cases that still require more examination.
-With the new implementation, can now use -dcov with ReadWalkers to set a limit
on the max # of reads per alignment start position per sample. Appropriate value
for ReadWalker dcov may be in the single digits for some tools, but this too
requires more investigation.
-- The NanoSchedule timing code (in NSRuntimeProfile) was crazy expensive, but never showed up in the profilers. Removed all of the timing code from the NanoScheduler, the NSRuntimeProfile itself, and updated the unit tests.
-- For tools that largely pass through data quickly, this change reduces runtimes by as much as 10x. For the RealignerTargetCreator example, the runtime before this commit was 3 hours, and after is 30 minutes (6x improvement).
-- Took this opportunity to improve the GATK ProgressMeter. NotifyOfProgress now just keeps track of the maximum position seen, and a separate daemon thread ProgressMeterDaemon periodically wakes up and prints the current progress. This removes all inner loop calls to the GATK timers.
-- The history of the bug started here: http://gatkforums.broadinstitute.org/discussion/comment/2402#Comment_2402
-- Providing this optional argument -maxRuntime (in -maxRuntimeUnits units) causes the GATK to exit gracefully when the max. runtime has been exceeded. By cleanly I mean that the engine simply stops at the next available cycle in the walker as through the end of processing had been reached. This means that all output files are closed properly, etc.
-- Emits an info message that looks like "INFO 10:36:52,723 MicroScheduler - Aborting execution (cleanly) because the runtime has exceeded the requested maximum 10.0000 s". Otherwise there's currently no way to differentiate a truly completed run from a timelimit exceeded run, which may be a useful thing for a future update
-- Resolves GSA-630 / GATK max runtime to deal with bad LSA calling?
-- Added new JIRA entry for Ami to restart chr1 macarthur with this argument set to -maxRuntime 1 -maxRuntimeUnits DAYS to see if we can do all of chr1 in one weekend.
-- There's been no report of problems with the nano scheduled version of TraverseLoci and TraverseReads, so I'm removing the old versions since they are no longer needed
-- Removing unnecessary intermediate base classes
-- GSA-515 / Nanoscheduler GSA-549 / https://jira.broadinstitute.org/browse/GSA-549
-- Fixes monster bug in the way that traversal engines interacted with the NanoScheduler via the output tracker.
-- ThreadLocalOutputTracker is now a ThreadBasedOutputTracker that associates via a map from a master thread -> the storage map. Lookups occur by walking through threads in the same thread group, not just the thread itself (TBD -- should have a map from ThreadGroup instead)
-- Removed unnecessary debug statement in GenomeLocParser
-- nt and nct officially work together now
-- See https://jira.broadinstitute.org/browse/GSA-573
-- Uses InheritedThreadLocal storage so that children threads created by the NanoScheduler see the parent stubs in the main thread.
-- Added explicit integration test that checks that -nt 1, 2 and -nct 1, 2 give the same results for GLM BOTH with the UG over 1 MB.
-- Renamed TraversalErrorManager to the more general MultiThreadedErrorTracker
-- ErrorTracker is now used throughout the NanoScheduler. In order to properly handle errors, the work previously done by main thread (submit jobs, block on reduce) is now handled in a separate thread. The main thread simply wakes up peroidically and checks whether the reduce result is available or if an error has occurred, and handles each appropriately.
-- EngineFeaturesIntegrationTest checks that -nt and -nct properly throw errors in Walkers
-- Added NanoSchedulerUnitTest for input errors
-- ThreadEfficiencyMonitoring is now disabled by default, and can be enabled with a GATK command line option. This is because the monitoring doesn't differentiate between threads that are supposed to do work, and those that are supposed to wait, and therefore gives misleading results.
-- Build.xml no longer copies the unittest results verbosely
-- Refactored error handling from HMS into utils.TraversalErrorManager, which is now used by HMS and will be usable by NanoScheduler
-- Generalized EngineFeaturesIntegrationTest to test map / reduce error throwing for nt 1, nt 2 and nct 2 (disabled)
-- Added unit tests for failing input iterator in NanoScheduler (fails)
-- Made ErrorThrowing NanoScheduable
-- V3 + V4 algorithm for NanoScheduler. The newer version uses 1 dedicated input thread and n - 1 map/reduce threads. These MapReduceJobs perform map and a greedy reduce. The main thread's only job is to shuttle inputs from the input producer thread, enqueueing MapReduce jobs for each one. We manage the number of map jobs now via a Semaphore instead of a BlockingQueue of fixed size.
-- This new algorithm should consume N00% CPU power for -nct N value.
-- Also a cleaner implementation in general
-- Vastly expanded unit tests
-- Deleted FutureValue and ReduceThread
-- Now prints out a single combined NanoScheduler runtime profile report across all nano schedulers in use. So now if you run with -nt 4 you'll get one combined NanoScheduler profiler across all 4 instances of the NanoScheduler within TraverseXNano.
-- Basically you cannot safely use instance specific ThreadLocal variables, as these cannot be safely cleaned up. The old implementation kept pointers to old writers, with huge tribble block indexes, and eventually we crashed out of integration tests
-- See http://weblogs.java.net/blog/jjviana/archive/2010/06/10/threadlocal-thread-pool-bad-idea-or-dealing-apparent-glassfish-memor for more information
-- New implementation uses a borrow/return schedule with a list of N TraversalEngines managed by the MicroScheduler directly.
-- Can now say -nt 4 and -nct 4 to get 16 threads running for you!
-- TraversalEngines are now ThreadLocal variables in the MicroScheduler.
-- Misc. code cleanup, final variables, some contracts.
-- TraversalProgressMeter now completely generalized, named ProgressMeter in utils.progressmeter. Now just takes "nRecordsProcessed" as an argument to print reads. Completely removes dependence on complex data structures from TraversalProgressMeter. Can be used to measure progress on any task with processing units in genomic locations.
-- a fairly simple, class with no dependency on GATK engine or other features.
-- Currently only used by the TraversalEngine / MicroScheduler but could be used for any purpose now, really.
-- Previously these core progress metering functions were all in TraversalEngine, and available to subclasses like TraverseLoci via inheritance. The problem here is that the upcoming data threads x cpu threads parallelism requires one master copy of the progress metering shared among all traversals, but multiple instantiations of traverse engines themselves.
-- Because the progress metering code has horrible anyway, I've refactored and vastly cleaned up and simplified all of these capabilities into TraversalProgressMeter class. I've simplified down the classes it uses to work (STILL SOME TODOs in there) so that it doesn't reach into the core GATK engine all the time. It should be possible to write some nice tests for it now. By making it its own class, it can protect itself from multi-threaded access with a single synchronized printProgress function instead of carrying around multiple lock objects as before
-- Cleaned up the start up of the progress meter. It's now handled when the meter is created, so each micro scheduler doesn't have to deal with proper initialization timing any longer
-- Simplified and made clear the interface for shutting down the traversal engines. There's no a shutdown method in TraversalEngine that's called once by the MicroScheduler when the entire traversing in over. Nano traversals now properly shut down (was subtle bug I undercovered here). The printing of on traversal done metering is now handled by MicroScheduler
-- The MicroScheduler holds the single master copy of the progress meter, and doles it out to the TraversalEngines (currently 1 but in future commit there will be N).
-- Added a nice function to GenomeAnalysisEngine that returns the regions we will be processing, either the intervals requested or the whole genome. Useful for progress meter but also probably for other infrastructure as well
-- Remove a lot of the sh*ting Bean interface getting and setting in MicroScheduler that's no longer useful. The generic bean is just a shell interface with nothing in it.
-- By removing a lot of these bean accessors and setters many things are now final that used to be dynamic.
-Off by default; engine fork isolates new code paths from old code paths,
so no integration tests change yet
-Experimental implementation is currently BROKEN due to a serious issue
involving file spans. No one can/should use the experimental features
until I've patched this issue.
-There are temporarily two independent versions of LocusIteratorByState.
Anyone changing one version should port the change to the other (if possible),
and anyone adding unit tests for one version should add the same unit tests
for the other (again, if possible). This situation will hopefully be extremely
temporary, and last only until the experimental implementation is proven.
-- The NanoScheduler is doing a good job at tracking important information like time spent in map/reduce/input etc.
-- Can be disabled with static boolean in MicroScheduler if we have problems
-- See GSA-515 Nanoscheduler GSA-549 Retire TraverseReads and TraverseLoci after testing confirms nano scheduler version in single threaded version is fine
-- Closes GSA-515 Nanoscheduler GSA-542 Good interface to nanoScheduler
-- Old -nt means dataThreads
-- New -cnt (--num_cpu_threads_per_data_thread) gives you n cpu threads for each data thread in the system
-- Cleanup logic for handling data and cpu threading in HMS, LMS, and MS
-- GATKRunReport reports the total number of threads in use by the GATK, not just the nt value
-- Removed the io,cpu tags for nt. Stupid system if you ask me. Cleaned up the GenomeAnalysisEngine and ThreadAllocation handling to be totally straightforward now
-- Refactored TraverseLoci into old linear version and nano scheduling version
-- Temp. GATK argument to say how many nano threads to use
-- Can efficiently scale to 3 threads before blocking on input
-- A higher level interface to declare parallelism capability of a walker. This interface means that the walker can be multi-threaded, but doesn't necessarily support TreeReducible interface, which forces you to have a combine ReduceType operation that isn't appropriate for parallel read walkers
-- Updated ReadWalkers to implement ThreadSafeMapReduce not TreeReducible
-- GATKRunReports contain itemized information about the numThreads used to execute the GATK, as well as the efficiency of the use of those threads to get real work done, including time spent running, waiting, blocking, and waiting for IO
-- See https://jira.broadinstitute.org/browse/GSA-506 for more details
-- Invert logic in GATKArgumentCollection to disable monitoring, not enable. That means monitoring is on by default
-- Fix testing error in unit tests
-- Rename variables in ThreadAllocation to be clearer
-- Old version StateMonitoringThreadFactory refactored into base class ThreadEfficiencyMonitor and subclass EfficiencyMonitoringThreadFactory.
-- Base class is used by LinearMicroScheduler to monitor performance of GATK in single threaded mode
-- MicroScheduler now handles management of the efficiency monitor. Includes master thread in monitor, meaning that reduce is now included for both schedulers
-- See https://jira.broadinstitute.org/browse/GSA-502
-- New command line argument -mt enables thread monitoring
-- If enabled, HMS uses StateMonitoringThreadFactory to create monitored threads, and prints out an efficiency report when HMS exits, telling the user information like:
for BQSR – known to be inefficient locking
INFO 17:10:33,195 StateMonitoringThreadFactory - Number of activeThreads used: 8
INFO 17:10:33,196 StateMonitoringThreadFactory - Total runtime 90.3 m
INFO 17:10:33,196 StateMonitoringThreadFactory - Fraction of time spent blocked is 0.72 ( 64.8 m)
INFO 17:10:33,197 StateMonitoringThreadFactory - Fraction of time spent running is 0.26 ( 23.7 m)
INFO 17:10:33,197 StateMonitoringThreadFactory - Fraction of time spent waiting is 0.02 ( 112.8 s)
INFO 17:10:33,197 StateMonitoringThreadFactory - Efficiency of multi-threading: 26.19% of time spent doing productive work
for CountLoci
INFO 17:06:12,777 StateMonitoringThreadFactory - Number of activeThreads used: 8
INFO 17:06:12,777 StateMonitoringThreadFactory - Total runtime 43.5 m
INFO 17:06:12,778 StateMonitoringThreadFactory - Fraction of time spent blocked is 0.00 ( 4.2 s)
INFO 17:06:12,778 StateMonitoringThreadFactory - Fraction of time spent running is 1.00 ( 43.3 m)
INFO 17:06:12,779 StateMonitoringThreadFactory - Fraction of time spent waiting is 0.00 ( 6.0 s)
INFO 17:06:12,779 StateMonitoringThreadFactory - Efficiency of multi-threading: 99.61% of time spent doing productive work
-- Check if a traversal error occurred in the last shard
-- Catch ExecutionException from the TreeReducer and throw as our HMS execption
-- ShardTraverser just throws the exception as formatted by the HMS, rather than wrapping it as a RuntimeException itself
-- EngineFeaturesIntegrationTests now uses public exampleFASTA (faster), and does 1000x iterations (slower)
-- Better error message when a traveral error occurs (a real bug)
-- EngineFeaturesIntegrationTest runs the multi-threaded error testing routines 50x times
-- A bit of cleanup in WalkerTest
* Did not touch archived walkers... those can be named whatever.
* Kept abstract classes that end in Walker untouched (e.g. LocusWalker, ReadWalker, ...)
* Renamed a few inner classes due to conflict when stripping off Walker from their outer classes: ContigStats, FlagStats and FastaStats.
-- HMS no longer tries to grab and throw all exceptions. Exceptions are just thrown directly now.
-- Proper error handling is handled by functions in HMS, which are used by ShardTraverser and TreeReducer
-- Better printing of stack traces in WalkerTest
-- This behavior, which isn't obviously valuable at all, continued to grab and rethrow exceptions in the HMS that, if run without NT, would show up as more meaningful errors. Now HMS simply checks whether the throwable it received on error was a RuntimeException. If so, it is stored and rethrow without wrapping later. If it isn't, only in this case is the exception wrapped in a ReviewedStingException.
-- Added a QC walker ErrorThrowingWalker that will throw a UserException, ReviewedStingException, and NullPointerException from map as specified on the command line
-- Added IT that ensures that all three types are thrown properly (i.e., you catch a NullPointerException when you ask for one to be thrown) with and without threading enabled.
-- I believe this will finally put to rest all of these annoying HMS captures.