Concurrency and Synchronization Metrics
Last updated: November 27, 2002
Optimizations focussing on multithreaded behaviour need to identify the appropriate opportunities. A basic requirement is to know whether a program does or can actually exhibit concurrent behaviour, or whether it is effectively single-threaded, executing one thread at a time. This affects the application of various optimization techniques, most obviously synchronization removal and lock design, but also the utility of other analyses that may be constrained by conservative assumptions in the presence of multithreaded execution (e.g., escape analysis).
Since the use of locks can have a large impact on performance in both single-threaded and multithreaded code, it is also useful to consider metrics that give more specific information on how locks are being used. Even a multithreaded program that does relatively little locking will obviously see a correspondingly reduced benefit from optimizations designed to reduce the cost of locking or the number of locks acquired. Lock design and placement is also often predicated on knowing the amount of contention a lock experiences; this too can be exposed by appropriate metrics.
Identifying concurrent benchmarks involves determining whether more than one thread can be executing at the same time. This is not a simple property to determine; certainly the number of threads started by an application gives an upper bound on the amount of execution that can overlap, but the mere existence of multiple threads does not imply they can or will execute concurrently.
Ideally, one would run the program using a very large number of processors, so that as many threads as possible could be scheduled simultaneously. Appropriate metrics would then include a single value such as the maximum number of concurrently active threads, or a bin describing relative timings for discrete levels of concurrency. Unfortunately, with massive multiprocessor machines still uncommon, this approach is somewhat infeasible in practice.
- concurrency.threadDensity.value: A less accurate, but more easily computable, metric is the maximum number of threads simultaneously in the ACTIVE or RUNNABLE states. These are threads that are either running, or at least capable of being run (but not currently scheduled). In this way we do not require as many processors as runnable threads. Unfortunately, this will certainly perturb results: two short-lived threads started serially by one thread may never overlap execution on a 3-processor machine; on a uniprocessor, however, scheduling may result in all three threads being runnable at the same time. Of course, Java's thread scheduling model already permits considerable variation, so metrics in this area will necessarily vary between runs.
- concurrency.threadDensity.bin: The amount of code executed while another thread is executing is also of interest; it gives an indication of how much concurrent execution exists. Without sufficient processors this is a difficult quantity to measure; again we resort to coarser approximations based on active and runnable threads. This quantity is then calculated as the percentage of kbc (kilobytecodes) executed at specified levels of concurrency: 1, 2, or more than 2 threads active or runnable.
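As a rough illustration of how the value and bin forms of this metric relate, the following sketch computes both from a hypothetical trace of (kbc executed, threads active or runnable) samples. The trace format, class name, and method name are assumptions for illustration only, not part of any real profiling framework.

```java
import java.util.*;

// Illustrative sketch only: computes threadDensity-style metrics from an
// assumed trace where each row is {kbc executed, threads active/runnable}.
public class ThreadDensity {
    /** Returns {max runnable threads, %kbc at 1 thread, %kbc at 2, %kbc at >2}. */
    static double[] density(int[][] trace) {
        int max = 0;
        double total = 0, at1 = 0, at2 = 0, atMore = 0;
        for (int[] sample : trace) {
            int kbc = sample[0], threads = sample[1];
            max = Math.max(max, threads);
            total += kbc;
            if (threads <= 1)      at1 += kbc;     // effectively sequential
            else if (threads == 2) at2 += kbc;     // two-way concurrency
            else                   atMore += kbc;  // higher concurrency
        }
        return new double[] { max, 100 * at1 / total, 100 * at2 / total,
                              100 * atMore / total };
    }

    public static void main(String[] args) {
        // Hypothetical run: 100 kbc total, half of it single-threaded.
        int[][] trace = { { 50, 1 }, { 30, 2 }, { 20, 3 } };
        double[] m = density(trace);
        System.out.printf("max=%d  1:%.0f%%  2:%.0f%%  >2:%.0f%%%n",
                          (int) m[0], m[1], m[2], m[3]);
        // prints: max=3  1:50%  2:30%  >2:20%
    }
}
```

The single-value metric is the first entry; the remaining three entries form the bin.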
An important criterion in optimizing lock usage is knowing whether a program performs a significant number of lock operations. This is quickly seen from a single-value metric: the average number of lock operations per execution unit (bytecodes, or time). A continuous version of the same would show whether locking behaviour is concentrated in one section of the program, or is specific to particular program phases.
- concurrency.lockDensity.value: This single-value metric gives the average number of lock (monitorenter) operations per kbc. A value greater than 10 is expected to indicate a lock-intensive program. A continuous version is also sensible.
- concurrency.lockContendedDensity.value: Adaptive locks can make use of knowing whether a lock will experience contention. This allows them to optimize behaviour for single-threaded access, but also to adapt to an optimized contended-access behaviour if necessary. Similarly, lock removal or relocation strategies will be better informed if they know which locks are (or are likely to be) high-, low-, or no-contention locks. A simple metric relevant to these efforts measures the importance of contention: the average number of contended lock entry operations per kbc.
- concurrency.lock.percentile: Locks may be amenable to hot spot optimization: specific locks can be optimized for use by a certain number of threads, or code can be specialized to avoid locking. Whether high-use locks exist can be identified through a percentile metric, showing that a large percentage of lock operations are performed by a small percentage of locks; for our metric we define this as the percentage of locks responsible for 90% of lock operations. A similar metric for contended locks can also be defined: concurrency.lockContended.percentile.
- concurrency.lockContended.bin: Knowing the relative number of contended and uncontended locks is also important; this allows us to determine if contention is concentrated in a few locks, or more evenly spread out. A bin metric can be used to classify locks and the relative number of lock operations; this metric describes the relative percentage of lock requests made while the lock is already held or queued on by 0, 1, or more than 1 other threads.
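The percentile and bin computations can be sketched concretely. The following is an illustrative sketch under assumed inputs: the per-lock operation counts, the per-request holder counts, and the class and method names are all hypothetical, not part of any real measurement framework.

```java
import java.util.*;

// Illustrative sketch only: percentile and bin summaries over assumed
// lock traces gathered by some (unspecified) instrumentation.
public class LockStats {
    /** concurrency.lock.percentile-style value: percentage of locks
     *  responsible for the given fraction (e.g. 0.90) of all lock ops. */
    static double percentile(long[] opsPerLock, double fraction) {
        long[] counts = opsPerLock.clone();
        Arrays.sort(counts);                               // ascending
        long total = Arrays.stream(counts).sum();
        long threshold = (long) Math.ceil(total * fraction);
        long seen = 0;
        int hotLocks = 0;
        for (int i = counts.length - 1; i >= 0 && seen < threshold; i--) {
            seen += counts[i];                             // hottest locks first
            hotLocks++;
        }
        return 100.0 * hotLocks / counts.length;
    }

    /** concurrency.lockContended.bin-style summary: percentage of lock
     *  requests finding 0, 1, or >1 other threads holding or queued on
     *  the lock. Each trace entry is that thread count at request time. */
    static double[] contendedBin(int[] othersAtRequest) {
        double[] bin = new double[3];
        for (int n : othersAtRequest) bin[Math.min(n, 2)]++;
        for (int i = 0; i < 3; i++) bin[i] = 100.0 * bin[i] / othersAtRequest.length;
        return bin;
    }

    public static void main(String[] args) {
        // 10 locks; one lock accounts for 90 of the 100 total operations.
        long[] ops = { 90, 2, 1, 1, 1, 1, 1, 1, 1, 1 };
        System.out.println(percentile(ops, 0.90));         // prints 10.0

        // 8 lock requests: 5 uncontended, 2 with one holder, 1 with two.
        int[] trace = { 0, 0, 0, 0, 0, 1, 1, 2 };
        System.out.println(Arrays.toString(contendedBin(trace)));
    }
}
```

In the first example, a single lock out of ten covers 90% of operations, so the percentile metric is 10%: a strong hot-lock signal of the kind the text describes.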