Java SEND Bytecode Traces


Abstract

In this project, we instrumented the Kaffe VM to generate bytecode trace files. The VM runs in interpreter mode (not JIT). The full traces were separated into different sub-traces: NEW, LOAD, and SEND. We characterized the behaviour of the full and sub-traces by different methods.

On this page, I present the experimental results for the SEND sub-traces, obtained by applying a Two-Level Predictor, an Inline Cache, and a Global Cache. The graphs include the footprints of the traces, the Two-Level Predictor results for EV prediction, and the Global Cache hit rates for EV.


Introduction

In this part of the experiment, we try to characterize the behaviour of polymorphic calls in Java. Three bytecodes may make polymorphic calls: invokevirtual, invokespecial, and invokeinterface. For each bytecode trace, we record IC, PC, IW, EA, and EV. A SEND bytecode has the following format:

invokevirtual (or invokespecial, invokeinterface)
indexbyte1
indexbyte2
count (for invokeinterface)
0 (for invokeinterface)

The operand stack is

     ..., objectref, [arg1,[arg2...]]

The declared type of the object is the objectref's class type. The implementation class type is the class where the invoked method is actually implemented.
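
For illustration (a minimal sketch; the class names are ours, not from the benchmarks):

    // Declared type vs. implementation class for an invokevirtual SEND.
    class Shape {
        void draw() { System.out.println("Shape.draw"); }
    }

    class Circle extends Shape {
        void draw() { System.out.println("Circle.draw"); }  // overrides Shape.draw
    }

    public class SendExample {
        public static void main(String[] args) {
            Shape s = new Circle();  // declared type: Shape
            s.draw();                // invokevirtual Shape.draw; implementation class: Circle
        }
    }

The call site is polymorphic: the bytecode names Shape.draw, but the method executed is chosen by the receiver's class at run time.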

By looking at the footprint and the prediction rates, we can get a first impression of Java polymorphic calls.

Six SPEC benchmarks are used to collect trace files: compress, db, jack, javac, mtrt, and raytrace.

The first 2M bytecode traces are recorded for each benchmark. Before any other experiments, we should verify the correctness of the trace files. By checking that the IW field lies in the range 0x00-0xc9, the range of the standard JVM opcodes, we know the format of each instruction trace is right.
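
As an illustration, here is a minimal sketch of such a sanity check, assuming a whitespace-separated record layout with IW as a hexadecimal field in the order IC PC IW EA EV (the actual trace format may differ):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Verify that every IW (opcode) lies in 0x00..0xc9 (nop .. jsr_w).
    public class TraceCheck {
        public static void main(String[] args) throws IOException {
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                String[] f = line.trim().split("\\s+");  // assumed: IC PC IW EA EV
                int iw = Integer.parseInt(f[2], 16);
                if (iw < 0x00 || iw > 0xc9) {
                    System.err.println("bad opcode 0x" + Integer.toHexString(iw)
                                       + " at line " + lineNo);
                }
            }
            in.close();
        }
    }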

The rest of the experiments are done with the Plumber framework developed in the LAC, McGill University.


Footprint


The footprint measures how many distinct bytecode trace chunks occur during a period of execution. In this experiment, we count the number of distinct full-trace chunks in each window of 20,000 instructions. The graph merges the six benchmarks' footprints together.
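
A sketch of the footprint computation, assuming each traced instruction carries a chunk identifier (how chunks are delimited is defined by the Plumber framework):

    import java.util.HashSet;
    import java.util.Set;

    // Count distinct chunk ids in consecutive windows of 20,000 instructions.
    public class Footprint {
        static final int WINDOW = 20000;

        // chunkIds[i] is the chunk id of the i-th traced instruction.
        static void report(long[] chunkIds) {
            Set<Long> seen = new HashSet<Long>();
            for (int i = 0; i < chunkIds.length; i++) {
                seen.add(chunkIds[i]);
                if ((i + 1) % WINDOW == 0) {
                    System.out.println((i + 1) + " " + seen.size());
                    seen.clear();  // start a fresh window
                }
            }
        }
    }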

Here are the data file and the PS file.

From the graph, we see that the first 8K bytecodes of each benchmark are almost the same. This is reasonable because the VM usually needs to do some initialization before it goes into the application classes. This effect appears in the full trace and in the other sub-traces.

We also see that after some program point, the footprint curves settle into a stable state. There the program is usually doing something repeatedly, often in a loop.


Two-Level Predictor Prediction Rate

[compress]  [db]  [jack]  [javac]  [mtrt]  [raytrace]

The second part of the experiments applies the Two-Level Predictor to the trace files and observes the prediction rates of different configurations. The Two-Level Predictor's basic configuration consists of a history buffer and an ideal table as the second-level predictor.

Each benchmark has a prediction-rate graph. The key consists of the history buffer of EVs concatenated with the current PC; the target is the current EV. We measured the prediction rate with different path lengths: 0, 1, 2, 4, 8, and 16.

The formula is

     EV(t-n)...EV(t-1).PC(t) -> EV(t)

The ideal table is a hashtable with unlimited size and full associativity. When the path length is 0, the key is just the current PC, and the prediction becomes PC(t) -> EV(t); the predictor then has the same effect as an Inline Cache.
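
A sketch of the predictor's lookup-and-update loop (names are illustrative; the ideal table is the unbounded hashtable described above):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;

    // Two-level predictor: key = last n EVs concatenated with the current PC;
    // target = current EV. The ideal table is an unbounded hash map.
    public class TwoLevelPredictor {
        private final int pathLength;                      // n = 0, 1, 2, 4, 8, 16
        private final Deque<Long> history = new ArrayDeque<Long>();
        private final Map<String, Long> table = new HashMap<String, Long>();
        private long hits, total;

        TwoLevelPredictor(int pathLength) { this.pathLength = pathLength; }

        // Feed one SEND record; returns whether the prediction was correct.
        boolean predictAndUpdate(long pc, long ev) {
            String key = history + "." + pc;               // EV(t-n)...EV(t-1).PC(t)
            Long predicted = table.get(key);
            total++;
            boolean correct = predicted != null && predicted.longValue() == ev;
            if (correct) hits++;
            table.put(key, ev);                            // remember the actual target
            if (pathLength > 0) {
                history.addLast(ev);
                if (history.size() > pathLength) history.removeFirst();
            }
            return correct;
        }

        double rate() { return total == 0 ? 0.0 : (double) hits / total; }
    }

With pathLength = 0 the history is empty and the key reduces to the current PC, so the predictor keeps one cached target per call site, the Inline Cache behaviour mentioned above.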

Some observations

In some areas, the plotted prediction rates were ZERO. This does not mean that the prediction rates are really 0; it results from no SEND bytecode occurring in those areas.

As the path length increases, the prediction rate decreases. This is a strange result, because a longer path length usually gives a better prediction rate. There may be bugs in the trace files for the SEND bytecode; this needs investigation.


Global Cache Hit Rate

[compress]  [db]  [jack]  [javac]  [mtrt]  [raytrace]

The Global Cache is simulated by a direct-mapped table configured with different table sizes. The key is the current EA; the target is the current EV. We used the following table sizes: 4, 16, 64, 256, 1K, and 4K entries. The hit-rate differences therefore result from the table sizes alone.
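
A sketch of the simulation, assuming EA and EV are available per SEND record; the index function (EA modulo the table size) is an assumption:

    // Direct-mapped Global Cache: indexed by EA, one (EA, EV) pair per slot.
    public class GlobalCache {
        private final long[] keys;    // stored EA per slot (-1 = empty)
        private final long[] values;  // stored EV per slot
        private long hits, total;

        GlobalCache(int size) {       // size = 4, 16, 64, 256, 1K, or 4K
            keys = new long[size];
            values = new long[size];
            java.util.Arrays.fill(keys, -1L);
        }

        // Look up one record, then update the slot with the actual target.
        void access(long ea, long ev) {
            int slot = (int) (ea % keys.length);  // assumes nonnegative addresses
            total++;
            if (keys[slot] == ea && values[slot] == ev) hits++;
            keys[slot] = ea;                      // direct-mapped: replace on every access
            values[slot] = ev;
        }

        double hitRate() { return total == 0 ? 0.0 : (double) hits / total; }
    }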

Some observations

From the graph, we see that the hit rate increases as the table size increases. The Global Cache has a cold-start period. A table of 256 entries is usually the most cost-effective.


Future work

Limited by time, the Plumber framework is done with only basic functions. I would like to continue this part of the work in the summer. Mostly the work will focus on debugging and documentation. Also, for the various bytecode instructions, we need careful examinations of their correctness.


Author : Feng Qian
Last Modification : May 12th, 2000