Stencil algorithms are widely used in high performance applications
such as for partial differential equations and image processing.  They
usually keep updating one or multiple large arrays continuously. If
the arrays are larger than the cache size, performance will be
degraded significantly due to a large number of cache misses as a
result of the LRU cache replacement policy.

Our previous work showed that on-chip cache can be optimally managed
by a LRU variant with cache bypassing. The new optimal policy has
clear and simple hardware requirements. However, it needs simulating
OPT policy on a complete memory trace, which is time-consuming. In
this work, we show that regular cache bypassing patterns in simulation
results from a training run with a small input can be applied to a
real run with a large input. The proposed program-level approximately
optimal cache management (PACMAN) is implemented as a feedback-based
optimization and applied to several popular stencil workloads, in
which the total numbers of misses are significantly reduced.