ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering
This work addresses performance bottlenecks in real-time graphics applications, but it is incremental, since it extends caching ideas from diffusion models to rendering.
The paper tackles the problem of reducing latency in neural-network-based real-time rendering by reusing intermediate results from previous frames, achieving an average 1.4x speedup with negligible quality loss.
Graphics rendering applications increasingly leverage neural networks for tasks such as denoising, supersampling, and frame extrapolation to improve image quality while maintaining frame rates. The temporal coherence inherent in these tasks presents an opportunity to reuse intermediate results from previous frames and avoid redundant computation. Recent work has shown that caching intermediate features for reuse in subsequent inferences is an effective way to reduce latency in diffusion models. We extend this idea to real-time rendering and present ReFrame, which explores different caching policies to optimize the trade-off between quality and performance in rendering workloads. ReFrame can be applied to a variety of encoder-decoder style networks commonly found in rendering pipelines. Experimental results show an average 1.4x speedup with negligible quality loss across three real-time rendering tasks. Code is available at: https://ubc-aamodt-group.github.io/reframe-layer-caching/
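The abstract does not spell out ReFrame's caching policies, so the following is only a minimal, hypothetical sketch of the general idea of layer caching across frames in an encoder-decoder network. It assumes a toy model in which cheap `shallow` layers (feeding a skip connection) run on every frame, while expensive `deep` layers run only when a cache policy asks for a refresh. The class `CachedEncoderDecoder` and its parameters `max_reuse` (how many consecutive frames may reuse the cache) and `max_delta` (how far the input may drift before forcing a refresh) are illustrative names, not the paper's API. The layers operate on plain Python lists instead of real tensors.

```python
from typing import Callable, List, Optional

Vec = List[float]
Layer = Callable[[Vec], Vec]


class CachedEncoderDecoder:
    """Toy encoder-decoder that reuses cached deep-layer features across frames.

    Hypothetical sketch: `shallow` layers (feeding the skip connection) run on
    every frame, while the expensive `deep` layers run only when the cache
    policy requests a refresh. Otherwise the last cached deep output is reused.
    """

    def __init__(self, shallow: List[Layer], deep: List[Layer],
                 head: Callable[[Vec, Vec], Vec],
                 max_reuse: int = 1, max_delta: float = 0.1) -> None:
        self.shallow = shallow
        self.deep = deep
        self.head = head              # merges skip features with deep features
        self.max_reuse = max_reuse    # consecutive frames allowed to reuse the cache
        self.max_delta = max_delta    # input drift that forces a refresh
        self.cached_deep: Optional[Vec] = None
        self.ref_input: Optional[Vec] = None  # input the cache was computed from
        self.reuse_count = 0
        self.deep_evals = 0

    def _should_refresh(self, x: Vec) -> bool:
        # Refresh if there is no cache yet or it has been reused too many times.
        if self.cached_deep is None or self.ref_input is None:
            return True
        if self.reuse_count >= self.max_reuse:
            return True
        # Refresh if the input drifted too far from the frame the cache came from.
        delta = max(abs(a - b) for a, b in zip(x, self.ref_input))
        return delta > self.max_delta

    def forward(self, x: Vec) -> Vec:
        skip = x
        for layer in self.shallow:
            skip = layer(skip)
        if self._should_refresh(x):
            deep = skip
            for layer in self.deep:
                deep = layer(deep)
            self.cached_deep = deep
            self.ref_input = list(x)
            self.reuse_count = 0
            self.deep_evals += 1
        else:
            deep = self.cached_deep  # skip the deep layers entirely
            self.reuse_count += 1
        return self.head(skip, deep)


# Usage: four frames, the first three of which differ only slightly.
def double(v: Vec) -> Vec:
    return [2.0 * a for a in v]


def add_one(v: Vec) -> Vec:
    return [a + 1.0 for a in v]


def merge(skip: Vec, deep: Vec) -> Vec:
    return [s + d for s, d in zip(skip, deep)]


net = CachedEncoderDecoder([double], [add_one, double], merge, max_reuse=1, max_delta=0.1)
frames = [[1.0, 2.0], [1.01, 2.0], [1.02, 2.01], [5.0, 5.0]]
outs = [net.forward(f) for f in frames]
print(net.deep_evals, "deep evaluations for", len(frames), "frames")
```

In this toy run the deep layers execute on only three of the four frames. Frame 1 reuses the deep features computed for frame 0, so its first output is 8.02 rather than the 8.06 a full recomputation would give. That gap is the quality cost traded for skipping the deep layers. Frame 3 changes sharply, so the drift check forces a refresh. Setting `max_reuse=0` disables reuse and recomputes every layer on every frame.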