Streaming-capable High-performance Architecture of Learned Image Compression Codecs
This work addresses runtime bottlenecks that hinder practical deployment of learned image compression in real-time applications such as streaming, though the contribution is incremental because it builds on existing codecs.
The paper tackles the slow runtime performance of learned image compression codecs by introducing multi-threaded pipelining and an optimized memory architecture that let GPU and CPU workloads execute asynchronously. This improves throughput and latency over the baseline without modifying the neural models, as demonstrated in a real-time video streaming application whose encoder runs on an embedded device.
Learned image compression codecs achieve state-of-the-art accuracy and compression ratios, but their relatively slow runtime performance limits their use. While previous attempts at optimizing learned image codecs focused mainly on the neural model and entropy coding, we present an alternative method for improving the runtime performance of various learned image compression models. We introduce multi-threaded pipelining and an optimized memory model that enable asynchronous execution of GPU and CPU workloads, taking full advantage of the available computational resources. Our architecture alone yields excellent performance without any change to the neural model itself. We also demonstrate that combining our architecture with previous tweaks to the neural models can further improve runtime performance. We show that our implementations outperform the baseline in both throughput and latency, and we demonstrate their performance with a real-time video streaming encoder-decoder sample application in which the encoder runs on an embedded device.
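The pipelining idea in the abstract can be illustrated with a minimal two-stage sketch. This is not the authors' implementation: the stage functions `analysis_transform` and `entropy_code`, the `run_pipeline` helper, and the queue depth are all hypothetical, and `zlib` merely stands in for a real entropy coder. The point is only that each stage runs in its own thread and hands work to the next through a bounded queue, so one frame can be entropy coded while the next is being transformed.

```python
import queue
import threading
import zlib

SENTINEL = None  # marks the end of the frame stream


def analysis_transform(frame: bytes) -> bytes:
    """Hypothetical stand-in for the GPU-side neural transform."""
    return bytes(b // 2 for b in frame)  # placeholder "quantization"


def entropy_code(latent: bytes) -> bytes:
    """Hypothetical stand-in for the CPU-side entropy coder (zlib for illustration)."""
    return zlib.compress(latent)


def stage(fn, inbox, outbox):
    """Apply fn to each (index, payload) item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        idx, payload = item
        outbox.put((idx, fn(payload)))


def run_pipeline(frames, depth=2):
    """Run both stages concurrently; bounded queues limit frames in flight."""
    q_in = queue.Queue(maxsize=depth)
    q_mid = queue.Queue(maxsize=depth)
    q_out = queue.Queue()  # unbounded, so the last stage never blocks
    workers = [
        threading.Thread(target=stage, args=(analysis_transform, q_in, q_mid)),
        threading.Thread(target=stage, args=(entropy_code, q_mid, q_out)),
    ]
    for w in workers:
        w.start()
    for i, frame in enumerate(frames):
        q_in.put((i, frame))
    q_in.put(SENTINEL)
    results = []
    while True:
        item = q_out.get()
        if item is SENTINEL:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    frames = [bytes([i] * 64) for i in range(5)]
    for idx, bitstream in run_pipeline(frames):
        print(idx, len(bitstream))
```

With one worker per stage and FIFO queues, frames leave the pipeline in the order they entered. In a real codec the overlap would come from GPU kernels executing asynchronously while CPU threads perform entropy coding; the sketch above only shows the hand-off structure.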