C3: High-performance and low-complexity neural compression from a single image or video
C3 addresses the need for low-complexity neural compression in practical applications, though it builds incrementally on prior work such as COOL-CHIC.
The paper tackles the high decoding complexity of neural compression by introducing C3, a method that overfits a small model to each image or video separately. C3 matches the rate-distortion performance of VTM on the CLIC2020 image benchmark with less than 3k MACs/pixel for decoding, and that of the Video Compression Transformer on the UVG video benchmark with less than 5k MACs/pixel.
Most neural compression models are trained on large datasets of images or videos in order to generalize to unseen data. Such generalization typically requires large and expressive architectures with a high decoding complexity. Here we introduce C3, a neural compression method with strong rate-distortion (RD) performance that instead overfits a small model to each image or video separately. The resulting decoding complexity of C3 can be an order of magnitude lower than neural baselines with similar RD performance. C3 builds on COOL-CHIC (Ladune et al.) and makes several simple and effective improvements for images. We further develop new methodology to apply C3 to videos. On the CLIC2020 image benchmark, we match the RD performance of VTM, the reference implementation of the H.266 codec, with less than 3k MACs/pixel for decoding. On the UVG video benchmark, we match the RD performance of the Video Compression Transformer (Mentzer et al.), a well-established neural video codec, with less than 5k MACs/pixel for decoding.
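To make the MACs/pixel figures above concrete, here is a minimal sketch of how such a count might be tallied for a small dense network applied independently to every pixel. The function name and the layer widths are hypothetical and are not taken from the paper; a full decoder count would include every component that runs at decode time, not just one network.

```python
def mlp_macs_per_pixel(layer_widths):
    """Multiply-accumulate operations for one forward pass of a dense MLP
    that is applied independently at each pixel (biases are not counted)."""
    # Each dense layer with n_in inputs and n_out outputs costs n_in * n_out MACs.
    return sum(n_in * n_out for n_in, n_out in zip(layer_widths, layer_widths[1:]))


# Hypothetical widths (an assumption, not the C3 architecture):
# 7 input features, two hidden layers of 32 units, 3 output channels.
widths = [7, 32, 32, 3]
macs = mlp_macs_per_pixel(widths)
print(f"{macs} MACs/pixel")  # 7*32 + 32*32 + 32*3 = 1344
```

Under these assumed widths the network costs 1344 MACs/pixel, which shows how a model of this size can stay within a budget of a few thousand MACs/pixel.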