Towards image compression with perfect realism at ultra-low bitrates
This work addresses the challenge of keeping images realistic under extreme compression, for example for storage or transmission in bandwidth-limited environments. The contribution is incremental in that it builds on existing diffusion models and rate-distortion-perception theory.
The authors tackle compression artefacts at ultra-low bitrates by proposing PerCo, a codec that decodes with an iterative diffusion model. It achieves state-of-the-art visual quality, as measured by FID and KID, at rates as low as 0.003 bits per pixel, at which a 512x768 image is compressed to fewer than 153 bytes.
Image codecs are typically optimized to trade off bitrate vs. distortion metrics. At low bitrates, this leads to compression artefacts which are easily perceptible, even when training with perceptual or adversarial losses. To improve image quality and remove its dependency on the bitrate, we propose to decode with iterative diffusion models. We condition the decoding process on a vector-quantized image representation, as well as a global image description to provide additional context. We dub our model PerCo for 'perceptual compression', and compare it to state-of-the-art codecs at rates from 0.1 down to 0.003 bits per pixel. The latter rate is more than an order of magnitude smaller than those considered in most prior work, compressing a 512x768 Kodak image to fewer than 153 bytes. Despite this ultra-low bitrate, our approach maintains the ability to reconstruct realistic images. We find that our model leads to reconstructions with state-of-the-art visual quality as measured by FID and KID. As predicted by rate-distortion-perception theory, visual quality is less dependent on the bitrate than for previous methods.
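The relationship between the quoted bitrate and the quoted byte count can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: the helper names `compressed_size_bytes` and `bitrate_bpp` are hypothetical, and it assumes the bitrate counts only the image payload, with no container or header overhead.

```python
def compressed_size_bytes(bpp: float, height: int, width: int) -> float:
    """Payload size in bytes implied by a bitrate in bits per pixel (bpp)."""
    return bpp * height * width / 8


def bitrate_bpp(size_bytes: float, height: int, width: int) -> float:
    """Bitrate in bits per pixel implied by a payload size in bytes."""
    return size_bytes * 8 / (height * width)


# Kodak image resolution quoted in the abstract.
kodak_h, kodak_w = 512, 768

# Size implied by the lowest bitrate in the abstract (0.003 bpp).
size = compressed_size_bytes(0.003, kodak_h, kodak_w)

# Conversely, the bitrate that a 153-byte payload would correspond to.
bpp_at_153 = bitrate_bpp(153, kodak_h, kodak_w)

print(f"{size:.1f} bytes at 0.003 bpp")
print(f"{bpp_at_153:.4f} bpp at 153 bytes")
```

Under these assumptions, 0.003 bits per pixel on a 512x768 image gives roughly 147 bytes, which is consistent with the stated bound of fewer than 153 bytes; a 153-byte payload corresponds to about 0.0031 bits per pixel.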