Controllable Distortion-Perception Tradeoff Through Latent Diffusion for Neural Image Compression
This work addresses the distortion-perception trade-off in neural image compression. It lets users adjust the balance at inference time without retraining, although it builds incrementally on existing codecs.
The paper tackles the distortion-perception trade-off in neural image compression by introducing a plug-and-play decoder-side module that uses latent diffusion to transform decoded features. It achieves more than 150% improvement in LPIPS-BDRate while losing at most 1 dB in PSNR.
Neural image compression often faces a challenging trade-off among rate, distortion, and perception. While most existing methods focus on either high pixel-level fidelity or perceptual metrics, we propose a novel approach that simultaneously addresses both aspects for a fixed neural image codec. Specifically, we introduce a plug-and-play module at the decoder side that leverages a latent diffusion process to transform the decoded features, favoring either low distortion or high perceptual quality without altering the original image compression codec. Our approach fuses the original and transformed features without additional training, enabling users to flexibly adjust the balance between distortion and perception during inference. Extensive experimental results demonstrate that our method significantly enhances pretrained codecs with a wide, adjustable distortion-perception range while maintaining their original compression capabilities. For instance, we achieve more than 150% improvement in LPIPS-BDRate without sacrificing more than 1 dB in PSNR.
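The abstract does not say how the original and transformed features are fused. The following is a minimal sketch under one assumption: a per-element linear interpolation controlled by a single inference-time weight. It shows how one scalar can move the output between the two feature sets with no training involved. The names `fuse_features`, `sweep`, and `alpha` are hypothetical and do not come from the paper.

```python
def fuse_features(f_orig, f_trans, alpha: float):
    """Blend decoded features with their diffusion-transformed counterpart.

    Hypothetical linear interpolation, assumed for illustration only:
      alpha = 0.0 -> the original codec's decoded features
      alpha = 1.0 -> the transformed features
    Works on anything supporting scalar * and + (floats, NumPy arrays,
    PyTorch tensors). Nothing is trained; alpha is picked at inference time.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must lie in [0, 1], got {alpha}")
    return (1.0 - alpha) * f_orig + alpha * f_trans


def sweep(f_orig, f_trans, steps: int = 5):
    """Return (alpha, fused) pairs for alpha evenly spaced over [0, 1]."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    pairs = []
    for i in range(steps):
        alpha = i / (steps - 1)
        pairs.append((alpha, fuse_features(f_orig, f_trans, alpha)))
    return pairs


if __name__ == "__main__":
    # Scalars stand in for feature tensors in this toy run.
    for alpha, fused in sweep(0.0, 4.0, steps=5):
        print(f"alpha={alpha:.2f} -> fused={fused}")
```

In practice the inputs would be feature tensors, which support the same scalar arithmetic, and the fused result would presumably be passed on to the rest of the fixed decoder. Linear blending is only one plausible fusion rule; the paper may use a different one.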