CV | IV | Jul 17, 2023

Extreme Image Compression using Fine-tuned VQGANs

arXiv:2307.08265v3 | 33 citations | h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses high-quality image compression at extremely low bitrates for applications such as storage and transmission. It is an incremental improvement over existing generative compression methods.

The paper tackles extreme image compression at very low bitrates (<0.05 bpp) with a VQGAN-based framework that clusters a pre-trained codebook into smaller codebooks and uses a transformer to predict lost indices. It reports better perceptual-quality metrics and human-perception results than state-of-the-art codecs at ≤0.04 bpp, and it restores images with minimal perceptual loss even when up to 20% of the indices are lost.
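To make the bitrate figures concrete, the following is a minimal arithmetic sketch of how codebook size maps to bits per pixel. It assumes each VQ index covers a square patch of `downsample_factor` × `downsample_factor` pixels and is stored with a fixed-length code. The function name, the factor of 16, and the codebook sizes are illustrative assumptions, not values taken from the paper, and a lossless entropy coder could lower the rate further.

```python
def fixed_length_bpp(codebook_size: int, downsample_factor: int) -> float:
    """Bits per pixel when every VQ index is stored with a fixed-length code.

    Assumption: one index represents a downsample_factor x downsample_factor
    pixel patch (hypothetical setting, not confirmed by the source).
    """
    # Number of bits needed to address codebook_size distinct codewords.
    bits_per_index = (codebook_size - 1).bit_length()
    pixels_per_index = downsample_factor ** 2
    return bits_per_index / pixels_per_index


# Smaller codebooks need fewer bits per index, hence a lower bitrate.
for size in (1024, 256, 64, 8):
    print(size, fixed_length_bpp(size, 16))
```

Under these assumptions, a 1024-entry codebook costs 10 bits per 256 pixels, which is about 0.039 bpp.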

Recent advances in generative compression methods have demonstrated remarkable progress in enhancing the perceptual quality of compressed data, especially in scenarios with low bitrates. However, their efficacy and applicability to achieve extreme compression ratios ($<0.05$ bpp) remain constrained. In this work, we propose a simple yet effective coding framework by introducing vector quantization (VQ)-based generative models into the image compression domain. The main insight is that the codebook learned by the VQGAN model yields a strong expressive capacity, facilitating efficient compression of continuous information in the latent space while maintaining reconstruction quality. Specifically, an image can be represented as VQ-indices by finding the nearest codeword, which can be encoded using lossless compression methods into bitstreams. We propose clustering a pre-trained large-scale codebook into smaller codebooks through the K-means algorithm, yielding variable bitrates and different levels of reconstruction quality within the coding framework. Furthermore, we introduce a transformer to predict lost indices and restore images in unstable environments. Extensive qualitative and quantitative experiments on various benchmark datasets demonstrate that the proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception at extremely low bitrates ($\le 0.04$ bpp). Remarkably, even with the loss of up to $20\%$ of indices, the images can be effectively restored with minimal perceptual loss.
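The two core operations described in the abstract, nearest-codeword lookup and K-means clustering of a larger codebook into a smaller one, can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the random arrays stand in for a pre-trained VQGAN codebook and encoder output, the names `quantize` and `cluster_codebook` are hypothetical, and a plain Lloyd's iteration is used in place of whatever K-means variant the paper uses.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each latent vector, the index of its nearest codeword.

    latents: (N, D) array, codebook: (K, D) array, result: (N,) int array.
    """
    # Squared Euclidean distance between every latent and every codeword.
    dist2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dist2.argmin(axis=1)


def cluster_codebook(codebook: np.ndarray, k: int, iters: int = 20,
                     seed: int = 0) -> np.ndarray:
    """Shrink a codebook to k entries with plain Lloyd's K-means."""
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct existing codewords.
    centers = codebook[rng.choice(len(codebook), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = quantize(codebook, centers)
        for j in range(k):
            members = codebook[assign == j]
            if len(members):  # keep the old centre if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 8))   # stand-in for a pre-trained codebook
latents = rng.normal(size=(64, 8))     # stand-in for an 8x8 grid of latents

small = cluster_codebook(codebook, k=16)
indices = quantize(latents, small)      # what would be losslessly coded
reconstructed = small[indices]          # decoder input after index lookup
print(indices.shape, reconstructed.shape)
```

In this sketch only `indices` would need to be transmitted, and choosing a smaller `k` trades reconstruction fidelity for fewer bits per index.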

