CV · Apr 11, 2025

ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration

Yongsheng Yu, Haitian Zheng, Zhifei Zhang, Jianming Zhang, Yuqian Zhou, Connelly Barnes, Yuchen Liu, Wei Xiong, Zhe Lin, Jiebo Luo

arXiv:2504.08591v18.42 citations · h-index: 28

Originality: Highly original

AI Analysis

This work addresses the trade-off between quality and efficiency in deploying diffusion models for high-resolution image restoration, offering a solution for applications requiring fast processing of degraded images.

The paper tackles the computational cost of diffusion models for high-resolution image restoration by introducing ZipIR, a framework built on a latent representation that compresses images by 32x and thereby reduces the number of spatial tokens. According to the authors, this enables faster and higher-quality restoration than existing diffusion-based methods at resolutions up to 2K.

Recent progress in generative models has significantly improved image restoration capabilities, particularly through powerful diffusion models that offer remarkable recovery of semantic details and local fidelity. However, deploying these models at ultra-high resolutions faces a critical trade-off between quality and efficiency due to the computational demands of long-range attention mechanisms. To address this, we introduce ZipIR, a novel framework that enhances efficiency, scalability, and long-range modeling for high-resolution image restoration. ZipIR employs a highly compressed latent representation that compresses the image by 32x, effectively reducing the number of spatial tokens and enabling the use of high-capacity models like the Diffusion Transformer (DiT). Toward this goal, we propose a Latent Pyramid VAE (LP-VAE) design that structures the latent space into sub-bands to ease diffusion training. Trained on full images up to 2K resolution, ZipIR surpasses existing diffusion-based methods, offering unmatched speed and quality in restoring high-resolution images from severely degraded inputs.
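To make the token-count argument in the abstract concrete, the following is a rough back-of-the-envelope sketch, not the authors' code. It assumes that "32x" means a per-side spatial downsampling factor, that "2K" means a 2048 x 2048 input, that the transformer uses one token per latent position (patch size 1), and that the point of comparison is the 8x downsampling common in latent diffusion VAEs. The function name `latent_tokens` is hypothetical.

```python
def latent_tokens(height: int, width: int, downsample: int, patch: int = 1) -> int:
    """Count the spatial tokens a transformer would see for one image.

    Assumption: the autoencoder shrinks each side by `downsample`, and the
    transformer groups latents into `patch` x `patch` tokens.
    """
    stride = downsample * patch
    if height % stride or width % stride:
        raise ValueError("image size must be divisible by downsample * patch")
    return (height // stride) * (width // stride)


size = 2048  # assumed meaning of "2K"

tokens_8x = latent_tokens(size, size, downsample=8)    # assumed baseline factor
tokens_32x = latent_tokens(size, size, downsample=32)  # assumed reading of "32x"

# Fewer tokens per image under the higher compression factor.
token_ratio = tokens_8x // tokens_32x

# Self-attention compares every token with every other token, so the number
# of pairwise interactions scales with the square of the token count.
attention_ratio = token_ratio ** 2

print(f"8x latent:  {tokens_8x} tokens")
print(f"32x latent: {tokens_32x} tokens")
print(f"token reduction: {token_ratio}x, attention-pair reduction: {attention_ratio}x")
```

Under these assumptions, moving from 8x to 32x downsampling cuts the token count by a factor of 16 and the number of attention pairs by a factor of 256, which illustrates why a more compressed latent makes full-image attention at 2K resolution more tractable. The actual figures for ZipIR depend on details the abstract does not state.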
