CV · Nov 27, 2024

HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior

arXiv:2411.18662v1 · 9 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the issue of unreliable outputs in diffusion-based super-resolution for real-world images, offering a solution that enhances control and quality, though it appears incremental as it builds on existing diffusion priors.

The paper addresses unintended results in real-world image super-resolution that arise from noisy text prompts and the lack of spatial information in diffusion models. It presents HoliSDiP, which uses semantic segmentation to supply precise textual and spatial guidance, and it reports significant improvement in image quality across various scenarios.

Text-to-image diffusion models have emerged as powerful priors for real-world image super-resolution (Real-ISR). However, existing methods may produce unintended results due to noisy text prompts and their lack of spatial information. In this paper, we present HoliSDiP, a framework that leverages semantic segmentation to provide both precise textual and spatial guidance for diffusion-based Real-ISR. Our method employs semantic labels as concise text prompts while introducing dense semantic guidance through segmentation masks and our proposed Segmentation-CLIP Map. Extensive experiments demonstrate that HoliSDiP achieves significant improvement in image quality across various Real-ISR scenarios through reduced prompt noise and enhanced spatial control.
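
The two kinds of guidance described in the abstract can be illustrated with a toy sketch. It is not taken from the HoliSDiP code: the function names, the tiny mask, and the 2-dimensional stand-in vectors are all assumptions made for illustration, and the dense map here is only one plausible reading of a per-pixel label-embedding map, in which each pixel carries the text embedding of its class label. A real system would obtain the mask from a segmentation model and the vectors from a text encoder such as CLIP.

```python
# Illustrative sketch only: names, shapes, and toy embeddings are assumptions,
# not the authors' implementation.
from typing import Dict, List

Mask = List[List[int]]  # H x W grid of integer class ids


def labels_to_prompt(mask: Mask, class_names: Dict[int, str]) -> str:
    """Build a concise text prompt from the class labels present in the mask."""
    present = sorted({cid for row in mask for cid in row})
    return ", ".join(class_names[cid] for cid in present)


def dense_label_embedding_map(
    mask: Mask, label_embeddings: Dict[int, List[float]]
) -> List[List[List[float]]]:
    """Return an H x W x D map where each pixel holds its label's embedding."""
    return [[list(label_embeddings[cid]) for cid in row] for row in mask]


# Toy 2x3 segmentation mask with three hypothetical classes.
mask = [[0, 0, 1],
        [2, 1, 1]]
class_names = {0: "sky", 1: "tree", 2: "grass"}

# Stand-in vectors; a text encoder would produce these in practice.
toy_embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}

prompt = labels_to_prompt(mask, class_names)
dense_map = dense_label_embedding_map(mask, toy_embeddings)

print(prompt)
print(len(dense_map), len(dense_map[0]), len(dense_map[0][0]))
```

The prompt contains only the labels that actually occur in the image, which is the sense in which it is less noisy than a free-form caption. The dense map keeps the same labels tied to pixel locations, which is the spatial information a global prompt lacks.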

Code Implementations: 1 repo
