HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior
This work addresses unreliable outputs in diffusion-based super-resolution of real-world images and offers a method that improves control and output quality. The contribution appears incremental, since it builds on existing diffusion priors.
The paper tackles unintended results in real-world image super-resolution, which it attributes to noisy text prompts and the lack of spatial information in diffusion models. It presents HoliSDiP, which uses semantic segmentation to provide precise textual and spatial guidance, and it reports significant improvements in image quality across various scenarios.
Text-to-image diffusion models have emerged as powerful priors for real-world image super-resolution (Real-ISR). However, existing methods may produce unintended results due to noisy text prompts and their lack of spatial information. In this paper, we present HoliSDiP, a framework that leverages semantic segmentation to provide both precise textual and spatial guidance for diffusion-based Real-ISR. Our method employs semantic labels as concise text prompts while introducing dense semantic guidance through segmentation masks and our proposed Segmentation-CLIP Map. Extensive experiments demonstrate that HoliSDiP achieves significant improvement in image quality across various Real-ISR scenarios through reduced prompt noise and enhanced spatial control.
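The two forms of guidance described in the abstract can be illustrated with a toy sketch. The abstract does not specify how the Segmentation-CLIP Map is constructed, so everything below is an assumption made for illustration: the class list, the helper names `labels_to_prompt` and `dense_embedding_map`, and the random vectors that stand in for per-class text-encoder embeddings are all hypothetical. The sketch only shows the general idea of turning a segmentation mask into (a) a short label-based prompt and (b) a dense per-pixel embedding map.

```python
import numpy as np

# Hypothetical label set; a real segmentation model defines its own classes.
CLASS_NAMES = ["sky", "building", "tree", "road"]


def labels_to_prompt(mask, class_names):
    """Join the names of the classes present in the mask into a short prompt."""
    present = np.unique(mask)  # sorted class indices that occur in the mask
    return ", ".join(class_names[i] for i in present)


def dense_embedding_map(mask, class_embeddings):
    """Place each pixel's class embedding at its location: (H, W) -> (H, W, D)."""
    return class_embeddings[mask]  # integer-array indexing, one row per pixel


# Toy 4x4 mask: top half is sky (0), bottom half is building (1) and tree (2).
mask = np.array([
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
])

# Placeholder vectors standing in for per-class text embeddings (assumption).
rng = np.random.default_rng(0)
class_embeddings = rng.standard_normal((len(CLASS_NAMES), 8))

prompt = labels_to_prompt(mask, CLASS_NAMES)
seg_embed_map = dense_embedding_map(mask, class_embeddings)

print(prompt)               # sky, building, tree
print(seg_embed_map.shape)  # (4, 4, 8)
```

In this sketch the prompt contains only the classes that actually appear in the image, which mirrors the stated goal of reducing prompt noise, while the dense map carries the same semantics at each pixel, which mirrors the stated goal of spatial control. How the actual method encodes and injects these signals into the diffusion model is not covered here.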