Bridging Fidelity-Reality with Controllable One-Step Diffusion for Image Super-Resolution
This work addresses image super-resolution for applications that require high-quality visual outputs; it is an incremental improvement over existing one-step diffusion methods.
The paper tackles image super-resolution by addressing three limitations of one-step diffusion methods: limited fidelity, insufficient generative prior activation, and poor text-prompt alignment. The result is CODSR, which achieves superior perceptual quality and competitive fidelity with efficient one-step inference.
Recent diffusion-based one-step methods have shown remarkable progress in image super-resolution, yet they remain constrained by three critical limitations: (1) inferior fidelity caused by information loss when low-quality (LQ) inputs are compressed by the encoder; (2) insufficient region-discriminative activation of generative priors; (3) misalignment between text prompts and their corresponding semantic regions. To address these limitations, we propose CODSR, a controllable one-step diffusion network for image super-resolution. First, we propose an LQ-guided feature modulation module that leverages the original, uncompressed information from LQ inputs to provide high-fidelity conditioning for the diffusion process. We then develop a region-adaptive generative prior activation method that enhances perceptual richness without sacrificing local structural fidelity. Finally, we employ a text-matching guidance strategy to fully harness the conditioning potential of text prompts. Extensive experiments demonstrate that CODSR achieves superior perceptual quality and competitive fidelity compared with state-of-the-art methods, while retaining efficient one-step inference.
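The abstract does not describe the internals of these modules. As a rough illustration only, the following sketch shows two generic mechanisms that fit the described roles. The first is FiLM-style (feature-wise linear) modulation, where per-channel scale and shift values, assumed here to be predicted from uncompressed LQ features, condition the diffusion latents. The second is a mask-weighted blend that lets a generative-prior branch contribute more in some regions than in others. The function names `film_modulate` and `region_adaptive_blend`, the FiLM formulation, and the per-pixel mask are hypothetical choices for illustration, not CODSR's actual design.

```python
from typing import List

# Feature maps are stored as [channels][pixels] (spatial dims flattened) to stay dependency-free.
FeatureMap = List[List[float]]


def film_modulate(latent: FeatureMap, gamma: List[float], beta: List[float]) -> FeatureMap:
    """Hypothetical FiLM-style stand-in for LQ-guided feature modulation.

    Each latent channel c is scaled by (1 + gamma[c]) and shifted by beta[c].
    In a real model, gamma and beta would be predicted from LQ features (assumption).
    """
    if not (len(latent) == len(gamma) == len(beta)):
        raise ValueError("gamma and beta need one entry per latent channel")
    return [
        [(1.0 + g) * x + b for x in channel]
        for channel, g, b in zip(latent, gamma, beta)
    ]


def region_adaptive_blend(
    fidelity: FeatureMap, generative: FeatureMap, mask: List[float]
) -> FeatureMap:
    """Hypothetical region-adaptive mixing of two feature branches.

    mask[i] in [0, 1] is the per-pixel weight of the generative-prior branch:
    0 keeps the fidelity features, 1 uses the generative features.
    """
    if len(fidelity) != len(generative):
        raise ValueError("both branches need the same number of channels")
    if any(len(f) != len(mask) or len(g) != len(mask) for f, g in zip(fidelity, generative)):
        raise ValueError("every channel must have one value per mask pixel")
    if any(not 0.0 <= m <= 1.0 for m in mask):
        raise ValueError("mask values must lie in [0, 1]")
    return [
        [m * g + (1.0 - m) * f for f, g, m in zip(f_ch, g_ch, mask)]
        for f_ch, g_ch in zip(fidelity, generative)
    ]


if __name__ == "__main__":
    # Two channels, two pixels: channel 0 is only shifted, channel 1 is only scaled.
    print(film_modulate([[1.0, 2.0], [0.5, -1.0]], gamma=[0.0, 1.0], beta=[0.5, 0.0]))
    # Pixel 0 keeps the fidelity value; pixel 1 leans 75% toward the generative branch.
    print(region_adaptive_blend([[0.0, 0.0]], [[1.0, 1.0]], mask=[0.0, 0.75]))
```

Under these assumptions, a mask value near 0 preserves structure-oriented features in a region, while a value near 1 lets the generative prior dominate there. How CODSR actually computes such weights, and how its text-matching guidance interacts with them, is not specified in the abstract.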