Enhanced Semantic Extraction and Guidance for UGC Image Super Resolution
This work addresses the challenge of improving image quality for user-generated content. Its contribution is incremental: it combines existing techniques, such as diffusion models and semantic extraction, and applies them to a specific domain.
The paper tackles the problem of super-resolution for user-generated content images, which suffer from real-world degradations that differ from synthetic ones, by integrating semantic guidance into a diffusion framework and simulating degradations on the LSDIR dataset. The method placed second in the CVPR NTIRE 2025 Short-form UGC Image Super-Resolution Challenge, and the authors report that it outperforms state-of-the-art approaches in their experiments.
Due to the disparity between real-world degradations in user-generated content (UGC) images and synthetic degradations, traditional super-resolution methods struggle to generalize effectively, necessitating a more robust approach to model real-world distortions. In this paper, we propose a novel approach to UGC image super-resolution by integrating semantic guidance into a diffusion framework. Our method addresses the inconsistency between degradations in wild and synthetic datasets by separately simulating the degradation processes on the LSDIR dataset and combining them with the official paired training set. Furthermore, we enhance degradation removal and detail generation by incorporating a pretrained semantic extraction model (SAM2) and fine-tuning key hyperparameters for improved perceptual fidelity. Extensive experiments demonstrate the superiority of our approach against state-of-the-art methods. Additionally, the proposed model won second place in the CVPR NTIRE 2025 Short-form UGC Image Super-Resolution Challenge, further validating its effectiveness. The code is available at https://github.com/Moonsofang/NTIRE-2025-SRlab.
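The data-preparation idea in the abstract (simulate degradations on high-resolution images, then combine the synthetic pairs with an existing paired set) can be illustrated with a minimal sketch. The abstract does not say which degradation operators are used, so the blur, downsample, and noise chain below is a generic assumption, not the authors' pipeline. All function names and parameters (`degrade`, `build_training_pairs`, `scale`, `sigma`) are hypothetical, and images are plain grayscale lists of rows with values in [0, 1].

```python
import random

def box_blur(img, radius=1):
    """Blur a 2D grayscale image with a (2r+1)x(2r+1) box filter, clamping at edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out

def downsample(img, scale):
    """Keep every `scale`-th pixel in both directions."""
    return [row[::scale] for row in img[::scale]]

def add_noise(img, sigma, rng):
    """Add Gaussian noise and clip the result to [0, 1]."""
    return [[min(max(p + rng.gauss(0.0, sigma), 0.0), 1.0) for p in row] for row in img]

def degrade(hr, scale=4, blur_radius=1, sigma=0.02, seed=0):
    """Assumed degradation chain: blur, then downsample, then noise."""
    rng = random.Random(seed)
    return add_noise(downsample(box_blur(hr, blur_radius), scale), sigma, rng)

def build_training_pairs(unpaired_hr, official_pairs, scale=4):
    """Create synthetic (LR, HR) pairs and append an existing paired set."""
    synthetic = [(degrade(hr, scale=scale, seed=i), hr) for i, hr in enumerate(unpaired_hr)]
    return synthetic + list(official_pairs)
```

In this sketch, `unpaired_hr` plays the role of a high-resolution-only corpus such as LSDIR and `official_pairs` the role of the challenge's paired training set. A real pipeline would operate on color image tensors and would likely use a richer set of degradations.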