CVMay 24, 2023Code
EgoVSR: Towards High-Quality Egocentric Video Super-ResolutionYichen Chi, Junhao Gu, Jiamiao Zhang et al.
Due to the limitations of capture devices and scenarios, egocentric videos frequently have low visual quality, mainly caused by high compression and severe motion blur. With the increasing application of egocentric videos, there is an urgent need to enhance the quality of these videos through super-resolution. However, existing Video Super-Resolution (VSR) works, focusing on third-person view videos, are actually unsuitable for handling blurring artifacts caused by rapid ego-motion and object motion in egocentric videos. To this end, we propose EgoVSR, a VSR framework specifically designed for egocentric videos. We explicitly tackle motion blurs in egocentric videos using a Dual Branch Deblur Network (DB$^2$Net) in the VSR framework. Meanwhile, a blurring mask is introduced to guide the DB$^2$Net learning, and can be used to localize blurred areas in video frames. We also design a MaskNet to predict the mask, as well as a mask loss to optimize the mask estimation. Additionally, an online motion blur synthesis model for common VSR training data is proposed to simulate motion blurs as in egocentric videos. In order to validate the effectiveness of our proposed method, we introduce an EgoVSR dataset containing a large amount of fast-motion egocentric video sequences. Extensive experiments demonstrate that our EgoVSR model can efficiently super-resolve low-quality egocentric videos and outperform strong comparison baselines. Our code, pre-trained models and data can be found at https://github.com/chiyich/EGOVSR/.
CVOct 17, 2024
Improving Consistency in Diffusion Models for Image Super-ResolutionJunhao Gu, Peng-Tao Jiang, Hao Zhang et al.
Recent methods exploit the powerful text-to-image (T2I) diffusion models for real-world image super-resolution (Real-ISR) and achieve impressive results compared to previous models. However, we observe two kinds of inconsistencies in diffusion-based methods which hinder existing models from fully exploiting diffusion priors. The first is the semantic inconsistency arising from diffusion guidance. T2I generation focuses on semantic-level consistency with text prompts, while Real-ISR emphasizes pixel-level reconstruction from low-quality (LQ) images, necessitating more detailed semantic guidance from LQ inputs. The second is the training-inference inconsistency stemming from the DDPM, which improperly assumes high-quality (HQ) latent corrupted by Gaussian noise as denoising inputs for each timestep. To address these issues, we introduce ConsisSR to handle both semantic and training-inference consistencies. On the one hand, to address the semantic inconsistency, we proposed a Hybrid Prompt Adapter (HPA). Instead of text prompts with coarse-grained classification information, we leverage the more powerful CLIP image embeddings to explore additional color and texture guidance. On the other hand, we introduce Time-Aware Latent Augmentation (TALA) to bridge the training-inference inconsistency. Based on the probability function p(t), we accordingly enhance the SDSR training strategy. With LQ latent with Gaussian noise as inputs, our TALA not only focuses on diffusion noise but also refine the LQ latent towards the HQ counterpart. Our method demonstrates state-of-the-art performance among existing diffusion models. The code will be made publicly available.