Discrete Prior-based Temporal-coherent Content Prediction for Blind Face Video Restoration
The work addresses restoring high-fidelity details in face videos with unknown degradations, with applications such as video enhancement. It appears incremental, however, since it builds on prior methods by adding specific modules.
The paper tackles blind face video restoration by introducing DP-TempCoh, a transformer model that uses discrete priors to predict content and enhance temporal coherence, achieving superior performance on synthetically and naturally degraded videos.
Blind face video restoration aims to restore high-fidelity details in videos subjected to complex and unknown degradations. The task is challenging because it must handle temporal heterogeneity while keeping facial attributes stable. In this paper, we introduce a Discrete Prior-based Temporal-Coherent content prediction transformer, referred to as DP-TempCoh, to address this challenge. Specifically, we incorporate a spatial-temporal-aware content prediction module that synthesizes high-quality content from discrete visual priors, conditioned on degraded video tokens. To further enhance the temporal coherence of the predicted content, we design a motion statistics modulation module that adjusts the content using discrete motion priors expressed as cross-frame mean and variance. As a result, the statistics of the predicted content can match those of real videos over time. Extensive experiments verify the effectiveness of each design element and demonstrate the superior performance of DP-TempCoh on both synthetically and naturally degraded videos.
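The abstract describes two mechanisms: predicting content by selecting entries from a discrete codebook, and re-modulating that content so its cross-frame mean and variance match a motion prior. Below is a minimal, hypothetical sketch of both ideas. It is not the authors' implementation. It assumes the following:

- Each frame's content is a single feature vector of C channels.
- The per-token logits over K codebook entries are already given; the conditioning transformer is omitted.
- The target statistics are chosen as the nearest entry in a small hypothetical codebook of (mean, variance) pairs.

The function names `lookup_codebook`, `cross_frame_stats`, `nearest_motion_prior` and `modulate` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def lookup_codebook(logits: Sequence[Sequence[float]], codebook: Sequence[Vector]) -> List[Vector]:
    """Replace each token by the codebook entry with the highest logit (argmax lookup).

    Assumption: the logits come from some upstream predictor conditioned on degraded tokens.
    """
    out = []
    for token_logits in logits:
        idx = max(range(len(token_logits)), key=lambda k: token_logits[k])
        out.append(list(codebook[idx]))
    return out


def cross_frame_stats(frames: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Per-channel mean and population variance computed across the time (frame) axis."""
    t = len(frames)
    channels = len(frames[0])
    mean = [sum(f[c] for f in frames) / t for c in range(channels)]
    var = [sum((f[c] - mean[c]) ** 2 for f in frames) / t for c in range(channels)]
    return mean, var


def nearest_motion_prior(
    mean: Vector, var: Vector, priors: Sequence[Tuple[Vector, Vector]]
) -> Tuple[Vector, Vector]:
    """Hypothetical selection rule: pick the (mean, var) prior closest in squared distance."""

    def dist(prior: Tuple[Vector, Vector]) -> float:
        p_mean, p_var = prior
        return sum((a - b) ** 2 for a, b in zip(mean, p_mean)) + sum(
            (a - b) ** 2 for a, b in zip(var, p_var)
        )

    return min(priors, key=dist)


def modulate(
    frames: Sequence[Vector], priors: Sequence[Tuple[Vector, Vector]], eps: float = 1e-5
) -> List[Vector]:
    """Normalize each channel across frames, then rescale to the selected prior's statistics."""
    mean, var = cross_frame_stats(frames)
    tgt_mean, tgt_var = nearest_motion_prior(mean, var, priors)
    out = []
    for f in frames:
        out.append(
            [
                (f[c] - mean[c]) / math.sqrt(var[c] + eps) * math.sqrt(tgt_var[c]) + tgt_mean[c]
                for c in range(len(f))
            ]
        )
    return out
```

After `modulate`, the cross-frame mean and variance of the output approximately equal the chosen prior's mean and variance. The small `eps` in the denominator makes the variance slightly smaller. This is the sense in which the predicted content's statistics are made to match a reference over time. The normalize-then-rescale form follows adaptive instance normalization applied along the time axis. That form is an assumption chosen for clarity, not a detail confirmed by the abstract.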