CV · Apr 15

Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model

arXiv:2604.13906 · 19.72 citations · h-index: 6 · Has Code
Predicted impact: top 37% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For video recovery tasks, this work addresses the practical limitation of requiring manual mask annotations by enabling blind recovery without predefined masks.

This paper introduces a blind video recovery setting that removes the need for predefined corruption masks, and proposes a Metadata-Guided Diffusion Model (M-GDM) that leverages video metadata to identify and recover corrupted regions. The method achieves superior performance in blind bitstream-corrupted video recovery.

Bitstream-corrupted video recovery aims to restore realistic content degraded during video storage or transmission. Existing methods typically assume that predefined masks of corrupted regions are available, but manually annotating these masks is labor-intensive and impractical in real-world scenarios. To address this limitation, we introduce a new blind video recovery setting that removes the reliance on predefined masks. This setting presents two major challenges: accurately identifying corrupted regions and recovering content from extensive and irregular degradations. We propose a Metadata-Guided Diffusion Model (M-GDM) to tackle these challenges. Specifically, intrinsic video metadata are leveraged as corruption indicators through a dual-stream metadata encoder that separately embeds motion vectors and frame types before fusing them into a unified representation. This representation interacts with corrupted latent features via cross-attention at each diffusion step. To preserve intact regions, we design a prior-driven mask predictor that generates pseudo masks using both metadata and diffusion priors, enabling the separation and recombination of intact and recovered regions through hard masking. To mitigate boundary artifacts caused by imperfect masks, a post-refinement module enhances consistency between intact and recovered regions. Extensive experiments demonstrate the effectiveness of our method and its superiority in blind video recovery. Code is available at: https://github.com/Shuyun-Wang/M-GDM.
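
The hard-masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 threshold, the convention that 1 marks a corrupted pixel, and the use of plain 2D lists for a single frame are all assumptions made for illustration. The abstract does not say whether masking is applied in pixel or latent space, and the mask predictor itself (which uses metadata and diffusion priors) is not modelled here; a hand-written score map stands in for its output.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values (assumed layout)
Mask = List[List[int]]     # assumed convention: 1 = corrupted, 0 = intact


def binarize(soft_mask: Frame, threshold: float = 0.5) -> Mask:
    """Turn per-pixel corruption scores into a hard 0/1 pseudo mask.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in soft_mask]


def recombine(corrupted: Frame, recovered: Frame, mask: Mask) -> Frame:
    """Keep intact pixels from the input and take masked pixels from the recovery."""
    return [
        [r if m == 1 else c for c, r, m in zip(c_row, r_row, m_row)]
        for c_row, r_row, m_row in zip(corrupted, recovered, mask)
    ]


# Toy 2x2 example: the right-hand column is treated as corrupted.
corrupted = [[0.1, 0.9], [0.2, 0.8]]
recovered = [[0.5, 0.4], [0.6, 0.3]]
soft_mask = [[0.1, 0.7], [0.2, 0.95]]  # stand-in for the mask predictor's output

hard_mask = binarize(soft_mask)
output = recombine(corrupted, recovered, hard_mask)

print(hard_mask)  # [[0, 1], [0, 1]]
print(output)     # [[0.1, 0.4], [0.2, 0.3]]
```

Because the mask is binary, intact pixels pass through unchanged, which is what makes boundary artifacts possible when the mask is imperfect. That is the motivation the abstract gives for the post-refinement module.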

