CV · AI · Jan 20

VideoMaMa: Mask-Guided Video Matting via Generative Prior

arXiv:2601.14255v1
h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses scalable video matting for real-world footage by combining generative priors with pseudo-labeling. The advance is incremental: it builds on existing segmentation and video diffusion models.

The paper tackles the challenge of generalizing video matting models to real-world videos. It introduces VideoMaMa, which uses pretrained video diffusion models to convert coarse segmentation masks into accurate alpha mattes and generalizes zero-shot to real footage despite being trained solely on synthetic data. The authors then use it to pseudo-label the MA-V dataset, covering more than 50K real-world videos, and fine-tune SAM2 on MA-V to obtain SAM2-Matte, which is more robust on in-the-wild videos than the same model trained on existing matting datasets.

Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. To address this, we present Video Mask-to-Matte Model (VideoMaMa), which converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data. Building on this capability, we develop a scalable pseudo-labeling pipeline for large-scale video matting and construct the Matting Anything in Video (MA-V) dataset, which offers high-quality matting annotations for more than 50K real-world videos spanning diverse scenes and motions. To validate the effectiveness of this dataset, we fine-tune the SAM2 model on MA-V to obtain SAM2-Matte, which outperforms the same model trained on existing matting datasets in terms of robustness on in-the-wild videos. These findings emphasize the importance of large-scale pseudo-labeled video matting and showcase how generative priors and accessible segmentation cues can drive scalable progress in video matting research.
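
To make the mask-versus-matte distinction in the abstract concrete, here is a minimal sketch of the standard compositing equation I = alpha * F + (1 - alpha) * B on a toy one-dimensional row of pixels. It is not the paper's method: the names `composite` and `binarize`, the 0.5 threshold, and the pixel values are all assumptions chosen for illustration. It only shows why a hard segmentation mask loses information along soft edges that an alpha matte keeps.

```python
def composite(alpha, foreground, background):
    """Per-pixel matting equation: I = alpha * F + (1 - alpha) * B."""
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, foreground, background)]


def binarize(alpha, threshold=0.5):
    """Hypothetical stand-in for a coarse segmentation mask: hard 0/1 values."""
    return [1.0 if a >= threshold else 0.0 for a in alpha]


# Toy row of four pixels whose true opacity fades across a soft edge.
true_alpha = [1.0, 0.75, 0.25, 0.0]
coarse_mask = binarize(true_alpha)

foreground = [1.0] * 4  # white subject (single channel, illustrative)
background = [0.0] * 4  # black backdrop

with_matte = composite(true_alpha, foreground, background)
with_mask = composite(coarse_mask, foreground, background)

# Mean absolute compositing error caused by using the hard mask instead of the matte.
mask_error = sum(abs(m - t) for m, t in zip(with_mask, with_matte)) / len(true_alpha)
```

In this toy case the two interior pixels are the only ones that differ, which is the kind of boundary detail (hair, motion blur, semi-transparency) that a mask-to-matte model is meant to recover.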

