CVDec 9, 2024

UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts

arXiv:2412.06340v29 citationsh-index: 2Has Code2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Originality Incremental advance
AI Analysis

This work addresses video editing challenges for content creators by offering a unified approach, though it appears incremental as it builds on existing personalized models with novel adaptations.

The authors tackled the problem of video inpainting and interpolation by proposing UniPaint, a unified framework that treats these tasks together, resulting in high-quality outputs and achieving the best quantitative results across various setups.

In this paper, we present UniPaint, a unified generative space-time video inpainting framework that enables spatial-temporal inpainting and interpolation. Different from existing methods that treat video inpainting and video interpolation as two distinct tasks, we leverage a unified inpainting framework to tackle them and observe that these two tasks can mutually enhance synthesis performance. Specifically, we first introduce a plug-and-play space-time video inpainting adapter, which can be employed in various personalized models. The key insight is to propose a Mixture of Experts (MoE) attention to cover various tasks. Then, we design a spatial-temporal masking strategy during the training stage to mutually enhance each other and improve performance. UniPaint produces high-quality and aesthetically pleasing results, achieving the best quantitative results across various tasks and scale setups. The code and checkpoints are available at $\href{https://github.com/mmmmm-w/UniPaint}{this \ repository}$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes