CV | Nov 5, 2025

Unified Long Video Inpainting and Outpainting via Overlapping High-Order Co-Denoising

arXiv:2511.03272v1 | 1 citation | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the problem of long-range video editing for applications requiring seamless and controllable video generation, though it is incremental as it builds on existing diffusion models.

The paper tackles the challenge of generating long videos with high controllability in inpainting and outpainting by introducing a unified approach that extends text-to-video diffusion models, achieving superior performance in quality and perceptual realism over baselines like Wan 2.1 and VACE.

Generating long videos remains a fundamental challenge, and achieving high controllability in video inpainting and outpainting is particularly demanding. To address both challenges simultaneously, we introduce a novel, unified approach for long video inpainting and outpainting that extends text-to-video diffusion models to generate arbitrarily long, spatially edited videos with high fidelity. Our method leverages LoRA to efficiently fine-tune a large pre-trained video diffusion model, such as Alibaba's Wan 2.1, for masked-region video synthesis, and employs an overlap-and-blend temporal co-denoising strategy with high-order solvers to maintain consistency across long sequences. In contrast to prior work that is limited to fixed-length clips or exhibits stitching artifacts, our system enables arbitrarily long video generation and editing without noticeable seams or drift. We validate our approach on challenging inpainting and outpainting tasks, including editing or adding objects over hundreds of frames, and demonstrate superior performance to baseline methods such as the Wan 2.1 model and VACE in terms of quality (PSNR/SSIM) and perceptual realism (LPIPS). Our method enables practical long-range video editing with minimal overhead, achieving a balance between parameter efficiency and strong performance.
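The overlap-and-blend idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the window and overlap sizes, and the linear ramp weights are all assumptions made for illustration, and `denoise_window` is a hypothetical stand-in for one solver step of the fine-tuned diffusion model on a fixed-length clip. The high-order solver itself is not modeled here.

```python
import numpy as np

def window_starts(num_frames, window, overlap):
    """Start indices of overlapping windows that together cover every frame."""
    assert 0 <= overlap < window, "overlap must be smaller than the window"
    if window >= num_frames:
        return [0]
    stride = window - overlap
    starts = list(range(0, num_frames - window, stride))
    starts.append(num_frames - window)  # last window sits flush with the end
    return starts

def blend_weights(length, overlap):
    """Per-frame weights: linear ramps at both ends, 1.0 in the middle (assumed scheme)."""
    w = np.ones(length)
    if overlap > 0:
        ramp = np.arange(1, overlap + 1) / (overlap + 1)  # strictly positive
        w[:overlap] = ramp
        w[-overlap:] = ramp[::-1]
    return w

def co_denoise_step(latents, denoise_window, window=16, overlap=4):
    """Apply one denoising step window by window, then blend the overlaps.

    latents: array of shape (frames, ...); denoise_window: hypothetical
    callable mapping a clip of latents to its denoised clip.
    """
    num_frames = latents.shape[0]
    trailing = (1,) * (latents.ndim - 1)  # broadcast weights over non-frame axes
    acc = np.zeros_like(latents, dtype=float)
    norm = np.zeros((num_frames,) + trailing)
    single = window >= num_frames  # short video: one window, no blending needed
    for s in window_starts(num_frames, window, overlap):
        e = min(s + window, num_frames)
        w = blend_weights(e - s, 0 if single else overlap).reshape((-1,) + trailing)
        acc[s:e] += w * denoise_window(latents[s:e])
        norm[s:e] += w
    return acc / norm  # weighted average where windows overlap

# Usage: 40 latent frames, with a placeholder "denoiser" that returns its input.
frames = np.random.default_rng(0).normal(size=(40, 4, 8, 8))
blended = co_denoise_step(frames, lambda clip: clip)
```

In a full sampler this step would presumably be repeated at every noise level, so neighbouring windows are reconciled throughout denoising rather than stitched once at the end; that reading of "co-denoising" is an interpretation of the abstract, not a detail it states.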
