CV
Jun 27, 2025

OutDreamer: Video Outpainting with a Diffusion Transformer

arXiv:2506.22298v13 citations
h-index: 15
Originality: Incremental advance
AI Analysis

The paper addresses the problem of generating consistent extended video content for applications such as video editing, though the contribution appears incremental: it adapts diffusion transformers to one specific task.

The paper tackles video outpainting with OutDreamer, a diffusion transformer-based framework that generates new video content beyond the boundaries of the original video and outperforms state-of-the-art zero-shot methods in zero-shot benchmark evaluations.

Video outpainting is a challenging task that generates new video content by extending beyond the boundaries of an original input video, requiring both temporal and spatial consistency. Many state-of-the-art methods utilize latent diffusion models with U-Net backbones but still struggle to achieve high quality and adaptability in generated content. Diffusion transformers (DiTs) have emerged as a promising alternative because of their superior performance. We introduce OutDreamer, a DiT-based video outpainting framework comprising two main components: an efficient video control branch and a conditional outpainting branch. The efficient video control branch effectively extracts masked video information, while the conditional outpainting branch generates missing content based on these extracted conditions. Additionally, we propose a mask-driven self-attention layer that dynamically integrates the given mask information, further enhancing the model's adaptability to outpainting tasks. Furthermore, we introduce a latent alignment loss to maintain overall consistency both within and between frames. For long video outpainting, we employ a cross-video-clip refiner to iteratively generate missing content, ensuring temporal consistency across video clips. Extensive evaluations demonstrate that our zero-shot OutDreamer outperforms state-of-the-art zero-shot methods on widely recognized benchmarks.
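To make the task setup in the abstract concrete, the sketch below shows the two kinds of bookkeeping an outpainting pipeline typically needs: a binary mask marking which pixels of the enlarged canvas must be generated, and a split of a long video into overlapping clips so that consecutive clips share frames. This is only an illustration, not OutDreamer's implementation. The function names (`outpaint_mask`, `clip_windows`), the centred placement of the source video, and the fixed-overlap windowing are all assumptions made for this example; the abstract does not specify how the mask is laid out or how the cross-video-clip refiner selects clips.

```python
def outpaint_mask(src_h, src_w, out_h, out_w):
    """Build an out_h x out_w mask: 0 = known source pixel, 1 = pixel to generate.

    Assumption for this sketch: the source frame is centred on the larger canvas.
    """
    if out_h < src_h or out_w < src_w:
        raise ValueError("target canvas must be at least as large as the source")
    top = (out_h - src_h) // 2
    left = (out_w - src_w) // 2
    return [
        [
            0 if (top <= y < top + src_h and left <= x < left + src_w) else 1
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def clip_windows(num_frames, clip_len, overlap):
    """Split frame indices into (start, end) windows of clip_len frames.

    Consecutive windows share at least `overlap` frames, so each clip can be
    conditioned on frames already produced for the previous clip.
    Hypothetical scheme; the paper's refiner may choose clips differently.
    """
    if clip_len <= 0 or not 0 <= overlap < clip_len:
        raise ValueError("need clip_len > 0 and 0 <= overlap < clip_len")
    if num_frames <= clip_len:
        return [(0, num_frames)]
    stride = clip_len - overlap
    windows = []
    start = 0
    while start + clip_len < num_frames:
        windows.append((start, start + clip_len))
        start += stride
    # Final window is aligned with the end of the video; it may overlap
    # the previous window by more than `overlap` frames.
    windows.append((num_frames - clip_len, num_frames))
    return windows


if __name__ == "__main__":
    # A 2x2 source frame widened to a 2x4 canvas: one new column on each side.
    for row in outpaint_mask(2, 2, 2, 4):
        print(row)
    # A 10-frame video processed as 4-frame clips sharing 1 frame.
    print(clip_windows(10, 4, 1))
```

In a real pipeline the same mask would be repeated for every frame and passed to the model together with the masked video, and the overlapping frames of each window would serve as the known context for the next clip.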

