CV · Mar 23, 2025

LongDiff: Training-Free Long Video Generation in One Go

arXiv:2503.18150v1 · 8 citations · h-index: 8 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses the challenge of long video generation for applications requiring extended sequences, though it is incremental as it builds on existing video diffusion models.

The paper tackles the problem of generating long videos that keep temporal consistency and visual detail. It proposes LongDiff, a training-free method that uses Position Mapping and Informative Frame Selection to address temporal position ambiguity and information dilution, producing high-quality long videos in one go.

Video diffusion models have recently achieved remarkable results in video generation. Despite their encouraging performance, most of these models are designed and trained for short video generation, leading to challenges in maintaining temporal consistency and visual details in long video generation. In this paper, we propose LongDiff, a novel training-free method consisting of carefully designed components, Position Mapping (PM) and Informative Frame Selection (IFS), to tackle two key challenges that hinder short-to-long video generation generalization: temporal position ambiguity and information dilution. Our LongDiff unlocks the potential of off-the-shelf video diffusion models to achieve high-quality long video generation in one go. Extensive experiments demonstrate the efficacy of our method.

