CV · Jan 6, 2025

Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising

arXiv:2501.02741v1 · 4 citations · h-index: 11 · ICASSP
Originality: Incremental advance
AI Analysis

This work addresses the heavy computational and data demands of long video generation, offering a practical solution for applications such as content creation. The advance is incremental, since it builds on existing diffusion models.

The paper tackles the problem of generating long videos with diffusion models, which typically require extensive training, by proposing a training-free method called Brick-Diffusion that uses a brick-to-wall denoising strategy to produce high-fidelity videos of arbitrary length, outperforming existing baselines.

Recent advances in diffusion models have greatly improved text-driven video generation. However, training models for long video generation demands significant computational power and extensive data, leading most video diffusion models to be limited to a small number of frames. Existing training-free methods that attempt to generate long videos using pre-trained short video diffusion models often struggle with issues such as insufficient motion dynamics and degraded video fidelity. In this paper, we present Brick-Diffusion, a novel, training-free approach capable of generating long videos of arbitrary length. Our method introduces a brick-to-wall denoising strategy, where the latent is denoised in segments, with a stride applied in subsequent iterations. This process mimics the construction of a staggered brick wall, where each brick represents a denoised segment, enabling communication between frames and improving overall video quality. Through quantitative and qualitative evaluations, we demonstrate that Brick-Diffusion outperforms existing baseline methods in generating high-fidelity videos.
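The staggered segmentation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the segment length, the stride, or how the offset changes between iterations, so the names `brick_segments` and `brick_schedule`, the parameters `segment_len` and `stride`, and the rule "offset = step * stride modulo segment length" are all assumptions made for illustration. The sketch only computes which frame ranges would be denoised together at each step; the actual call to a pre-trained short-video diffusion model is omitted.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def brick_segments(num_frames: int, segment_len: int, offset: int) -> List[Segment]:
    """Split frames [0, num_frames) into contiguous segments shifted by `offset`.

    Hypothetical helper: with offset 0 the segments start at frame 0; with a
    non-zero offset the first segment is shortened so that later boundaries
    are shifted, like alternate rows of a staggered brick wall.
    """
    if num_frames <= 0 or segment_len <= 0:
        raise ValueError("num_frames and segment_len must be positive")
    offset %= segment_len
    segments: List[Segment] = []
    start = 0
    end = offset if offset > 0 else segment_len
    while start < num_frames:
        end = min(end, num_frames)
        segments.append((start, end))
        start = end
        end = start + segment_len
    return segments


def brick_schedule(
    num_frames: int, segment_len: int, stride: int, num_steps: int
) -> List[List[Segment]]:
    """Return the segment layout for each denoising step.

    Assumption: the offset advances by `stride` frames per step and wraps
    around at `segment_len`. The paper may use a different rule.
    """
    return [
        brick_segments(num_frames, segment_len, (step * stride) % segment_len)
        for step in range(num_steps)
    ]


if __name__ == "__main__":
    # 16 frames, segments of 8 frames, boundaries shifted by 4 every step.
    for step, row in enumerate(brick_schedule(16, 8, 4, 4)):
        print(step, row)
```

Under these assumptions, frames that sit on opposite sides of a segment boundary at one step fall inside the same segment at the next step, which is one way to read the abstract's claim that the staggered layout enables communication between frames.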

