CV · AI · Jun 19, 2025

FastInit: Fast Noise Initialization for Temporally Consistent Video Generation

arXiv:2506.16119v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work targets the computational cost of temporally consistent video generation, offering AI researchers and practitioners a practical, incremental improvement over existing methods.

The paper tackles temporal consistency in video generation by introducing FastInit, a fast noise initialization method that replaces iterative refinement of the initial noise with a single forward pass of a learned network. This improves efficiency and yields consistent quality gains across various text-to-video models.

Video generation has made significant strides with the development of diffusion models; however, achieving high temporal consistency remains a challenging task. Recently, FreeInit identified a training-inference gap and introduced a method to iteratively refine the initial noise during inference. However, iterative refinement significantly increases the computational cost associated with video generation. In this paper, we introduce FastInit, a fast noise initialization method that eliminates the need for iterative refinement. FastInit learns a Video Noise Prediction Network (VNPNet) that takes random noise and a text prompt as input, generating refined noise in a single forward pass. Therefore, FastInit greatly enhances the efficiency of video generation while achieving high temporal consistency across frames. To train the VNPNet, we create a large-scale dataset consisting of pairs of text prompts, random noise, and refined noise. Extensive experiments with various text-to-video models show that our method consistently improves the quality and temporal consistency of the generated videos. FastInit not only provides a substantial improvement in video generation but also offers a practical solution that can be applied directly during inference. The code and dataset will be released.
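
The efficiency argument in the abstract can be made concrete with a rough cost count. The sketch below is illustrative only: it assumes that each refinement iteration of an iterative scheme runs one full sampling pass before the final generation pass, and that one VNPNet forward pass costs about as much as one denoiser call. The names `CostModel`, `iterative_refinement_cost`, and `single_pass_cost`, along with the step counts, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Relative compute costs; all values are illustrative assumptions."""
    sampling_steps: int          # denoiser calls in one full sampling run
    denoiser_cost: float = 1.0   # cost of one denoiser call (unit of measure)
    vnpnet_cost: float = 1.0     # assumed cost of one noise-prediction forward pass


def iterative_refinement_cost(model: CostModel, refinement_iters: int) -> float:
    # Assumption: every refinement iteration needs a full sampling run,
    # followed by one final sampling run that produces the video.
    return (refinement_iters + 1) * model.sampling_steps * model.denoiser_cost


def single_pass_cost(model: CostModel) -> float:
    # One forward pass to refine the noise, then a single sampling run.
    return model.vnpnet_cost + model.sampling_steps * model.denoiser_cost


model = CostModel(sampling_steps=50)
iterative = iterative_refinement_cost(model, refinement_iters=4)
single = single_pass_cost(model)
speedup = iterative / single

print(f"iterative refinement: {iterative:.0f} units")
print(f"single forward pass:  {single:.0f} units")
print(f"relative speedup:     {speedup:.2f}x")
```

Under these assumptions the saving grows roughly linearly with the number of refinement iterations avoided. Actual gains depend on the real cost of the noise-prediction network and the sampler used.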

