TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction
The work addresses the challenge of efficiently generating coherent, high-quality long videos, which matters for applications in media and entertainment. The contribution appears incremental, however, since it builds on existing video generation methods.
The paper tackles long video generation by proposing TempoMaster, a framework that formulates the task as next-frame-rate prediction: it first generates a low-frame-rate clip as a blueprint of the whole video and then refines it progressively by increasing the frame rate. The authors report state-of-the-art results in both visual and temporal quality.
We present TempoMaster, a novel framework that formulates long video generation as next-frame-rate prediction. Specifically, we first generate a low-frame-rate clip that serves as a coarse blueprint of the entire video sequence, and then progressively increase the frame rate to refine visual details and motion continuity. During generation, TempoMaster employs bidirectional attention within each frame-rate level while performing autoregression across frame rates, thus achieving long-range temporal coherence while enabling efficient and parallel synthesis. Extensive experiments demonstrate that TempoMaster establishes a new state-of-the-art in long video generation, excelling in both visual and temporal quality.
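The coarse-to-fine procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the frame rate doubles at each level (the abstract does not state the factor), and the names frame_rate_schedule, generate_video, and generate_level are hypothetical. The generate_level callable stands in for the model: it receives all frames produced at coarser levels and returns every new frame of the current level in a single call, which loosely mirrors bidirectional attention within a level and autoregression across levels.

```python
from typing import Callable, Dict, List

Frame = float  # placeholder frame type; a real system would use image tensors


def frame_rate_schedule(num_frames: int, num_levels: int) -> List[List[int]]:
    """Return, per level, the frame indices that are newly generated there.

    Assumption: the frame rate doubles from one level to the next, so the
    stride between generated frames halves until it reaches 1.
    """
    if num_frames < 1 or num_levels < 1:
        raise ValueError("num_frames and num_levels must be positive")
    schedule: List[List[int]] = []
    seen = set()
    for level in range(num_levels):
        stride = 2 ** (num_levels - 1 - level)  # coarsest level first
        new_indices = [t for t in range(0, num_frames, stride) if t not in seen]
        seen.update(new_indices)
        schedule.append(new_indices)
    return schedule


def generate_video(
    num_frames: int,
    num_levels: int,
    generate_level: Callable[[Dict[int, Frame], List[int]], Dict[int, Frame]],
) -> List[Frame]:
    """Generate frames level by level, conditioning on all coarser levels.

    generate_level is a hypothetical stand-in for the model. It must return
    one frame for every index in its second argument.
    """
    video: Dict[int, Frame] = {}
    for new_indices in frame_rate_schedule(num_frames, num_levels):
        # One call per level: all new frames of this level are produced
        # together, given a copy of the frames generated so far.
        video.update(generate_level(dict(video), new_indices))
    return [video[t] for t in range(num_frames)]


# With 8 frames and 3 levels, the blueprint is frames 0 and 4.
print(frame_rate_schedule(8, 3))  # [[0, 4], [2, 6], [1, 3, 5, 7]]
```

Under these assumptions the number of sequential model calls equals the number of frame-rate levels rather than the number of frames, which is one way to read the claim of efficient and parallel synthesis.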