CV LG Jan 23

Reward-Forcing: Autoregressive Video Generation with Reward Feedback

Jingran Zhang, Ning Li, Yuanhao Ban, Andrew Bai, Justin Cui

arXiv:2601.16933v1 2.81 citations h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient, scalable video generation for applications that need near real-time output, though it appears incremental relative to existing autoregressive approaches.

The paper tackles autoregressive video generation by guiding the model with reward signals instead of relying heavily on a teacher model. It reports a VBench total score of 84.92, comparable to state-of-the-art autoregressive methods, which score 84.31.

While most prior work in video generation relies on bidirectional architectures, recent efforts have sought to adapt these models into autoregressive variants to support near real-time generation. However, such adaptations often depend heavily on teacher models, which can limit performance, particularly in the absence of a strong autoregressive teacher, resulting in output quality that typically lags behind their bidirectional counterparts. In this paper, we explore an alternative approach that uses reward signals to guide the generation process, enabling more efficient and scalable autoregressive generation. By using reward signals to guide the model, our method simplifies training while preserving high visual fidelity and temporal consistency. Through extensive experiments on standard benchmarks, we find that our approach performs comparably to existing autoregressive models and, in some cases, surpasses similarly sized bidirectional models by avoiding constraints imposed by teacher architectures. For example, on VBench, our method achieves a total score of 84.92, closely matching state-of-the-art autoregressive methods that score 84.31 but require significant heterogeneous distillation.
