CVFeb 2

Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

arXiv:2602.02214v150 citationsh-index: 34
Originality Incremental advance
AI Analysis

This work solves the issue of performance degradation in real-time video generation for applications requiring interactive content, representing an incremental advance over prior distillation methods.

The paper tackled the problem of real-time interactive video generation by addressing the architectural gap when distilling bidirectional video diffusion models into autoregressive models, proposing Causal Forcing to use an autoregressive teacher for ODE initialization, which outperformed baselines with improvements like 19.3% in Dynamic Degree and 16.7% in Instruction Following.

To achieve real-time interactive video generation, current methods distill pretrained bidirectional video diffusion models into few-step autoregressive (AR) models, facing an architectural gap when full attention is replaced by causal attention. However, existing approaches do not bridge this gap theoretically. They initialize the AR student via ODE distillation, which requires frame-level injectivity, where each noisy frame must map to a unique clean frame under the PF-ODE of an AR teacher. Distilling an AR student from a bidirectional teacher violates this condition, preventing recovery of the teacher's flow map and instead inducing a conditional-expectation solution, which degrades performance. To address this issue, we propose Causal Forcing that uses an AR teacher for ODE initialization, thereby bridging the architectural gap. Empirical results show that our method outperforms all baselines across all metrics, surpassing the SOTA Self Forcing by 19.3\% in Dynamic Degree, 8.7\% in VisionReward, and 16.7\% in Instruction Following. Project page and the code: \href{https://thu-ml.github.io/CausalForcing.github.io/}{https://thu-ml.github.io/CausalForcing.github.io/}

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes