SPIRAL: Self-Evolving Action-Conditioned Video Generation via Reflective Planning Agents
This work addresses the problem of temporal coherence and action execution in long-horizon video generation, offering a novel closed-loop approach for the video generation community.
SPIRAL introduces a closed-loop framework for long-horizon action-conditioned video generation, using reflective planning agents to iteratively refine video segments, achieving consistent gains on ActVideoGen-Bench and VBench across multiple backbones.
Long-horizon action-conditioned video generation aims to synthesize temporally coherent videos that follow complex action instructions over extended horizons, requiring procedural ordering, persistent action execution, and scene consistency beyond conventional TI2V's short-term fidelity. Existing single-shot video generation models typically operate in an open-loop manner, leading to incomplete action execution, hallucinated motions, and temporal drift. To address this, we propose SPIRAL, a closed-loop framework that performs sequential planning and iterative reflection for action-conditioned long-horizon video generation. Specifically, SPIRAL instantiates a think-act-reflect process: a PlanAgent decomposes high-level goals into sub-actions, which condition a VideoGenerator to synthesize each segment alongside a memory context, while a CriticAgent evaluates intermediate video segments to provide corrective feedback for iterative refinement. This closed-loop design further supports self-evolution by utilizing PlanAgent-proposed actions and CriticAgent-derived rewards for GRPO-based post-training to enhance the video generator's long-horizon consistency. Moreover, we introduce ActVideoGen-Dataset for task-specific training, and establish ActVideoGen-Bench as a dedicated evaluation suite for measuring action quality and temporal coherence. Experiments across multiple TI2V backbones alongside the self-evolving strategy show consistent gains on ActVideoGen-Bench and VBench, demonstrating the effectiveness of SPIRAL.