CV · Jan 9

TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment

Jin Wang, Jianxiang Lu, Guangzheng Xu, Comi Chen, Haoyu Yang, Linqing Wang, Peng Chen, Mingtao Chen, Zhichao Hu, Longhuang Wu, Shuai Shao, Qinglin Lu

arXiv:2601.05729v1 · 6.0 · 6 citations · h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in image-to-video (I2V) generation and offers researchers and practitioners an incremental improvement over existing methods.

The paper tackles the inconsistent reward improvements observed when Group Relative Policy Optimization (GRPO) is applied to image-to-video (I2V) generation. It introduces TAGRPO, a post-training framework that applies a novel GRPO loss to intermediate latents for direct trajectory alignment, and reports significant improvements over DanceGRPO.
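The summary does not give the exact form of the loss, so the following is only a minimal sketch under stated assumptions. It uses the standard GRPO group-relative advantage (each reward normalized by its group's mean and standard deviation) and then weights a squared distance between the policy's intermediate latent and each rollout's latent at the same denoising step by that advantage. The names `group_relative_advantages` and `trajectory_alignment_loss`, the squared-distance choice, and the flat list-of-floats latent layout are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward by the group mean and std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flattened latents."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trajectory_alignment_loss(
    latent: Sequence[float],
    rollout_latents: Sequence[Sequence[float]],
    rewards: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """Hypothetical alignment loss on an intermediate latent.

    Each rollout latent (taken at the same denoising step) is weighted by its
    group-relative advantage. Minimizing the loss pulls `latent` toward
    high-reward rollouts (positive advantage) and pushes it away from
    low-reward ones (negative advantage).
    """
    advantages = group_relative_advantages(rewards, eps)
    total = sum(a * squared_distance(latent, z) for a, z in zip(advantages, rollout_latents))
    return total / len(rollout_latents)


if __name__ == "__main__":
    rollouts = [[1.0, 1.0], [-1.0, -1.0]]  # toy latents from two rollouts
    rewards = [1.0, 0.0]                    # first rollout scored higher
    print(trajectory_alignment_loss([0.9, 0.9], rollouts, rewards))    # negative: near the good rollout
    print(trajectory_alignment_loss([-0.9, -0.9], rollouts, rewards))  # positive: near the bad rollout
```

In a real training loop the latent would be a tensor produced by the policy with gradients enabled and the rollout latents would be detached targets. Because the negative-advantage term is unbounded in this toy form, a practical variant would presumably need clipping or a bounded distance.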

Recent studies have demonstrated the efficacy of integrating Group Relative Policy Optimization (GRPO) into flow matching models, particularly for text-to-image and text-to-video generation. However, we find that directly applying these techniques to image-to-video (I2V) models often fails to yield consistent reward improvements. To address this limitation, we present TAGRPO, a robust post-training framework for I2V models inspired by contrastive learning. Our approach is grounded in the observation that rollout videos generated from identical initial noise provide superior guidance for optimization. Leveraging this insight, we propose a novel GRPO loss applied to intermediate latents, encouraging direct alignment with high-reward trajectories while maximizing distance from low-reward counterparts. Furthermore, we introduce a memory bank for rollout videos to enhance diversity and reduce computational overhead. Despite its simplicity, TAGRPO achieves significant improvements over DanceGRPO in I2V generation.
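The abstract states that rollouts from identical initial noise give better guidance and that a memory bank of rollouts adds diversity while reducing computation, but it does not describe the bank's design. The sketch below is one hypothetical way to realize that idea: rollouts are stored under a key of (conditioning image id, initial-noise seed) in a fixed-size FIFO queue, and each training step merges freshly generated rollouts with stored ones that share the same initial noise. `Rollout`, `RolloutMemoryBank`, the key choice, and FIFO eviction are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass
class Rollout:
    latents: List[List[float]]  # hypothetical layout: one flattened latent per denoising step
    reward: float


class RolloutMemoryBank:
    """Hypothetical FIFO memory bank keyed by (image id, initial-noise seed)."""

    def __init__(self, capacity_per_key: int = 8):
        self.capacity_per_key = capacity_per_key
        # Each key holds at most `capacity_per_key` rollouts; the oldest are evicted first.
        self._store: Dict[Tuple[str, int], Deque[Rollout]] = defaultdict(
            lambda: deque(maxlen=self.capacity_per_key)
        )

    def add(self, image_id: str, noise_seed: int, rollout: Rollout) -> None:
        self._store[(image_id, noise_seed)].append(rollout)

    def group(self, image_id: str, noise_seed: int, fresh: List[Rollout]) -> List[Rollout]:
        """Return stored rollouts sharing this initial noise plus the fresh ones,
        then store the fresh rollouts for later steps."""
        stored = list(self._store.get((image_id, noise_seed), ()))
        for rollout in fresh:
            self.add(image_id, noise_seed, rollout)
        return stored + list(fresh)


if __name__ == "__main__":
    bank = RolloutMemoryBank(capacity_per_key=3)
    bank.group("img0", 42, [Rollout([[0.0]], 0.1), Rollout([[1.0]], 0.9)])
    merged = bank.group("img0", 42, [Rollout([[0.5]], 0.5)])
    print([r.reward for r in merged])  # stored rollouts first, then the fresh one
```

Under these assumptions, the group used for advantage estimation grows beyond the number of videos generated per step, which is one plausible reading of how a memory bank could add diversity without extra sampling cost.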

