CV · Feb 9

WorldCompass: Reinforcement Learning for Long-Horizon World Models

arXiv:2602.09022v1 · 10 citations · h-index: 7 · Has Code
Originality: Incremental advance
AI Analysis

This work aims to make world models more reliable for interactive applications. It is an incremental advance in reinforcement learning for video generation.

The paper targets long-horizon, interactive video-based world models. It introduces WorldCompass, a reinforcement learning post-training framework that uses interaction signals to make the model's exploration more accurate and consistent. The authors report significant improvements in interaction accuracy and visual fidelity across various scenarios.

This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for long-horizon, interactive video-based world models, enabling them to explore the world more accurately and consistently based on interaction signals. To effectively "steer" the world model's exploration, we introduce three core innovations tailored to the autoregressive video generation paradigm: 1) Clip-Level Rollout Strategy: We generate and evaluate multiple samples at a single target clip, which significantly boosts rollout efficiency and provides fine-grained reward signals. 2) Complementary Reward Functions: We design reward functions for both interaction-following accuracy and visual quality, which provide direct supervision and effectively suppress reward-hacking behaviors. 3) Efficient RL Algorithm: We employ a negative-aware fine-tuning strategy coupled with various efficiency optimizations to enhance model capacity efficiently and effectively. Evaluations on the SoTA open-source world model, WorldPlay, demonstrate that WorldCompass significantly improves interaction accuracy and visual fidelity across various scenarios.
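To make the clip-level rollout idea more concrete, here is a minimal Python sketch of one plausible reading of it: several candidates are sampled for a single target clip from a shared context, each is scored with two reward terms, and the scores are normalized within the group. This is not the authors' implementation. Every name here (`clip_level_rollout`, `combine_rewards`, `group_advantages`, the toy sampler and rewards) is hypothetical, and both the weighted-sum combination and the group normalization are assumptions that the abstract does not specify.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Stand-in for a video clip; a real clip would be a tensor of frames.
Clip = List[float]


def combine_rewards(interaction: float, quality: float,
                    w_interaction: float = 0.5) -> float:
    """Weighted sum of the two reward terms (the weighting is an assumption)."""
    return w_interaction * interaction + (1.0 - w_interaction) * quality


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Center and scale rewards within one group of candidates for the same clip."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clip_level_rollout(
    context: Clip,
    sample_clip: Callable[[Clip], Clip],
    interaction_reward: Callable[[Clip, Clip], float],
    quality_reward: Callable[[Clip], float],
    num_samples: int = 4,
) -> List[Tuple[Clip, float]]:
    """Sample several candidates for ONE target clip and score them as a group.

    All candidates share the same context, so only the target clip is
    regenerated rather than the whole long-horizon sequence.
    """
    candidates = [sample_clip(context) for _ in range(num_samples)]
    rewards = [
        combine_rewards(interaction_reward(context, c), quality_reward(c))
        for c in candidates
    ]
    return list(zip(candidates, group_advantages(rewards)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins: the "model" adds noise to the last context value.
    sampler = lambda ctx: [ctx[-1] + rng.gauss(1.0, 0.5)]
    # Hypothetical rewards: closeness to a commanded step of +1, and smoothness.
    follow = lambda ctx, clip: -abs((clip[0] - ctx[-1]) - 1.0)
    smooth = lambda clip: -abs(clip[0]) * 0.01
    for clip, adv in clip_level_rollout([0.0], sampler, follow, smooth):
        print(clip, round(adv, 3))
```

In a sketch like this, the sign of each advantage could be used to separate positive from negative candidates before fine-tuning. How WorldCompass actually uses negatives is not described in the abstract, so that step is left out.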

