RO AI | Feb 5

Coupled Local and Global World Models for Efficient First Order RL

Joseph Amigo, Rooholla Khorrambakht, Nicolas Mansard, Ludovic Righetti

arXiv:2602.06219v1 | 2.2 | h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses inefficient RL training on hard-to-model robotics tasks such as manipulation. Its approach bypasses simulators entirely, though the methodological contribution appears incremental.

The paper tackles training RL policies in complex environments without simulators by coupling a local and a global world model, and reports significantly better sample efficiency than PPO on the Push-T manipulation task.

World models offer a promising avenue for more faithfully capturing complex dynamics, including contacts and non-rigidity, as well as complex sensory information, such as visual perception, in situations where standard simulators struggle. However, these models are computationally expensive to evaluate, posing a challenge for popular RL approaches that have been used successfully with simulators to solve complex locomotion tasks yet still struggle with manipulation. This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method: a full-scale world model generates accurate forward trajectories, while a lightweight latent-space surrogate approximates its local dynamics for efficient gradient computation. This coupling of a local and a global world model ensures high-fidelity unrolling alongside computationally tractable differentiation. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency. We further evaluate our approach on an ego-centric object manipulation task with a quadruped. Together, these results demonstrate that learning inside data-driven world models is a promising pathway for solving hard-to-model RL tasks in image space without reliance on hand-crafted physics simulators.
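The decoupled first-order gradient idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scalar dynamics, the linear policy, and the names `global_step`, `surrogate_jacobians`, and `decoupled_fog_gradient` are all hypothetical stand-ins. The forward rollout uses only the "global" model, while the backward pass uses only Jacobians supplied by the "local" surrogate, evaluated along the forward trajectory.

```python
import math


def global_step(s, a):
    """Hypothetical stand-in for the expensive full-scale world model.

    It is only ever called in the forward direction.
    """
    return s + 0.1 * math.tanh(a)


def surrogate_jacobians(s, a):
    """Hypothetical stand-in for the lightweight local surrogate.

    Returns (d s_next / d s, d s_next / d a). In this toy the values equal
    the exact derivatives of global_step; a learned surrogate would only
    approximate them.
    """
    return 1.0, 0.1 * (1.0 - math.tanh(a) ** 2)


def decoupled_fog_gradient(theta, s0, target, horizon):
    """Return (loss, d loss / d theta) for the policy a = theta * (target - s)."""
    # Forward pass: unroll the trajectory with the global model only.
    states, actions = [s0], []
    for _ in range(horizon):
        s = states[-1]
        a = theta * (target - s)
        actions.append(a)
        states.append(global_step(s, a))
    loss = sum((s - target) ** 2 for s in states[1:])

    # Backward pass: backpropagate through time using surrogate Jacobians
    # evaluated at the states and actions of the global trajectory.
    grad_theta = 0.0
    adjoint = 0.0  # loss sensitivity flowing back from later time steps
    for t in reversed(range(horizon)):
        adjoint += 2.0 * (states[t + 1] - target)  # direct loss term on s_{t+1}
        ds_ds, ds_da = surrogate_jacobians(states[t], actions[t])
        # Policy derivatives: da/dtheta = target - s_t, da/ds = -theta.
        grad_theta += adjoint * ds_da * (target - states[t])
        adjoint = adjoint * (ds_ds + ds_da * (-theta))
    return loss, grad_theta


loss, grad = decoupled_fog_gradient(theta=0.5, s0=0.0, target=1.0, horizon=10)
print(loss, grad)
```

Because the surrogate here is exact, the returned gradient agrees with a finite-difference estimate of the rollout loss. With a learned surrogate the gradient would be approximate, which is the trade-off the abstract describes: accurate unrolling from the large model, cheap differentiation from the small one.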
