LG, Jun 15, 2024

UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Yuan Pu, Yazhe Niu, Zhenjie Yang, Jiyuan Ren, Hongsheng Li, Yu Liu

arXiv:2406.10667v215.713 citations, Has Code

Originality: Highly original

AI Analysis

UniZero addresses inefficient planning in RL, offering researchers and practitioners a scalable approach to complex, variable tasks. It is incremental in that it builds on MuZero-style methods.

The paper tackles the challenge of scaling world-model-based planning in reinforcement learning to heterogeneous scenarios with diverse dependencies and task variability. It introduces UniZero, which significantly outperforms existing baselines on benchmarks that require long-term memory and demonstrates superior scalability in multitask learning on Atari benchmarks.

Learning predictive world models is crucial for enhancing the planning capabilities of reinforcement learning (RL) agents. Recently, MuZero-style algorithms, leveraging the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, these methods struggle to scale in heterogeneous scenarios with diverse dependencies and task variability. To overcome these limitations, we introduce UniZero, a novel approach that employs a modular transformer-based world model to effectively learn a shared latent space. By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in the latent space. We show that UniZero significantly outperforms existing baselines in benchmarks that require long-term memory. Additionally, UniZero demonstrates superior scalability in multitask learning experiments conducted on Atari benchmarks. In standard single-task RL settings, such as Atari and DMControl, UniZero matches or even surpasses the performance of current state-of-the-art methods. Finally, extensive ablation studies and visual analyses validate the effectiveness and scalability of UniZero's design choices. Our code is available at https://github.com/opendilab/LightZero.
