AI · LG · Feb 23, 2025

OptionZero: Planning with Learned Options

arXiv:2502.16634v3 · 2 citations · h-index: 7 · Has Code · ICLR
Originality: Highly original
AI Analysis

This work addresses efficient planning in complex environments for reinforcement learning. It is an incremental advance that builds on MuZero and adds modifications for option learning.

The paper proposes OptionZero, which integrates an option network into MuZero so that options are discovered autonomously through self-play. Across 26 Atari games, it reports a 131.58% improvement in mean human-normalized score over MuZero.
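For readers unfamiliar with the metric, the sketch below shows the commonly used definition of a human-normalized score and how a mean over games is formed. The game names and raw scores are hypothetical placeholders, not values from the paper, and the paper's exact evaluation protocol is not reproduced here.

```python
def human_normalized_score(agent_score: float,
                           random_score: float,
                           human_score: float) -> float:
    """Commonly used definition: 0% matches a random policy, 100% matches the human baseline."""
    return 100.0 * (agent_score - random_score) / (human_score - random_score)


# Hypothetical raw scores, for illustration only (not taken from the paper).
games = {
    "game_a": {"agent_score": 500.0, "random_score": 100.0, "human_score": 300.0},
    "game_b": {"agent_score": 50.0, "random_score": 0.0, "human_score": 100.0},
}

# Normalize each game separately, then average across games.
scores = {name: human_normalized_score(**raw) for name, raw in games.items()}
mean_hns = sum(scores.values()) / len(scores)
```

Because each game is normalized before averaging, a single game with a very large normalized score can dominate the mean.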

Planning with options, which are sequences of primitive actions, has been shown effective in reinforcement learning within complex environments. Previous studies have focused on planning with predefined options or learned options through expert demonstration data. Inspired by MuZero, which learns superhuman heuristics without any human knowledge, we propose a novel approach, named OptionZero. OptionZero incorporates an option network into MuZero, providing autonomous discovery of options through self-play games. Furthermore, we modify the dynamics network to provide environment transitions when using options, allowing deeper search under the same simulation constraints. Empirical experiments conducted in 26 Atari games demonstrate that OptionZero outperforms MuZero, achieving a 131.58% improvement in mean human-normalized score. Our behavior analysis shows that OptionZero not only learns options but also acquires strategic skills tailored to different game characteristics. Our findings show promising directions for discovering and using options in planning. Our code is available at https://rlg.iis.sinica.edu.tw/papers/optionzero.
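To illustrate why options can deepen search under a fixed simulation budget, here is a minimal sketch, not the authors' implementation. `toy_dynamics`, `apply_option`, and `max_depth` are hypothetical names, and the toy dynamics function stands in for a learned dynamics network. The sketch unrolls an option one primitive action at a time, which is a simplification of the modified dynamics network the abstract describes.

```python
from typing import Callable, Sequence, Tuple

State = int
Action = int
Dynamics = Callable[[State, Action], Tuple[State, float]]


def toy_dynamics(state: State, action: Action) -> Tuple[State, float]:
    """Hypothetical stand-in for a learned dynamics model: returns (next_state, reward)."""
    return state + action, float(action)


def apply_option(state: State,
                 option: Sequence[Action],
                 dynamics: Dynamics,
                 discount: float = 1.0) -> Tuple[State, float]:
    """Roll an option (a sequence of primitive actions) through the dynamics.

    Returns the final state and the discounted sum of rewards along the option.
    """
    total_reward, scale = 0.0, 1.0
    for action in option:
        state, reward = dynamics(state, action)
        total_reward += scale * reward
        scale *= discount
    return state, total_reward


def max_depth(num_simulations: int, edge_length: int) -> int:
    """Deepest path reachable if every simulation adds one edge spanning `edge_length` steps."""
    return num_simulations * edge_length


final_state, option_return = apply_option(0, (1, 2, 3), toy_dynamics, discount=0.5)
primitive_depth = max_depth(num_simulations=50, edge_length=1)
option_depth = max_depth(num_simulations=50, edge_length=3)
```

In this toy setting, treating a three-action option as a single edge lets the same 50 simulations reach three times as many environment steps along one path, which is the intuition behind searching deeper under the same simulation constraints.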

Code Implementations: 1 repo
