AI, LG · Sep 30, 2021

Is Policy Learning Overrated?: Width-Based Planning and Active Learning for Atari

arXiv:2109.15310v24.51 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses sample efficiency for AI agents in complex environments like Atari, offering a novel approach that could reduce training costs, though it is incremental in the context of width-based planning.

The paper tackles sample efficiency in Atari games by proposing Olive, an online width-based planning method that updates its VAE features during planning using active learning. Across 55 Atari games, Olive outperforms Rollout-IW by 42-to-11 and VAE-IW by 32-to-20. It also outperforms the policy-learning approaches π-IW and DQN by 30-to-22 and 31-to-17, even though those were trained with a 100x larger training budget.

Width-based planning has shown promising results on Atari 2600 games using pixel input, while using substantially fewer environment interactions than reinforcement learning. Recent width-based approaches have computed feature vectors for each screen using a hand-designed feature set or a variational autoencoder trained on game screens (VAE-IW), and pruned screens that do not have novel features during the search. We propose Olive (Online-VAE-IW), which updates the VAE features online using active learning to maximize the utility of screens observed during planning. Experimental results in 55 Atari games demonstrate that it outperforms Rollout-IW by 42-to-11 and VAE-IW by 32-to-20. Moreover, Olive outperforms existing work based on policy learning ($π$-IW, DQN) trained with a 100x training budget by 30-to-22 and 31-to-17, and a state-of-the-art data-efficient reinforcement learning method (EfficientZero) trained with the same training budget and run with a 1.8x planning budget by 18-to-7 on the Atari 100k benchmark, with no policy learning at all. The source code is available at github.com/ibm/atari-active-learning.
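
The pruning rule mentioned in the abstract (discarding screens that have no novel features) can be illustrated with a minimal sketch. This is not the Olive implementation: the function name `prune_by_novelty`, the use of (index, value) pairs as features, and the toy binary vectors are all assumptions made for illustration. It shows only a width-1 style novelty check, where a screen is kept if at least one of its feature values has not been seen earlier in the search.

```python
from typing import List, Sequence, Set, Tuple


def prune_by_novelty(feature_vectors: Sequence[Sequence[int]]) -> List[int]:
    """Return the indices of screens that pass a width-1 novelty check.

    Hypothetical helper for illustration: each screen is represented by a
    feature vector (for example, a binarized VAE latent code). A screen is
    novel if some (feature index, value) pair has not appeared in any
    screen visited earlier; otherwise it is pruned.
    """
    seen: Set[Tuple[int, int]] = set()  # all (index, value) pairs seen so far
    kept: List[int] = []
    for i, features in enumerate(feature_vectors):
        atoms = {(j, v) for j, v in enumerate(features)}
        new_atoms = atoms - seen
        if new_atoms:
            # At least one unseen feature value: keep the screen.
            kept.append(i)
            seen |= new_atoms
        # Otherwise the screen adds nothing new and is pruned.
    return kept


# Toy example with 3-bit feature vectors (made-up data).
screens = [
    (0, 1, 0),
    (0, 1, 0),  # identical to the first screen
    (1, 1, 0),  # feature 0 takes a new value
    (0, 1, 0),
    (1, 0, 0),  # feature 1 takes a new value
]
kept_indices = prune_by_novelty(screens)
print(kept_indices)
```

In this sketch the quality of the search depends entirely on how informative the feature vectors are, which is why the choice between hand-designed features, an offline-trained VAE, and online-updated VAE features matters.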
