LG AI | Feb 4, 2021

Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents

Jane X. Wang, Michael King, Nicolas Porcel, Zeb Kurth-Nelson, Tina Zhu, Charlie Deck, Peter Choy, Mary Cassin, Malcolm Reynolds, Francis Song, Gavin Buttimore, David P. Reichert

arXiv:2102.02926v322.040 citations | Has Code

Originality: Incremental advance

AI Analysis

This benchmark addresses the scarcity of adequate and well-defined tasks for meta-RL research, providing a tool for researchers to analyze and develop more robust meta-RL agents.

This paper introduces Alchemy, a new 3D video game benchmark for meta-reinforcement learning (meta-RL) that features a procedurally resampled latent causal structure. The authors evaluated two powerful RL agents on Alchemy, revealing a specific failure of meta-learning, thus validating Alchemy as a challenging benchmark.

There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, emphasizing transparency and potential for in-depth analysis as well as structural richness. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as a public resource, together with a suite of analysis tools and sample agent trajectories.
