LG · AI · ML · Jul 18, 2023

Scaling Laws for Imitation Learning in Single-Agent Games

Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade

Amazon

arXiv:2307.09423v3 · 13.09 citations · h-index: 96 · Has Code

Originality: Incremental advance

AI Analysis

This work, aimed at AI researchers, addresses the limited performance of imitation learning and shows incremental improvements from scaling in challenging environments such as NetHack.

The paper investigates whether scaling up model and data size improves imitation learning in single-agent games. It finds that loss and return scale smoothly with the compute budget, yielding power laws for compute-optimal training and agents that outperform the prior state of the art in NetHack by 1.5x.

Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet many works find that it often fails to fully recover the underlying expert behavior, even in constrained environments such as single-agent games. However, none of these works investigate in depth the role of scaling up model and data size. Inspired by recent work in Natural Language Processing (NLP), where "scaling up" has produced increasingly capable large language models (LLMs), we investigate whether carefully scaling up model and data size can bring similar improvements to imitation learning in single-agent games. We first demonstrate our findings on a variety of Atari games and then focus on the extremely challenging game of NetHack. In all games, we find that IL loss and mean return scale smoothly with the compute budget (FLOPs) and are strongly correlated, resulting in power laws for training compute-optimal IL agents. Finally, we forecast and train several NetHack agents with IL and find that they outperform the prior state of the art by 1.5x in all settings. Our work demonstrates both the scaling behavior of imitation learning in a variety of single-agent games and the viability of scaling up current approaches to obtain increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
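The abstract states that IL loss and mean return follow power laws in the compute budget and that such fits were used to forecast NetHack agents. As a minimal sketch of what a power-law fit and forecast look like, assuming the simple form metric ≈ a · C^b fitted by ordinary least squares in log-log space, the Python below recovers the prefactor and exponent and extrapolates to a larger budget. The paper's actual fitting procedure is not described here and may differ; the function names `fit_power_law` and `forecast` are hypothetical, and the data in the usage block are synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(compute: Sequence[float], metric: Sequence[float]) -> Tuple[float, float]:
    """Fit metric ~= a * compute**b by least squares in log-log space.

    Hypothetical helper for illustration. Inputs must be positive,
    of equal length, and contain at least two points.
    """
    if len(compute) != len(metric) or len(compute) < 2:
        raise ValueError("need at least two (compute, metric) pairs of equal length")
    xs = [math.log(c) for c in compute]
    ys = [math.log(m) for m in metric]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("compute values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope in log-log space is the exponent
    log_a = mean_y - b * mean_x   # intercept in log space gives the prefactor
    return math.exp(log_a), b


def forecast(a: float, b: float, compute: float) -> float:
    """Extrapolate a fitted power law to a new compute budget (FLOPs)."""
    return a * compute ** b


if __name__ == "__main__":
    # Synthetic data generated from loss = 50 * C**-0.1 (NOT from the paper).
    budgets = [1e15, 1e16, 1e17, 1e18]
    losses = [50 * c ** -0.1 for c in budgets]
    a, b = fit_power_law(budgets, losses)
    print(f"a={a:.1f}, b={b:.3f}")  # prints a=50.0, b=-0.100
    print(f"forecast at 1e20 FLOPs: {forecast(a, b, 1e20):.3f}")
```

Fitting in log-log space turns the power law into a straight line, so the exponent is just the regression slope; with real, noisy measurements one would typically fit only the compute-optimal points and check residuals before trusting an extrapolation.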
