LG | AI | Aug 21, 2023

Stabilizing Unsupervised Environment Design with a Learned Adversary

Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel

Berkeley, Oxford

arXiv:2308.10797v2 | 16.020 citations | h-index: 25 | Has Code

Originality: Incremental advance

AI Analysis

This work improves UED for reinforcement learning by enabling learned task generation, potentially advancing general agent training, though it is incremental over prior methods.

The paper addresses the challenge of training robust agents in Unsupervised Environment Design (UED) by identifying and fixing shortcomings of the PAIRED method. With these fixes, PAIRED matches or exceeds state-of-the-art methods in procedurally generated environments, including partially observed maze navigation and continuous-control car racing.

A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent's current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose solutions for each shortcoming. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.
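As a rough illustration of the teacher's training signal: the abstract does not restate it, but in the original PAIRED formulation the teacher is rewarded by the student's regret on each task it designs, estimated with the help of a second "antagonist" student. The sketch below assumes that estimate is the antagonist's best return minus the protagonist's mean return; the function name and the episode returns are hypothetical and are not taken from the paper's code.

```python
from statistics import mean


def paired_regret(antagonist_returns, protagonist_returns):
    """Estimate regret on one teacher-proposed task.

    Assumption: regret is approximated as the antagonist's best episode
    return minus the protagonist's mean episode return on the same task.
    """
    return max(antagonist_returns) - mean(protagonist_returns)


# Made-up episode returns on a single generated task.
antagonist = [1.0, 0.5]      # a second student that does well on the task
protagonist = [0.25, 0.25]   # the student being trained

# The teacher would be rewarded with this value (it favors tasks that are
# solvable but not yet solved), while the protagonist trains to reduce it.
teacher_reward = paired_regret(antagonist, protagonist)
```

A task that no student can solve yields low regret, as does a task the protagonist already solves, so under this assumption the teacher is pushed toward tasks at the edge of the student's current ability.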
