AI · Jan 19, 2023

Generalization through Diversity: Improving Unsupervised Environment Design

Wenjun Li, Pradeep Varakantham, Dexun Li

arXiv:2301.08025v2 · 10.010 citations · h-index: 33

Originality: Incremental advance

AI Analysis

This work addresses robustness issues in RL for applications such as robotics and gaming, but it is an incremental advance because it builds on existing unsupervised environment design methods.

The paper tackles the problem of RL agents failing on out-of-distribution test scenarios after training on overly similar environments. It proposes a method that adaptively identifies diverse environments using a novel distance measure, and it reports empirical results in which the method outperforms leading approaches on three benchmarks.

Agent decision making using Reinforcement Learning (RL) heavily relies on either a model or simulator of the environment (e.g., moving in an 8x8 maze with three rooms, playing chess on an 8x8 board). Due to this dependence, small changes in the environment (e.g., positions of obstacles in the maze, size of the board) can severely affect the effectiveness of the policy learned by the agent. To that end, existing work has proposed training RL agents on an adaptive curriculum of environments (generated automatically) to improve performance on out-of-distribution (OOD) test scenarios. Specifically, existing research has employed the potential for the agent to learn in an environment (captured using Generalized Advantage Estimation, GAE) as the key factor to select the next environment(s) on which to train the agent. However, such a mechanism can select similar environments (with a high potential to learn), thereby making agent training redundant on all but one of those environments. To address this, we provide a principled approach to adaptively identify diverse environments based on a novel distance measure relevant to environment design. We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design on three distinct benchmark problems used in the literature.
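
The redundancy problem described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not specify the distance measure or the selection rule, so the code below assumes a plain Euclidean distance over hypothetical environment feature vectors and a simple greedy rule that trades off a learning-potential score against distance to already selected environments. The function names `gae` and `select_diverse`, the `weight` parameter, and the feature vectors are all assumptions made for illustration. Only the GAE recursion itself follows the standard definition.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Standard Generalized Advantage Estimation for one episode.

    `values` must have len(rewards) + 1 entries; the last entry is the
    bootstrap value of the final state (0 for a terminal state).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    # One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(deltas)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

def select_diverse(scores, features, k, weight=1.0):
    """Hypothetical greedy selection of k environments.

    Each step picks the environment maximizing
    score + weight * (Euclidean distance to the nearest chosen environment).
    With weight=0 this reduces to picking the top-k scores.
    """
    scores = np.asarray(scores, dtype=float)
    features = np.asarray(features, dtype=float)
    chosen = [int(np.argmax(scores))]
    while len(chosen) < min(k, len(scores)):
        # Pairwise distances from every environment to each chosen one.
        diffs = features[:, None, :] - features[chosen][None, :, :]
        min_dist = np.linalg.norm(diffs, axis=-1).min(axis=1)
        gain = scores + weight * min_dist
        gain[chosen] = -np.inf  # never pick the same environment twice
        chosen.append(int(np.argmax(gain)))
    return chosen

# Two near-duplicate environments with high scores, one distant environment.
scores = [1.0, 0.99, 0.5]
features = [[0.0, 0.0], [0.01, 0.0], [5.0, 5.0]]
score_only = select_diverse(scores, features, k=2, weight=0.0)
with_diversity = select_diverse(scores, features, k=2, weight=1.0)
```

In this toy setup, score-only selection picks the two near-duplicate environments, while the diversity-weighted variant replaces the second one with the distant environment. In practice the per-environment score would be derived from GAE values collected on that environment, for example their mean magnitude, which is again an assumption here rather than something the abstract states.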
