LG · ML · Jun 5, 2018

Mix&Match - Agent Curricula for Reinforcement Learning

arXiv:1806.01780v1 · 83 citations
Originality: Incremental advance
AI Analysis

The paper addresses slow or difficult RL training, a problem for both researchers and practitioners. Its curriculum approach is incremental in method but broad in application.

The paper tackles the difficulty of training complex reinforcement learning agents by introducing Mix&Match, a framework that automatically builds a curriculum over agents so that complex agents bootstrap from simpler ones. The reported result is faster training and better final performance, for example on a 3D first-person task with more than 700 actions.

We introduce Mix&Match (M&M) - a training framework designed to facilitate rapid and effective learning in RL agents, especially those that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contradistinction to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum we achieve both faster training and better final performance than one obtains using traditional methods. (2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agent's internal state. (3) Finally, we illustrate how a variant of our method can be used to improve agent performance in a multitask setting.
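The abstract does not spell out how the policy representation is "gradually altered". One plausible reading, sketched below, is that the acting policy is a convex mixture of a simpler and a more complex agent's action distributions, with the mixing weight moving from the simple agent to the complex one over training. Everything here is an illustrative assumption rather than the authors' implementation: the names `mixed_policy` and `linear_alpha` are hypothetical, and the linear schedule stands in for whatever automatic procedure the paper uses to form the curriculum.

```python
import math


def softmax(logits):
    """Turn raw action scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_policy(simple_probs, complex_probs, alpha):
    """Convex mixture of two action distributions (assumed form).

    alpha = 0 acts purely with the simple agent,
    alpha = 1 acts purely with the complex agent.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * p + alpha * q
            for p, q in zip(simple_probs, complex_probs)]


def linear_alpha(step, total_steps):
    """Hypothetical schedule: shift weight linearly toward the complex agent."""
    return min(1.0, max(0.0, step / total_steps))


# Toy example with three actions.
simple = softmax([2.0, 0.5, 0.1])      # simple agent, assumed already competent
complex_ = softmax([0.0, 0.0, 0.0])    # complex agent, assumed untrained (uniform)

start = mixed_policy(simple, complex_, linear_alpha(0, 100))
middle = mixed_policy(simple, complex_, linear_alpha(50, 100))
end = mixed_policy(simple, complex_, linear_alpha(100, 100))
```

Under this assumption the environment and task stay fixed throughout. Only the weight between the two policies changes, which matches the abstract's contrast with curricula that modify the tasks or environments.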

