LG AI ML

May 6, 2020

Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization

Pierre-Alexandre Kamienny, Matteo Pirotta, Alessandro Lazaric, Thibault Lavril, Nicolas Usunier, Ludovic Denoyer

arXiv:2005.02934v1

15.321 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of learning adaptive strategies in changing tasks for reinforcement learning agents, offering a practical improvement over existing methods.

The paper tackles the problem of training RNN-based policies for adaptive exploration in dynamic environments, where training is slow and often converges to poor solutions. It introduces a regularization technique, built on informed policies that see the task description at training time, that reduces sample complexity while still yielding efficient exploration-exploitation strategies.

We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments, where the task may change over time. While RNN-based policies could in principle represent such strategies, in practice their training time is prohibitive and the learning process often converges to poor solutions. In this paper, we consider the case where the agent has access to a description of the task (e.g., a task id or task parameters) at training time, but not at test time. We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task. This dramatically reduces the sample complexity of training RNN-based policies, without losing their representational power. As a result, our method learns exploration strategies that efficiently balance between gathering information about the unknown and changing task and maximizing the reward over time. We test the performance of our algorithm in a variety of environments where tasks may vary within each episode.
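To make the idea of regularizing an RNN-based policy with an informed policy more concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not specify the form of the regularizer, so this sketch assumes a simple KL penalty that pulls the RNN policy's action distribution toward that of the informed policy. The function names, the `beta` weight, and the use of a KL term are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert a list of action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def regularized_loss(rl_loss, rnn_logits, informed_logits, beta=0.1):
    """Hypothetical objective: RL loss plus a penalty toward the informed policy.

    rl_loss         -- scalar loss of the RNN policy from any RL method
    rnn_logits      -- action logits of the RNN policy (no task description)
    informed_logits -- action logits of the informed policy (sees the task)
    beta            -- assumed regularization weight, not taken from the paper
    """
    p_informed = softmax(informed_logits)
    p_rnn = softmax(rnn_logits)
    return rl_loss + beta * kl_divergence(p_informed, p_rnn)
```

When the two policies agree the penalty vanishes and only the RL loss remains; the more the RNN policy disagrees with the informed one, the larger the penalty. At test time only the RNN policy would be used, since the task description is unavailable there.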
