LG | AI | NE | Feb 20, 2018

Meta-Reinforcement Learning of Structured Exploration Strategies

arXiv:1802.07245v1 | 384 citations
Originality: Incremental advance
AI Analysis

This paper addresses inefficient exploration in multi-task reinforcement learning for robotics applications. The advance is incremental because it builds on existing meta-RL approaches.

The paper tackles exploration in reinforcement learning by introducing MAESN, a gradient-based fast adaptation algorithm that learns structured exploration strategies from prior tasks. The authors report that it is more effective than prior methods on simulated locomotion and manipulation tasks.

Exploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we explore how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.
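The contrast the abstract draws between random action-space noise and structured stochasticity from a latent space can be illustrated with a toy sketch. It is not the authors' implementation. The 2-D point mass, the `tanh` "policy", the constants, and every function name are assumptions made for illustration. It also adapts only the mean of the latent distribution with a plain score-function (REINFORCE) gradient, whereas the abstract says MAESN additionally uses prior tasks to initialize the policy.

```python
import math
import random

HORIZON, STEP = 50, 0.1  # toy 2-D point mass; both values are illustrative


def rollout_action_noise(rng):
    """Unstructured exploration: fresh Gaussian action noise at every step."""
    x = y = 0.0
    for _ in range(HORIZON):
        x += STEP * rng.gauss(0.0, 1.0)
        y += STEP * rng.gauss(0.0, 1.0)
    return x, y


def rollout_latent_noise(rng, mu, sigma):
    """Structured exploration: sample one latent z per episode and hold it fixed."""
    z = (rng.gauss(mu[0], sigma), rng.gauss(mu[1], sigma))
    ax, ay = math.tanh(z[0]), math.tanh(z[1])  # hypothetical latent-conditioned policy
    x = y = 0.0
    for _ in range(HORIZON):
        x += STEP * ax
        y += STEP * ay
    return (x, y), z


def mean_reward(rng, mu, sigma, goal, episodes=1000):
    """Average negative distance to the goal at the end of an episode."""
    total = 0.0
    for _ in range(episodes):
        (x, y), _ = rollout_latent_noise(rng, mu, sigma)
        total -= math.hypot(x - goal[0], y - goal[1])
    return total / episodes


def adapt_latent_mean(rng, mu, sigma, goal, lr=0.1, steps=20, episodes=1000):
    """Score-function (REINFORCE) gradient ascent on the latent mean only."""
    mu = list(mu)
    for _ in range(steps):
        batch = []
        for _ in range(episodes):
            (x, y), z = rollout_latent_noise(rng, mu, sigma)
            batch.append((-math.hypot(x - goal[0], y - goal[1]), z))
        baseline = sum(r for r, _ in batch) / episodes  # mean baseline reduces variance
        for i in range(2):
            # grad of log N(z; mu, sigma^2) w.r.t. mu is (z - mu) / sigma^2
            grad = sum((r - baseline) * (z[i] - mu[i]) for r, z in batch)
            mu[i] += lr * grad / (episodes * sigma ** 2)
    return mu


rng = random.Random(0)
goal = (4.0, 0.0)  # hypothetical target position for the new task

# Average distance travelled from the origin under each kind of noise.
spread_action = sum(math.hypot(*rollout_action_noise(rng)) for _ in range(1000)) / 1000
spread_latent = sum(
    math.hypot(*rollout_latent_noise(rng, (0.0, 0.0), 1.0)[0]) for _ in range(1000)
) / 1000

# Adapt the latent mean to the new task and compare average reward.
reward_before = mean_reward(rng, (0.0, 0.0), 1.0, goal)
adapted_mu = adapt_latent_mean(rng, (0.0, 0.0), 1.0, goal)
reward_after = mean_reward(rng, adapted_mu, 1.0, goal)

print(spread_action, spread_latent, reward_before, reward_after)
```

In this toy setting, per-step noise largely cancels over an episode, while a latent held fixed for the whole episode commits the agent to one direction and so covers more of the space. Shifting the latent mean with a few gradient steps then biases that coherent exploration toward the new goal.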

Code Implementations: 2 repos