LG · AI · ML · Jun 18, 2019

Directed Exploration for Reinforcement Learning

arXiv:1906.07805v1 · 15 citations
Originality: Incremental advance
AI Analysis

This work addresses a sample-efficiency bottleneck in RL tasks such as robotic manipulation, though it appears incremental because it builds on existing uncertainty-based exploration methods.

The paper tackles the inefficiency of non-stationary reward bonuses in reinforcement learning exploration by proposing directed exploration, which uses a goal-conditioned policy to directly target uncertain states, resulting in more efficient and robust exploration.

Efficient exploration is necessary to achieve good sample efficiency in reinforcement learning. From small, tabular settings such as gridworlds to large, continuous, sparse-reward settings such as robotic object manipulation, adding an uncertainty bonus to the reward function has been shown to be effective when the uncertainty accurately drives exploration towards promising states. However, reward bonuses can still be inefficient because they are non-stationary: whenever the uncertainties change, we must wait for the function approximators to catch up and converge again. We propose directed exploration, that is, learning a goal-conditioned policy whose goals are simply other states, and using it to directly try to reach states with large uncertainty. The goal-conditioned policy is independent of the uncertainty and is thus stationary. Our experiments show that directed exploration explores more efficiently, and is more robust to how the uncertainty is computed, than adding bonuses to rewards.
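
The idea in the abstract can be illustrated with a minimal tabular sketch. It is not the paper's implementation: the 5x5 gridworld, the count-based uncertainty, and the hand-written greedy `goal_policy` (a stand-in for a learned goal-conditioned policy) are all assumptions made for illustration, as are the names `uncertainty`, `pick_goal`, and `explore`.

```python
import math

GRID = 5  # hypothetical 5x5 gridworld, chosen only for illustration


def uncertainty(counts, state):
    # Assumed count-based proxy: less-visited states are more uncertain.
    return 1.0 / math.sqrt(counts.get(state, 0) + 1)


def pick_goal(counts):
    # Select the state with the largest uncertainty as the exploration goal.
    states = [(r, c) for r in range(GRID) for c in range(GRID)]
    return max(states, key=lambda s: uncertainty(counts, s))


def goal_policy(state, goal):
    # Stand-in for a learned goal-conditioned policy: move one cell
    # toward the goal. It never looks at the uncertainty, so it stays
    # the same (stationary) while the uncertainty estimates change.
    (r, c), (gr, gc) = state, goal
    if r != gr:
        return (r + (1 if gr > r else -1), c)
    if c != gc:
        return (r, c + (1 if gc > c else -1))
    return state


def explore(episodes=30, horizon=8):
    counts = {}
    for _ in range(episodes):
        state = (0, 0)  # every episode starts in the same corner
        counts[state] = counts.get(state, 0) + 1
        goal = pick_goal(counts)  # only goal selection uses uncertainty
        for _ in range(horizon):
            if state == goal:
                break
            state = goal_policy(state, goal)
            counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    visited = explore()
    print(f"visited {len(visited)} of {GRID * GRID} states")
```

In this sketch, changing the uncertainty estimate only changes which goal is selected; the policy that reaches the goal needs no retraining, which is the stationarity argument made above.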

