LGAIMLMay 23, 2018

When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms

arXiv:1805.09045v420 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of understanding exploration strategies in RL for researchers and practitioners, but it is incremental as it builds on existing methods to analyze specific conditions.

The paper tackles the challenge of efficient exploration in reinforcement learning by analyzing when simple random exploration strategies, like $e$-greedy, can achieve sample efficiency, providing problem-specific sample complexity bounds for $Q$-learning with random walk exploration and linking these theoretical results to empirical benchmarks.

Efficient exploration is one of the key challenges for reinforcement learning (RL) algorithms. Most traditional sample efficiency bounds require strategic exploration. Recently many deep RL algorithms with simple heuristic exploration strategies that have few formal guarantees, achieve surprising success in many domains. These results pose an important question about understanding these exploration strategies such as $e$-greedy, as well as understanding what characterize the difficulty of exploration in MDPs. In this work we propose problem specific sample complexity bounds of $Q$ learning with random walk exploration that rely on several structural properties. We also link our theoretical results to some empirical benchmark domains, to illustrate if our bound gives polynomial sample complexity in these domains and how that is related with the empirical performance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes