LG · ML · Jun 2, 2020

Temporally-Extended ε-Greedy Exploration

arXiv:2006.01782v1 · 39 citations
Originality: Incremental advance
AI Analysis

This work addresses the exploration challenge in RL with a simple yet effective method aimed at practitioners. The contribution is incremental, since it builds on the well-known ε-greedy approach.

The paper tackles exploration in reinforcement learning by proposing a temporally extended ε-greedy algorithm. The algorithm reduces dithering by repeating each sampled exploratory action for a random duration, which improves performance across a broad set of domains. Duration distributions inspired by animal foraging yield particularly strong results.

Recent work on exploration in reinforcement learning (RL) has led to a series of increasingly complex solutions to the problem. This increase in complexity often comes at the expense of generality. Recent empirical studies suggest that, when applied to a broader set of domains, some sophisticated exploration methods are outperformed by simpler counterparts, such as ε-greedy. In this paper we propose an exploration algorithm that retains the simplicity of ε-greedy while reducing dithering. We build on a simple hypothesis: the main limitation of ε-greedy exploration is its lack of temporal persistence, which limits its ability to escape local optima. We propose a temporally extended form of ε-greedy that simply repeats the sampled action for a random duration. It turns out that, for many duration distributions, this suffices to improve exploration on a large set of domains. Interestingly, a class of distributions inspired by ecological models of animal foraging behaviour yields particularly strong performance.
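
The action-selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name, the `act` interface, the default parameter values, and the choice of a truncated power-law duration distribution (weights proportional to `n ** -mu`, standing in for a heavy-tailed, foraging-style distribution) are all assumptions made for this example.

```python
import random


class TemporallyExtendedEpsilonGreedy:
    """Sketch of epsilon-greedy with action repetition (hypothetical interface)."""

    def __init__(self, num_actions, epsilon=0.1, mu=2.0, max_duration=100, seed=None):
        self.num_actions = num_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Assumed duration distribution: power law truncated at max_duration.
        self.durations = list(range(1, max_duration + 1))
        self.weights = [n ** -mu for n in self.durations]
        self.remaining = 0           # repeats still owed for the current action
        self.repeated_action = None  # exploratory action currently being repeated

    def sample_duration(self):
        # Draw how many steps, in total, an exploratory action is held.
        return self.rng.choices(self.durations, weights=self.weights, k=1)[0]

    def act(self, q_values):
        # Still inside a repetition: keep playing the same exploratory action.
        if self.remaining > 0:
            self.remaining -= 1
            return self.repeated_action
        # With probability epsilon, start a new exploratory repetition.
        if self.rng.random() < self.epsilon:
            self.repeated_action = self.rng.randrange(self.num_actions)
            self.remaining = self.sample_duration() - 1
            return self.repeated_action
        # Otherwise act greedily with respect to the current value estimates.
        return max(range(self.num_actions), key=lambda a: q_values[a])
```

With `max_duration=1` every exploratory action lasts a single step, so the sketch reduces to ordinary ε-greedy. Larger values of `max_duration` and smaller values of `mu` put more mass on long repetitions.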

Code Implementations: 1 repo
