AI · Sep 29, 2020

Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network

arXiv:2009.14297v1
Originality: Synthesis-oriented
AI Analysis

This paper addresses exploration inefficiencies in reinforcement learning, a practical concern for practitioners, but it appears incremental: it builds on existing methods with a simple heuristic-based adjustment.

The paper tackles inefficient exploration in reinforcement learning by proposing a reannealing-based algorithm that encourages exploration only when needed, for example when the agent is stuck in a local optimum. An illustrative case study suggests it can accelerate training and yield a better policy.

Existing exploration strategies in reinforcement learning (RL) often either ignore the history or feedback of search, or are complicated to implement. There is also a very limited literature showing their effectiveness over diverse domains. We propose an algorithm based on the idea of reannealing, that aims at encouraging exploration only when it is needed, for example, when the algorithm detects that the agent is stuck in a local optimum. The approach is simple to implement. We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
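The abstract does not spell out the heuristic measure, so the following is only a minimal sketch of the general idea under stated assumptions: an epsilon-greedy schedule decays as usual, and epsilon is reset ("reannealed") when a plateau in recent episode returns suggests the agent may be stuck. The class name `ReannealingEpsilon`, the window-mean plateau test, and all parameter names and defaults are hypothetical and are not taken from the paper.

```python
from collections import deque


class ReannealingEpsilon:
    """Decaying epsilon that is reset (reannealed) when returns plateau.

    Illustrative only: the plateau test and every default value here are
    assumptions, not the heuristic measure used in the paper.
    """

    def __init__(self, eps_start=1.0, eps_min=0.05, decay=0.99,
                 window=20, tolerance=1e-3):
        self.eps_start = eps_start
        self.eps_min = eps_min
        self.decay = decay
        self.window = window
        self.tolerance = tolerance
        self.epsilon = eps_start
        # Keep just enough history to compare two consecutive windows.
        self.returns = deque(maxlen=2 * window)
        self.reanneal_count = 0

    def _stuck(self):
        # Assumed heuristic: the mean return of the latest window has not
        # improved on the mean of the previous window by more than tolerance.
        if len(self.returns) < 2 * self.window:
            return False
        history = list(self.returns)
        older = sum(history[:self.window]) / self.window
        recent = sum(history[self.window:]) / self.window
        return recent - older <= self.tolerance

    def update(self, episode_return):
        """Record one episode return; return epsilon for the next episode."""
        self.returns.append(episode_return)
        exploration_exhausted = self.epsilon <= self.eps_min
        if exploration_exhausted and self._stuck():
            # Reanneal: restart the decay schedule and forget old returns.
            self.epsilon = self.eps_start
            self.returns.clear()
            self.reanneal_count += 1
        else:
            self.epsilon = max(self.eps_min, self.epsilon * self.decay)
        return self.epsilon
```

In a DQN training loop, `update` would be called once per episode and the returned value used as the probability of taking a random action. Resetting only after epsilon has reached its floor is one possible design choice; it keeps the schedule from restarting while the agent is still exploring heavily.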
