NE · AI · LG · RO · Feb 5, 2021

Sparse Reward Exploration via Novelty Search and Emitters

arXiv:2102.03140v2 · 20 citations
AI Analysis

This work provides an incremental improvement for researchers and practitioners working on reinforcement learning in sparse reward settings.

This paper addresses the challenge of sparse reward environments by introducing SERENE, an algorithm that separates exploration and exploitation. It uses Novelty Search for exploration and local population-based optimization (emitters) for exploitation, demonstrating favorable performance compared to existing baselines on various sparse reward environments.

Reward-based optimization algorithms require both exploration, to find rewards, and exploitation, to maximize performance. The need for efficient exploration is even more significant in sparse reward settings, in which performance feedback is given sparingly, thus rendering it unsuitable for guiding the search process. In this work, we introduce the SparsE Reward Exploration via Novelty and Emitters (SERENE) algorithm, capable of efficiently exploring a search space, as well as optimizing rewards found in potentially disparate areas. Contrary to existing emitters-based approaches, SERENE separates the search space exploration and reward exploitation into two alternating processes. The first process performs exploration through Novelty Search, a divergent search algorithm. The second one exploits discovered reward areas through emitters, i.e. local instances of population-based optimization algorithms. A meta-scheduler allocates a global computational budget by alternating between the two processes, ensuring the discovery and efficient exploitation of disjoint reward areas. SERENE returns both a collection of diverse solutions covering the search space and a collection of high-performing solutions for each distinct reward area. We evaluate SERENE on various sparse reward environments and show it compares favorably to existing baselines.
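The alternation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a toy under stated assumptions. The search space is a 2D square, the behavior descriptor is the genome itself, the sparse reward is a made-up disk around (0.7, 0.7), each emitter is replaced by a simple elitist local search, and the meta-scheduler is reduced to fixed-size alternating chunks. All function names and parameters (`serene_sketch`, `explore`, `exploit`, `chunk`, and so on) are hypothetical.

```python
import math
import random


def reward(x):
    """Toy sparse reward (assumption): zero except in a disk around (0.7, 0.7)."""
    return max(0.0, 1.0 - math.dist(x, (0.7, 0.7)) / 0.2)


def mutate(x, sigma, rng):
    """Gaussian perturbation, clipped to the [-1, 1] square."""
    return tuple(min(1.0, max(-1.0, xi + rng.gauss(0.0, sigma))) for xi in x)


def novelty(b, others, k=5):
    """Mean distance from b to its k nearest neighbours among `others`."""
    nearest = sorted(math.dist(b, o) for o in others)[:k]
    return sum(nearest) / len(nearest)


def explore(pop, archive, candidates, budget, rng):
    """Exploration chunk: Novelty Search that ignores reward for selection.

    Any offspring that happens to be rewarded is stored as a candidate
    seed for a later exploitation chunk.
    """
    used = 0
    while used < budget:
        offspring = [mutate(p, 0.1, rng) for p in pop]
        used += len(offspring)
        candidates.extend(o for o in offspring if reward(o) > 0.0)
        pool = pop + offspring
        reference = pool + archive
        # Rank by novelty, excluding each individual from its own reference set.
        ranked = sorted(
            pool,
            key=lambda b: novelty(b, [r for r in reference if r is not b]),
            reverse=True,
        )
        pop = ranked[: len(pop)]
        archive.extend(rng.sample(offspring, 2))  # random archive additions
    return pop, used


def exploit(emitter, budget, rng):
    """Exploitation chunk: elitist local search standing in for an emitter."""
    used = 0
    while used < budget:
        children = [mutate(emitter["best"], emitter["sigma"], rng) for _ in range(5)]
        used += len(children)
        top = max(children, key=reward)
        if reward(top) > emitter["fitness"]:
            emitter["best"], emitter["fitness"] = top, reward(top)
    return used


def serene_sketch(total_budget=2000, chunk=200, seed=0):
    """Alternate exploration and exploitation until the global budget is spent."""
    rng = random.Random(seed)
    pop = [(rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)) for _ in range(10)]
    archive, candidates, emitters = [], [], []
    spent = 0
    while spent < total_budget:
        pop, used = explore(pop, archive, candidates, chunk, rng)
        spent += used
        if candidates:
            # Start one emitter from the best rewarded point seen in this chunk.
            start = max(candidates, key=reward)
            candidates.clear()
            emitters.append({"best": start, "fitness": reward(start), "sigma": 0.05})
        if emitters:
            # Spend the next chunk on the currently best emitter.
            spent += exploit(max(emitters, key=lambda e: e["fitness"]), chunk, rng)
    return archive, emitters


if __name__ == "__main__":
    archive, emitters = serene_sketch()
    print(len(archive), "archived points,", len(emitters), "emitters")
```

The two return values mirror the two outputs named in the abstract: the archive plays the role of the diverse collection covering the search space, and the emitters hold the best solution found near each rewarded seed. Whether any reward is discovered within the budget depends on the random seed, so the emitter list may be empty.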

Code Implementations: 1 repo
