LG | AI | RO | Dec 6, 2022

First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation

arXiv:2212.03251v2 | 2 citations | h-index: 40
Originality: Synthesis-oriented
AI Analysis

This work provides incremental evidence for a specific exploration technique in RL, potentially aiding researchers in improving sample efficiency in sparse-reward tasks.

The paper isolates the benefit of post-exploration in intrinsically motivated goal exploration processes (IMGEP) for reinforcement learning. Ablation studies on MiniGrid and MuJoCo environments show that it helps agents reach more diverse states and boosts performance.

Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper, we present a clear ablation study of post-exploration in a general intrinsically motivated goal exploration process (IMGEP) framework, which the Go-Explore paper did not provide. We study the isolated potential of post-exploration by turning it on and off within the same algorithm, under both tabular and deep RL settings, on both discrete navigation and continuous control tasks. Experiments on a range of MiniGrid and MuJoCo environments show that post-exploration indeed helps IMGEP agents reach more diverse states and boosts their performance. In short, our work suggests that RL researchers should consider using post-exploration in IMGEP when possible, since it is effective, method-agnostic and easy to implement.
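The on/off ablation described in the abstract can be illustrated with a minimal toy sketch. This is not the paper's implementation. It assumes a small grid world as a stand-in for MiniGrid, a hand-coded greedy walker (`greedy_move`) in place of a learned goal-conditioned policy, uniform goal sampling from already-visited states, and purely random post-exploration steps. All function names and parameters here are hypothetical, and the toy does not equalise the total step budget between the two settings.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def step(state, move, size):
    """Apply a move on a size x size grid, clipping at the walls."""
    x = min(max(state[0] + move[0], 0), size - 1)
    y = min(max(state[1] + move[1], 0), size - 1)
    return (x, y)


def greedy_move(state, goal):
    """Hypothetical stand-in for a learned goal-conditioned policy."""
    if state[0] != goal[0]:
        return (1 if goal[0] > state[0] else -1, 0)
    return (0, 1 if goal[1] > state[1] else -1)


def run(episodes, size=10, epsilon=0.1, post_steps=0, max_go_steps=50, seed=0):
    """Return the set of visited states; post_steps=0 disables post-exploration."""
    rng = random.Random(seed)
    start = (0, 0)
    visited = {start}
    for _ in range(episodes):
        # Assumption: goals are sampled uniformly from states seen so far.
        goal = rng.choice(sorted(visited))
        state = start
        # 'Go' phase: epsilon-greedy walk from the start towards the goal.
        for _ in range(max_go_steps):
            if state == goal:
                break
            if rng.random() < epsilon:
                move = rng.choice(MOVES)
            else:
                move = greedy_move(state, goal)
            state = step(state, move, size)
            visited.add(state)
        # 'Post-explore' phase: random steps, taken only if the goal was reached.
        if state == goal:
            for _ in range(post_steps):
                state = step(state, rng.choice(MOVES), size)
                visited.add(state)
    return visited


if __name__ == "__main__":
    without_post = run(episodes=200, post_steps=0)
    with_post = run(episodes=200, post_steps=10)
    print("states visited without post-exploration:", len(without_post))
    print("states visited with post-exploration:   ", len(with_post))
```

Setting `post_steps=0` switches post-exploration off while leaving the rest of the loop unchanged, which mirrors the ablation design in spirit only. The counts printed by this toy say nothing about the paper's reported results.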

