LGMLMay 2, 2023

Unlocking the Power of Representations in Long-term Novelty-based Exploration

arXiv:2305.01521v110 citations
Originality Highly original
AI Analysis

This addresses the challenge of efficient exploration in complex environments for reinforcement learning researchers and practitioners, representing a significant but incremental advance over prior methods.

The authors tackled the problem of long-term exploration in reinforcement learning by introducing RECODE, a non-parametric method that estimates state visitation counts via clustering in an embedding space, achieving new state-of-the-art results in DM-Hard-8 tasks and hard exploration Atari games, including being the first agent to complete "Pitfall!".

We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss, which leverages masked transformer architectures for multi-step prediction; which in conjunction with RECODE achieves a new state-of-the-art in a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets new state-of-the-art in hard exploration Atari games, and is the first agent to reach the end screen in "Pitfall!".

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes