LG | AI | Mar 31, 2023

Accelerating exploration and representation learning with offline pre-training

arXiv:2304.00046v1 | 8 citations | h-index: 78
Originality: Incremental advance
AI Analysis

This work addresses sample efficiency in reinforcement learning for complex tasks such as NetHack. It is an incremental advance because it builds on existing offline learning methods.

The paper tackles the challenge of long-horizon tasks in sequential decision-making by improving exploration and representation learning through offline pre-training, showing that separate models learned from human demonstrations significantly boost sample efficiency on the NetHack benchmark.

Sequential decision-making agents struggle with long-horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improving credit assignment, introducing memory capability, or altering the agent's intrinsic motivation (i.e., exploration) or its worldview (i.e., knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that separately learning a state representation (using noise-contrastive estimation) and a model of auxiliary reward from a single collection of human demonstrations can significantly improve sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.
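
To make the contrastive part of the abstract concrete, here is a minimal sketch of an InfoNCE-style loss, a common variant of noise-contrastive estimation. It is not the paper's implementation: the function names, the temperature value, and the idea that each positive is the embedding of a state related to the anchor (for example, a nearby state in the same demonstration) are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are assumed to embed related states;
    every positives[j] with j != i serves as a negative ("noise") sample.
    The temperature default is an illustrative choice, not a paper value.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate in the batch.
        logits = [dot(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching pair (index i) as the target.
        total += log_z - logits[i]
    return total / n


# Toy 2-D embeddings: correctly paired vs. deliberately mismatched.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is small when each anchor is most similar to its own positive and large when it is more similar to another sample in the batch, which is the signal an encoder would be trained to minimize.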

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
