AI · Jun 1, 2018

Fast Exploration with Simplified Models and Approximately Optimistic Planning in Model Based Reinforcement Learning

arXiv:1806.00175v2 · 54 citations
Originality: Incremental advance
AI Analysis

The paper addresses the slow, sample-inefficient learning of reinforcement learning agents in video games and offers a domain-specific improvement.

The paper tackles sample inefficiency in reinforcement learning by combining strategic exploration with simplified models and optimistic planning. The resulting algorithm, Strategic Object Oriented Reinforcement Learning (SOORL), outperforms state-of-the-art algorithms in the game of Pitfall! in fewer than 50 episodes.

Humans learn to play video games significantly faster than state-of-the-art reinforcement learning (RL) algorithms. People seem to build simple models that are easy to learn and that support planning and strategic exploration. Inspired by this, we investigate two issues in leveraging model-based RL for sample efficiency. First, we investigate how to perform strategic exploration when exact planning is not feasible, and we empirically show that optimistic Monte Carlo Tree Search outperforms posterior sampling methods. Second, we show how to use object representations to learn simple deterministic models that support fast learning. We illustrate the benefit of these ideas by introducing a novel algorithm, Strategic Object Oriented Reinforcement Learning (SOORL), that outperforms state-of-the-art algorithms in the game of Pitfall! in fewer than 50 episodes.
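The abstract does not spell out SOORL's planner, so the following is only a minimal, hypothetical sketch of the two ideas it names, under stated assumptions: a tabular deterministic model that memorizes the single outcome observed for each (state, action) pair (a simple stand-in for the paper's object-oriented models), and UCT-style Monte Carlo Tree Search over that model that treats every untried pair optimistically as worth V_MAX = R_MAX / (1 - γ). The names `DeterministicModel`, `optimistic_uct`, `chain_env`, and all constants are illustrative and do not come from the paper.

```python
import math

# Hypothetical constants for this sketch (not from the paper).
GAMMA = 0.95
R_MAX = 1.0
V_MAX = R_MAX / (1.0 - GAMMA)  # optimistic value assumed for any untried (state, action)


class DeterministicModel:
    """Tabular deterministic model: remembers the one outcome seen for each (s, a)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.table = {}  # (state, action) -> (next_state, reward)

    def update(self, s, a, s_next, r):
        self.table[(s, a)] = (s_next, r)

    def step(self, s, a):
        # Optimism under uncertainty: an untried pair ends the rollout with payoff V_MAX.
        if (s, a) not in self.table:
            return None, V_MAX
        return self.table[(s, a)]


def optimistic_uct(model, root, n_sims=200, depth=10, c=V_MAX):
    """UCT over the learned model; statistics are keyed by (state, remaining depth),
    a transposition-table simplification of a full search tree."""
    N, Na, Q = {}, {}, {}

    def simulate(s, d):
        if s is None or d == 0:  # optimistic leaf or depth limit reached
            return 0.0
        n = N.get((s, d), 0)

        def ucb(a):
            na = Na.get((s, d, a), 0)
            if na == 0:
                return math.inf  # try every action once before applying UCB
            return Q[(s, d, a)] + c * math.sqrt(math.log(n) / na)

        a = max(model.actions, key=ucb)
        s_next, r = model.step(s, a)
        ret = r + GAMMA * simulate(s_next, d - 1)
        N[(s, d)] = n + 1
        k = (s, d, a)
        Na[k] = Na.get(k, 0) + 1
        Q[k] = Q.get(k, 0.0) + (ret - Q.get(k, 0.0)) / Na[k]  # running mean of returns
        return ret

    for _ in range(n_sims):
        simulate(root, depth)
    return max(model.actions, key=lambda a: Q.get((root, depth, a), -math.inf))


def chain_env(s, a, length=6):
    """Hypothetical toy task: reward 1 whenever the agent lands on the right end of a chain."""
    s_next = max(0, s - 1) if a == "left" else min(length - 1, s + 1)
    return s_next, (1.0 if s_next == length - 1 else 0.0)


if __name__ == "__main__":
    # Plan in the learned model, act in the real task, then record what happened.
    model, s = DeterministicModel(["left", "right"]), 0
    for t in range(20):
        a = optimistic_uct(model, s)
        s_next, r = chain_env(s, a)
        model.update(s, a, s_next, r)
        print(t, s, a, s_next, r)
        s = s_next
```

Because the model is deterministic, one observation of a (state, action) pair is treated as enough, so optimism steers the search toward pairs that have never been tried rather than toward repeated sampling. The exploration constant `c` is scaled to `V_MAX` here so that the UCB bonus is on the same scale as the returns; this is a design assumption of the sketch, not a value reported in the paper.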
