AI CL LG ML | Jun 12, 2020

How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds

arXiv:2006.07409v1 | 53 citations
AI Analysis

This paper addresses a key challenge in reinforcement learning for natural language environments and enables more efficient agents for text-based games, though the contribution is incremental because it builds on existing exploration methods.

The paper tackles exploration bottlenecks in text-based games, where agents struggle with sparse rewards and large state-action spaces. It introduces two agents, Q*BERT and MC!Q*BERT, that use knowledge graphs and intrinsic motivation to improve sample efficiency. The authors report outperforming state-of-the-art methods on nine games, including Zork, where a learning agent gets past the Grue bottleneck for the first time.

Text-based games are long puzzles or quests, characterized by a sequence of sparse and potentially deceptive rewards. They provide an ideal platform to develop agents that perceive and act upon the world using a combinatorially sized natural language state-action space. Standard Reinforcement Learning agents are poorly equipped to effectively explore such spaces and often struggle to overcome bottlenecks: states that agents are unable to pass through simply because they do not see the right action sequence enough times to be sufficiently reinforced. We introduce Q*BERT, an agent that learns to build a knowledge graph of the world by answering questions, which leads to greater sample efficiency. To overcome bottlenecks, we further introduce MC!Q*BERT, an agent that uses a knowledge-graph-based intrinsic motivation to detect bottlenecks and a novel exploration strategy to efficiently learn a chain of policy modules to overcome them. We present an ablation study and results demonstrating how our method outperforms the current state-of-the-art on nine text games, including the popular game, Zork, where, for the first time, a learning agent gets past the bottleneck where the player is eaten by a Grue.
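
The two ideas named in the abstract, a knowledge-graph-based intrinsic reward and bottleneck detection, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the knowledge graph is a set of (subject, relation, object) triples, that the intrinsic reward is the number of triples not seen before, and that a bottleneck is flagged when the total reward has not improved for a fixed number of steps. The names `GraphNoveltyReward`, `BottleneckDetector`, and `patience` are hypothetical and chosen only for this illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class GraphNoveltyReward:
    """Hypothetical helper: rewards triples that were not seen before."""

    def __init__(self) -> None:
        self.seen: Set[Triple] = set()

    def __call__(self, step_triples: Iterable[Triple]) -> int:
        # Assumption: intrinsic reward = number of newly discovered triples.
        new = set(step_triples) - self.seen
        self.seen |= new
        return len(new)


class BottleneckDetector:
    """Hypothetical helper: flags a bottleneck after `patience`
    consecutive updates without a new best total reward."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0

    def update(self, total_reward: float) -> bool:
        if total_reward > self.best:
            self.best = total_reward
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


# Usage: two observations that share one triple.
novelty = GraphNoveltyReward()
first = novelty([("you", "in", "kitchen"), ("lamp", "in", "kitchen")])
second = novelty([("you", "in", "kitchen"), ("you", "have", "lamp")])
print(first, second)  # prints "2 1"
```

In this sketch the agent is rewarded only for learning something new about the world, so revisiting known rooms yields nothing. Once the detector fires, an agent in the spirit of the abstract would freeze its current policy and start training a new policy module from the best state reached so far.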

Code Implementations: 1 repo