LG AI NE ML · Jun 5, 2020

Rapid Task-Solving in Novel Environments

Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo

arXiv:2006.03662v316.231 citations

Originality: Incremental advance

AI Analysis

This paper addresses the problem of efficient adaptation in AI agents, for scenarios that require quick learning in new environments. The advance is incremental, as it builds on existing deep RL methods.

The paper tackles the challenge of rapid task-solving in novel environments (RTS), where agents must quickly solve a series of tasks in an unfamiliar setting. It demonstrates that deep-RL agents equipped with Episodic Planning Networks (EPNs) outperform the nearest baseline by factors of 2-3 and learn to navigate held-out StreetLearn maps within a single episode.

We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern deep RL agents exhibit some of these abilities in isolation, none are suitable for the full RTS challenge. To enable progress toward RTS, we introduce two challenge domains: (1) a minimal RTS challenge called the Memory&Planning Game and (2) One-Shot StreetLearn Navigation, which introduces scale and complexity from real-world data. We demonstrate that state-of-the-art deep RL agents fail at RTS in both domains, and that this failure is due to an inability to plan over gathered knowledge. We develop Episodic Planning Networks (EPNs) and show that deep-RL agents with EPNs excel at RTS, outperforming the nearest baseline by factors of 2-3 and learning to navigate held-out StreetLearn maps within a single episode. We show that EPNs learn to execute a value iteration-like planning algorithm and that they generalize to situations beyond their training experience.
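For readers unfamiliar with the value iteration that the abstract says EPNs come to resemble, here is a minimal sketch of the classical algorithm on a small navigation graph. It is not the EPN architecture and is not taken from the paper. The graph, the reward of 1.0 for stepping into the goal, the discount factor, and all function names are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical toy map: each node lists the nodes reachable in one step.
EXAMPLE_GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "G"],
    "G": ["D"],
}


def step_value(nxt: str, goal: str, values: Dict[str, float], gamma: float) -> float:
    """One-step lookahead: assumed reward of 1.0 for entering the goal, plus discounted value."""
    reward = 1.0 if nxt == goal else 0.0
    return reward + gamma * values[nxt]


def value_iteration(graph: Dict[str, List[str]], goal: str,
                    gamma: float = 0.9, sweeps: int = 50) -> Dict[str, float]:
    """Classical value iteration with deterministic moves and a terminal goal node."""
    values = {node: 0.0 for node in graph}
    for _ in range(sweeps):
        new_values = {}
        for node, neighbors in graph.items():
            if node == goal:
                new_values[node] = 0.0  # terminal state: no further reward
            else:
                new_values[node] = max(
                    step_value(nxt, goal, values, gamma) for nxt in neighbors
                )
        values = new_values
    return values


def greedy_path(graph: Dict[str, List[str]], values: Dict[str, float],
                start: str, goal: str, gamma: float = 0.9,
                max_steps: int = 20) -> List[str]:
    """Follow the highest-valued neighbor from start until the goal or the step limit."""
    path = [start]
    node = start
    while node != goal and len(path) <= max_steps:
        node = max(graph[node], key=lambda nxt: step_value(nxt, goal, values, gamma))
        path.append(node)
    return path


if __name__ == "__main__":
    vals = value_iteration(EXAMPLE_GRAPH, goal="G")
    print(greedy_path(EXAMPLE_GRAPH, vals, start="A", goal="G"))
```

Each sweep propagates value one step further out from the goal, so nodes closer to the goal end up with higher values and a greedy walk over those values yields a shortest route. The abstract's claim is that EPNs learn a computation of this general kind over gathered experience; the sketch only illustrates the reference algorithm.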
