CV, LG, RO · May 23, 2017

Visual Semantic Planning using Deep Successor Representations

arXiv:1705.08080v2 · 144 citations
AI Analysis

The paper addresses the problem of enabling intelligent agents to plan effectively in dynamic visual environments. The contribution is incremental: it builds on existing methods such as reinforcement learning and imitation learning.

The paper tackles visual semantic planning by predicting action sequences from visual observations to transform environments from initial to goal states, achieving near-optimal results across a wide range of tasks in the THOR environment.

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state. Doing so entails knowledge about objects and their affordances, as well as actions and their preconditions and effects. We propose learning these through interacting with a visual and dynamic environment. Our proposed solution involves bootstrapping reinforcement learning with imitation learning. To ensure cross-task generalization, we develop a deep predictive model based on successor representations. Our experimental results show near-optimal results across a wide range of tasks in the challenging THOR environment.
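To make the successor-representation idea concrete, here is a minimal tabular sketch in Python. It is not the paper's deep model. It only illustrates the textbook property that such models build on: the expected discounted state occupancy M is computed once for the dynamics, and the values for a new task follow from a dot product with that task's reward vector. The three-state chain, the reward vectors, and all function names are hypothetical and chosen purely for illustration.

```python
def successor_representation(P, gamma, iters=200):
    """Iterate M <- I + gamma * P @ M, which converges to sum_t gamma^t P^t."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        M = [[identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M


def state_values(M, w):
    """Value of each state for a task whose per-state reward is w: V = M @ w."""
    return [sum(m * r for m, r in zip(row, w)) for row in M]


# Hypothetical dynamics under a fixed policy: 0 -> 1 -> 2, state 2 absorbing.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
gamma = 0.5
M = successor_representation(P, gamma)

# Two hypothetical tasks share M and differ only in their reward vectors.
v_task_a = state_values(M, [0.0, 0.0, 1.0])  # reward for being in state 2
v_task_b = state_values(M, [0.0, 1.0, 0.0])  # reward for being in state 1
```

Because the dynamics term M is separated from the reward term, switching tasks here only requires a new reward vector. This separation is the general motivation for using successor representations to generalize across tasks.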
