LG | AI | CV | RO | ML | May 15, 2017

Curiosity-driven Exploration by Self-supervised Prediction

arXiv:1705.05363v1 | 2933 citations | Has Code
Originality: Highly original
AI Analysis

This paper addresses the challenge of exploration in reinforcement learning for agents in high-dimensional, reward-sparse environments, offering a novel approach to intrinsic motivation.

The paper tackles sparse or absent extrinsic rewards in reinforcement learning by using curiosity, formulated through self-supervised prediction, as an intrinsic reward signal. With curiosity, the agent needs fewer interactions to reach its goals and explores more efficiently in environments such as VizDoom and Super Mario Bros.

In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where curiosity allows for far fewer interactions with the environment to reach the goal; 2) exploration with no extrinsic reward, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios (e.g. new levels of the same game) where the knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch. Demo video and code available at https://pathak22.github.io/noreward-rl/
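The curiosity signal described in the abstract can be illustrated with a minimal sketch: a forward model predicts the next feature vector from the current features and the action, and the prediction error becomes the intrinsic reward. Everything named here is an assumption for illustration: the linear forward model, the `FORWARD_W` weights, the squared-error form, and the scaling factor `eta` are not specified in the abstract, and the feature vectors are assumed to come from an encoder trained with an inverse dynamics model, which is not shown.

```python
def linear(weights, x):
    """Apply a linear map (given as a list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def one_hot(action, n_actions):
    """Encode a discrete action index as a one-hot vector."""
    return [1.0 if i == action else 0.0 for i in range(n_actions)]


def intrinsic_reward(phi_s, phi_next, action, forward_w, n_actions, eta=1.0):
    """Curiosity reward: scaled squared error of the forward model's
    prediction, measured in feature space rather than pixel space.

    phi_s, phi_next: feature vectors of the current and next state
                     (assumed to come from a learned encoder).
    forward_w:       hypothetical linear forward model acting on the
                     concatenation [phi_s, one_hot(action)].
    eta:             hypothetical scaling factor for the reward.
    """
    predicted = linear(forward_w, phi_s + one_hot(action, n_actions))
    squared_error = sum((p - t) ** 2 for p, t in zip(predicted, phi_next))
    return 0.5 * eta * squared_error


# Hypothetical forward model: 2 feature dimensions, 2 discrete actions.
# The second feature increases by 1 when action 1 is taken.
FORWARD_W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]

if __name__ == "__main__":
    phi_s = [1.0, 2.0]
    # Transition the forward model predicts exactly: no curiosity reward.
    print(intrinsic_reward(phi_s, [1.0, 3.0], 1, FORWARD_W, 2))
    # Surprising transition: a large prediction error gives a larger reward.
    print(intrinsic_reward(phi_s, [2.0, 5.0], 1, FORWARD_W, 2))
```

In this sketch, transitions the agent already predicts well yield no reward, while poorly predicted ones yield a larger reward, which is what pushes the agent toward unfamiliar states. Measuring the error in feature space instead of on raw pixels reflects the abstract's point about bypassing direct pixel prediction.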

Code Implementations: 13 repos

Data from Papers with Code (CC-BY-SA-4.0)

