LG AI May 25, 2016

Learning Purposeful Behaviour in the Absence of Rewards

arXiv:1605.07700v1, 32 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of enabling AI agents to learn and explore effectively in environments with no rewards or very sparse rewards. The advance is rated incremental because it builds on existing reinforcement learning frameworks.

The paper tackles the problem of agents learning without reward signals by developing an algorithm that constructs intrinsic goals from 'just out of reach' purposes, resulting in purposeful behaviors that encourage exploration of the state space, particularly in sparse-reward settings.

Artificial intelligence is commonly defined as the ability to achieve goals in the world. In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress. However, some domains have no such reward signal, or have a reward signal so sparse as to appear absent. Without reward feedback, agent behaviour is typically random, often dithering aimlessly and lacking intentionality. In this paper we present an algorithm capable of learning purposeful behaviour in the absence of rewards. The algorithm proceeds by constructing temporally extended actions (options), through the identification of purposes that are "just out of reach" of the agent's current behaviour. These purposes establish intrinsic goals for the agent to learn, ultimately resulting in a suite of behaviours that encourage the agent to visit different parts of the state space. Moreover, the approach is particularly suited for settings where rewards are very sparse, and such behaviours can help in the exploration of the environment until reward is observed.
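
To make the term "temporally extended actions (options)" concrete, here is a minimal sketch of an option in a reward-free toy corridor. It is not the paper's algorithm: the corridor environment, the `Option` fields, and the `reach` helper are all hypothetical names chosen for illustration, and the intrinsic goal is hand-picked here rather than discovered from "just out of reach" purposes as the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, List

State = int   # position in a 1-D corridor (hypothetical toy environment)
Action = int  # -1 = step left, +1 = step right


@dataclass
class Option:
    """A temporally extended action: where it may start, how it acts, when it stops."""
    can_start: Callable[[State], bool]
    policy: Callable[[State], Action]
    should_stop: Callable[[State], bool]


def step(state: State, action: Action, size: int = 10) -> State:
    """Reward-free corridor dynamics: move, then clamp to [0, size - 1]."""
    return max(0, min(size - 1, state + action))


def run_option(option: Option, state: State, max_steps: int = 100) -> List[State]:
    """Follow the option's policy until it terminates; return the visited states."""
    if not option.can_start(state):
        raise ValueError("option cannot be initiated in this state")
    trajectory = [state]
    for _ in range(max_steps):
        if option.should_stop(state):
            break
        state = step(state, option.policy(state))
        trajectory.append(state)
    return trajectory


def reach(goal: State) -> Option:
    """Hypothetical option whose hand-picked intrinsic goal is to reach `goal`."""
    return Option(
        can_start=lambda s: True,
        policy=lambda s: 1 if s < goal else -1,
        should_stop=lambda s: s == goal,
    )


trajectory = run_option(reach(7), state=2)
print(trajectory)
```

No reward appears anywhere in this sketch: the option's termination condition plays the role of the goal, so the agent moves with intent instead of dithering. A suite of such options with different goals would carry the agent to different parts of the state space.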

