AI · May 25, 2017

Cross-Domain Perceptual Reward Functions

arXiv:1705.09045v3 · 2 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of flexible goal specification in reinforcement learning. It appears incremental, since it builds on existing perceptual reward methods by extending them to cross-domain settings.

The problem it tackles is that rewards must be manually redefined for each new goal. To avoid this, it introduces Cross-Domain Perceptual Reward (CDPR) functions, which represent a goal as the visual similarity between an agent's state and a cross-domain goal image. It reports results for learning these rewards with deep neural networks and using them to solve two tasks with deep reinforcement learning.

In reinforcement learning, we often define goals by specifying rewards within desirable states. One problem with this approach is that we typically need to redefine the rewards each time the goal changes, which often requires some understanding of the solution in the agent's environment. When humans are learning to complete tasks, we regularly utilize alternative sources that guide our understanding of the problem. Such task representations allow one to specify goals on their own terms, thus providing specifications that can be appropriately interpreted across various environments. This motivates our own work, in which we represent goals in environments that are different from the agent's. We introduce Cross-Domain Perceptual Reward (CDPR) functions, learned rewards that represent the visual similarity between an agent's state and a cross-domain goal image. We report results for learning the CDPRs with a deep neural network and using them to solve two tasks with deep reinforcement learning.
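
To make the idea of a reward based on visual similarity concrete, here is a minimal sketch, not the paper's implementation. It assumes the reward is the negative Euclidean distance between feature embeddings of the state image and the goal image. The names `make_embedder` and `perceptual_reward` are hypothetical, and the fixed random linear map merely stands in for a learned deep network; in a real cross-domain setting the features would have to be learned so that images from both domains land in a comparable space.

```python
import numpy as np


def make_embedder(input_dim, embed_dim, seed=0):
    """Hypothetical stand-in for a learned feature network.

    Uses a fixed random linear projection; an actual system would
    train a deep network instead (assumption for illustration only).
    """
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((input_dim, embed_dim)) / np.sqrt(input_dim)

    def embed(image):
        # Flatten the image and project it into the feature space.
        flat = np.asarray(image, dtype=np.float64).reshape(-1)
        return flat @ weights

    return embed


def perceptual_reward(embed, state_image, goal_image):
    """Negative feature distance: 0 at the goal, more negative farther away."""
    diff = embed(state_image) - embed(goal_image)
    return -float(np.linalg.norm(diff))


# Toy usage with 8x8 "images" (synthetic data, for illustration only).
rng = np.random.default_rng(1)
goal = rng.random((8, 8))
offset = rng.random((8, 8))
near_state = goal + 0.01 * offset  # visually close to the goal
far_state = goal + 1.0 * offset    # visually far from the goal

embed = make_embedder(input_dim=64, embed_dim=16)
reward_at_goal = perceptual_reward(embed, goal, goal)
reward_near = perceptual_reward(embed, near_state, goal)
reward_far = perceptual_reward(embed, far_state, goal)
```

Because the reward depends only on the goal image, changing the goal in this sketch means swapping `goal` rather than rewriting the reward function, which is the kind of flexibility the abstract describes.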

