LG AINov 1, 2021

On the Expressivity of Markov Reward

David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

arXiv:2111.00876v225.9100 citations

Originality Incremental advance

AI Analysis

This work addresses a foundational problem in reinforcement learning for researchers and practitioners by clarifying the limitations and capabilities of reward functions in task specification, though it is incremental in extending theoretical understanding.

The paper investigates the expressivity of Markov reward functions in capturing different abstract notions of tasks, such as sets of acceptable behaviors or partial orderings, and proves that while many tasks can be expressed, some cannot be captured by any Markov reward function, with algorithms provided to construct such functions when possible.

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.

View on arXiv PDF

Similar