AI · Jul 22, 2023

On the Expressivity of Multidimensional Markov Reward

arXiv:2307.12184v1 · 6.75 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This work studies how expressive reward functions are as a way to specify agent behavior in sequential decision-making, a question foundational to reinforcement learning and AI safety. It appears incremental, however, since it builds on existing MDP theory.

The paper asks whether scalar or multidimensional Markov reward functions can characterize a given set of acceptable policies in Markov Decision Processes. It establishes necessary and sufficient conditions for such reward functions to exist and proves that multidimensional rewards can characterize any non-degenerate set of deterministic policies.

We consider the expressivity of Markov rewards in sequential decision making under uncertainty. We view reward functions in Markov Decision Processes (MDPs) as a means to characterize desired behaviors of agents. Assuming desired behaviors are specified as a set of acceptable policies, we investigate whether there exists a scalar or multidimensional Markov reward function that makes the policies in the set more desirable than the other policies. Our main result states both necessary and sufficient conditions for the existence of such reward functions. We also show that for every non-degenerate set of deterministic policies, there exists a multidimensional Markov reward function that characterizes it.
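To make the abstract's notion of "characterizing" a set of acceptable policies concrete, here is a minimal Python sketch; it is not the paper's construction. It enumerates the deterministic policies of a tiny hypothetical MDP, evaluates each one exactly, and then checks two things for a given reward: whether a scalar reward ranks every acceptable policy strictly above every other policy at the start state, and whether a vector reward characterizes the set under an assumed threshold reading (a policy is acceptable iff its value in every dimension reaches a per-dimension threshold). That threshold formalization, the MDP, the two reward functions, and all function names (`evaluate`, `start_values`, `scalar_characterizes`, `multi_characterizes`) are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def evaluate(policy, P, reward, gamma):
    """Exact values V^pi(s) of a deterministic policy under a scalar reward.

    Solves (I - gamma * P_pi) V = r_pi by Gauss-Jordan elimination over
    Fractions, so ties between policies are detected exactly.
    P[s][a][t]: probability of s -> t under action a; policy[s]: action in s.
    """
    n = len(P)
    M = []
    for s in range(n):
        a = policy[s]
        row = [(1 if s == t else 0) - gamma * Fraction(P[s][a][t]) for t in range(n)]
        M.append(row + [Fraction(reward(s, a))])
    for col in range(n):
        # With gamma < 1 the system is nonsingular, so a nonzero pivot exists.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[s][n] for s in range(n)]


def start_values(P, rewards, gamma, s0):
    """Map every deterministic policy to its vector of start-state values."""
    n_states, n_actions = len(P), len(P[0])
    return {
        pi: tuple(evaluate(pi, P, r, gamma)[s0] for r in rewards)
        for pi in product(range(n_actions), repeat=n_states)
    }


def scalar_characterizes(values, good):
    """Scalar case on dimension 0: every acceptable policy strictly beats every other."""
    others = [v[0] for pi, v in values.items() if pi not in good]
    return min(values[pi][0] for pi in good) > max(others, default=float("-inf"))


def multi_characterizes(values, good):
    """ASSUMED threshold reading: pi is acceptable iff V_i^pi(s0) >= c_i for all i.

    The tightest thresholds are c_i = min of V_i over acceptable policies, so
    the reward works iff every other policy falls below c_i in some dimension.
    """
    d = len(next(iter(values.values())))
    c = [min(values[pi][i] for pi in good) for i in range(d)]
    return all(
        any(v[i] < c[i] for i in range(d))
        for pi, v in values.items()
        if pi not in good
    )


# Hypothetical 2-state, 2-action MDP: action 0 stays put, action 1 switches state.
P = [[[1, 0], [0, 1]],
     [[0, 1], [1, 0]]]
gamma, s0 = Fraction(1, 2), 0


def r_in_state_1(s, a):
    return 1 if s == 1 else 0  # hypothetical reward for occupying state 1


def r_stay_in_1(s, a):
    return 1 if (s, a) == (1, 0) else 0  # hypothetical reward for staying in state 1


values = start_values(P, [r_in_state_1, r_stay_in_1], gamma, s0)
print(scalar_characterizes(values, {(1, 0)}))         # True
print(multi_characterizes(values, {(1, 0)}))          # True
print(multi_characterizes(values, {(0, 0), (1, 0)}))  # False
```

In this toy MDP, policies `(0, 0)` and `(0, 1)` both stay in state 0 forever from the start state, so they earn identical value under any reward; that is why both checks reject `{(0, 0), (1, 0)}`. The functions only test a given reward; deciding whether *some* reward exists requires searching over rewards (for instance with a linear program), which this sketch does not attempt.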
