LG · Jan 14, 2025

Reward Compatibility: A Framework for Inverse RL

Filippo Lazzati, Mirco Mutti, Alberto Metelli

arXiv:2501.07996v17.11 citations · h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the challenge of learning rewards from demonstrations in IRL. It offers a more nuanced theoretical framework that could improve efficiency in applications such as robotics and autonomous systems, though the advance appears incremental because it builds on existing IRL concepts.

The paper tackles the problem of Inverse Reinforcement Learning (IRL) by introducing a reward compatibility framework to quantify how well a reward aligns with expert demonstrations, generalizing the binary feasible reward set. It provides tractable algorithms with sample complexity analysis for settings like optimal/suboptimal demonstrations and online/offline data, extending provably efficient IRL to large-scale MDPs.

We provide an original theoretical study of Inverse Reinforcement Learning (IRL) through the lens of reward compatibility, a novel framework to quantify the compatibility of a reward with the given expert's demonstrations. Intuitively, a reward is more compatible with the demonstrations the closer the performance of the expert's policy, computed with that reward, is to the optimal performance for that reward. This generalizes the notion of the feasible reward set, the most common framework in the theoretical IRL literature, for which a reward is either compatible or not compatible. The grayscale introduced by reward compatibility is the key to extending the realm of provably efficient IRL far beyond what is attainable with the feasible reward set: from tabular to large-scale MDPs. We analyze the IRL problem across various settings, including optimal and suboptimal expert demonstrations and both online and offline data collection. For each of these settings, we provide a tractable algorithm and a corresponding sample complexity analysis, as well as various insights on reward compatibility and how the framework can pave the way to yet more general problem settings.
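To make the intuition concrete, here is a minimal sketch of one way to score a reward in a small tabular, finite-horizon MDP. It assumes the score is the gap between the optimal return and the expert policy's return under the same reward, so a gap of 0 corresponds to membership in the feasible reward set and larger gaps mean lower compatibility. The paper's exact definition (for example, any normalization) may differ. The function names, the toy two-state MDP, and the three candidate rewards are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names, MDP, and rewards are assumptions,
# not taken from the paper.

def optimal_value(P, r, mu, H):
    """Optimal expected H-step return under reward r (backward induction)."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(H):
        V = [
            max(
                r[s][a] + sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
    return sum(m * v for m, v in zip(mu, V))


def policy_value(P, r, mu, H, policy):
    """Expected H-step return of a deterministic policy (state -> action)."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(H):
        V = [
            r[s][policy[s]]
            + sum(p * V[s2] for s2, p in enumerate(P[s][policy[s]]))
            for s in range(n_states)
        ]
    return sum(m * v for m, v in zip(mu, V))


def incompatibility(P, r, mu, H, expert_policy):
    """Assumed score: optimal return minus expert return (0 = fully compatible)."""
    return optimal_value(P, r, mu, H) - policy_value(P, r, mu, H, expert_policy)


# Toy MDP: 2 states, action 0 stays in place, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # transitions from state 0
    [[0.0, 1.0], [1.0, 0.0]],  # transitions from state 1
]
mu = [1.0, 0.0]   # always start in state 0
H = 3             # horizon
expert = [1, 0]   # expert moves to state 1, then stays there

# Candidate rewards, indexed as r[state][action].
r_prefers_state_1 = [[0.0, 0.0], [1.0, 1.0]]
r_mixed = [[1.0, 0.0], [1.0, 1.0]]
r_prefers_state_0 = [[1.0, 1.0], [0.0, 0.0]]

gaps = [
    incompatibility(P, r, mu, H, expert)
    for r in (r_prefers_state_1, r_mixed, r_prefers_state_0)
]
print(gaps)
```

Under this assumed definition, a binary feasible-set test would accept only the first reward, while the gap ranks the other two by how far the expert falls short of optimal under each.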
