LG ML Jan 10, 2025

Robustness in the Face of Partial Identifiability in Reward Learning

arXiv:2501.06376v24.11 citations h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the identifiability issue in reward learning for applications such as planning, offering a principled robust approach, though it appears incremental as it builds on existing reward learning frameworks.

The paper tackles reward learning when the feedback is insufficient to uniquely identify the target reward, a situation that can cause failures in downstream applications such as planning. It introduces a robust framework that maximizes performance against the worst-case reward in the feasible set, and it provides theoretical guarantees on sample and iteration complexity for the resulting algorithm, Rob-ReL.

In Reward Learning (ReL), we are given feedback on an unknown target reward, and the goal is to use this information to recover it in order to carry out some downstream application, e.g., planning. When the feedback is not informative enough, the target reward is only partially identifiable, i.e., there exists a set of rewards, called the feasible set, that are equally plausible candidates for the target reward. In these cases, the ReL algorithm might recover a reward function different from the target reward, possibly leading to a failure in the application. In this paper, we introduce a general ReL framework that makes it possible to quantify the drop in "performance" suffered in the considered application because of identifiability issues. Building on this, we propose a robust approach to address the identifiability problem in a principled way, by maximizing the "performance" with respect to the worst-case reward in the feasible set. We then develop Rob-ReL, a ReL algorithm that applies this robust approach to the subset of ReL problems aimed at assessing a preference between two policies, and we provide theoretical guarantees on sample and iteration complexity for Rob-ReL. We conclude with a proof-of-concept experiment to illustrate the considered setting.
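
The worst-case idea in the abstract can be illustrated with a toy sketch. This is not Rob-ReL and not the paper's formulation: it assumes a finite feasible set, a finite set of candidate policies, and a precomputed table of expected returns. The table values and the names `RETURNS`, `worst_case_return`, `robust_choice`, and `greedy_choice` are all hypothetical, chosen only to show how a maximin choice can differ from optimizing a single recovered reward.

```python
from typing import Dict, List

# Hypothetical toy data (an assumption, not from the paper): expected return of
# each candidate policy under each reward in a finite feasible set.
# Each list holds one value per feasible reward, in a fixed order.
RETURNS: Dict[str, List[float]] = {
    "policy_a": [1.0, 0.0, 0.9],
    "policy_b": [0.6, 0.5, 0.6],
    "policy_c": [0.2, 0.7, 0.1],
}


def worst_case_return(returns: List[float]) -> float:
    """Return of a policy under the least favorable feasible reward."""
    return min(returns)


def robust_choice(table: Dict[str, List[float]]) -> str:
    """Maximin: pick the policy whose worst-case return is largest."""
    return max(table, key=lambda name: worst_case_return(table[name]))


def greedy_choice(table: Dict[str, List[float]], reward_index: int) -> str:
    """Non-robust baseline: optimize for one recovered feasible reward."""
    return max(table, key=lambda name: table[name][reward_index])


if __name__ == "__main__":
    robust = robust_choice(RETURNS)
    greedy = greedy_choice(RETURNS, reward_index=0)
    print("robust:", robust, "worst case:", worst_case_return(RETURNS[robust]))
    print("greedy:", greedy, "worst case:", worst_case_return(RETURNS[greedy]))
```

In this toy table, optimizing for the first feasible reward alone selects a policy that performs poorly if a different feasible reward turns out to be the target, whereas the maximin choice accepts a lower best-case return in exchange for a better guaranteed one.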
