Invariance in Policy Optimisation and Partial Identifiability in Reward Learning
This addresses the challenge of designing reward functions for complex tasks, providing insights for selecting data sources in reward learning, though it is incremental as it builds on existing reward learning methods.
The paper tackles the problem of multiple reward functions fitting the same data in reward learning, formally characterizing partial identifiability for sources like expert demonstrations and analyzing its impact on tasks such as policy optimization, with a framework for comparing data sources by invariances.
It is often very challenging to manually design reward functions for complex, real-world tasks. To solve this, one can instead use reward learning to infer a reward function from data. However, there are often multiple reward functions that fit the data equally well, even in the infinite-data limit. This means that the reward function is only partially identifiable. In this work, we formally characterise the partial identifiability of the reward function given several popular reward learning data sources, including expert demonstrations and trajectory comparisons. We also analyse the impact of this partial identifiability for several downstream tasks, such as policy optimisation. We unify our results in a framework for comparing data sources and downstream tasks by their invariances, with implications for the design and selection of data sources for reward learning.