AI · HC · Jan 19, 2021

Choice Set Misspecification in Reward Inference

arXiv:2101.07691v1 · 20 citations
Originality: Incremental advance
AI Analysis

This paper addresses a critical issue in robotics: improving reward inference from human feedback so that human-robot interaction is safe and effective. It is an incremental advance because it builds on existing reward inference methods.

The paper tackles the problem of robots incorrectly inferring reward functions from human feedback due to misspecified choice sets, showing that while some misspecifications have neutral effects, others can cause the robot to infer the opposite of the intended reward, leading to harmful performance.

Specifying reward functions for robots that operate in environments without a natural reward signal can be challenging, and incorrectly specified rewards can incentivise degenerate or dangerous behavior. A promising alternative to manually specifying reward functions is to enable robots to infer them from human feedback, like demonstrations or corrections. To interpret this feedback, robots treat as approximately optimal a choice the person makes from a choice set, like the set of possible trajectories they could have demonstrated or possible corrections they could have made. In this work, we introduce the idea that the choice set itself might be difficult to specify, and analyze choice set misspecification: what happens as the robot makes incorrect assumptions about the set of choices from which the human selects their feedback. We propose a classification of different kinds of choice set misspecification, and show that these different classes lead to meaningful differences in the inferred reward and resulting performance. While we would normally expect misspecification to hurt, we find that certain kinds of misspecification are neither helpful nor harmful (in expectation). However, in other situations, misspecification can be extremely harmful, leading the robot to believe the opposite of what it should believe. We hope our results will allow for better prediction and response to the effects of misspecification in real-world reward inference.
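To make the failure mode concrete, here is a minimal sketch of how a misspecified choice set can flip the inferred reward. It is not the paper's implementation. It assumes a Boltzmann-rational human (one common way to model "approximately optimal" choices), a one-dimensional feature per choice, and only two candidate rewards, theta = +1 (prefers larger feature values) and theta = -1 (prefers smaller ones). All names (`boltzmann_likelihood`, `posterior`, `human_choice_set`, `robot_choice_set`) and the rationality parameter `beta` are hypothetical and chosen for illustration.

```python
import math


def boltzmann_likelihood(choice, choice_set, theta, beta=5.0):
    """P(choice | theta, choice_set) under an assumed Boltzmann-rational human.

    Each choice is represented by a single feature value, and the
    (hypothetical) reward is reward(c) = theta * c.
    """
    normalizer = sum(math.exp(beta * theta * c) for c in choice_set)
    return math.exp(beta * theta * choice) / normalizer


def posterior(choice, assumed_choice_set, thetas=(1.0, -1.0), beta=5.0):
    """Posterior over candidate rewards, starting from a uniform prior.

    The robot normalizes over the choice set it *assumes* the human had,
    which may differ from the set the human actually chose from.
    """
    unnormalized = {
        theta: boltzmann_likelihood(choice, assumed_choice_set, theta, beta)
        for theta in thetas
    }
    total = sum(unnormalized.values())
    return {theta: p / total for theta, p in unnormalized.items()}


# The human's true reward is theta = +1 and they pick the best option available.
human_choice_set = [0.0, 1.0]   # what the human could actually choose from
robot_choice_set = [1.0, 2.0]   # what the robot wrongly assumes was available
observed_choice = 1.0           # best option in the human's true choice set

well_specified = posterior(observed_choice, human_choice_set)
misspecified = posterior(observed_choice, robot_choice_set)

print("correct choice set:     ", well_specified)
print("misspecified choice set:", misspecified)
```

With the correct choice set, the observed choice looks optimal under theta = +1, so the posterior concentrates there. With the misspecified set, the robot believes a better option (feature 2.0) was available and passed over, so the same observation becomes evidence for theta = -1, the opposite of the true reward.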

