Inverse Decision Modeling: Learning Interpretable Representations of Behavior
This work addresses interpretable behavior modeling for decision analysis and offers a unifying perspective that opens up new research directions. It may nonetheless appear incremental, since it builds on existing work in imitation and reward learning.
The paper tackles the challenge of obtaining transparent descriptions of existing decision behavior by developing an expressive framework for learning parameterized representations of sequential decision behavior. It formalizes the forward problem as a normative standard and the inverse problem as a descriptive model, with the latter generalizing imitation and reward learning.
Decision analysis deals with modeling and enhancing decision processes. A principal challenge in improving behavior is in obtaining a transparent description of existing behavior in the first place. In this paper, we develop an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior. First, we formalize the forward problem (as a normative standard), subsuming common classes of control behavior. Second, we use this to formalize the inverse problem (as a descriptive model), generalizing existing work on imitation/reward learning -- while opening up a much broader class of research problems in behavior representation. Finally, we instantiate this approach with an example (inverse bounded rational control), illustrating how this structure enables learning (interpretable) representations of (bounded) rationality -- while naturally capturing intuitive notions of suboptimal actions, biased beliefs, and imperfect knowledge of environments.
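To make the forward/inverse distinction concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical two-state, two-action environment and a single behavior parameter `beta` (a softmax "rationality" coefficient); the names `forward`, `inverse`, `NEXT_STATE`, `REWARD`, and `demo_freq` are all illustrative. The forward function maps the parameter to a policy via soft value iteration, and the inverse function recovers the parameter from demonstrated state-action frequencies by grid-search maximum likelihood.

```python
import math

# Hypothetical toy environment (not from the paper): 2 states, 2 actions,
# deterministic transitions. Action 1 is the better choice in both states.
NEXT_STATE = [[0, 1], [0, 1]]      # NEXT_STATE[s][a] -> next state
REWARD = [[0.0, 1.0], [0.5, 2.0]]  # REWARD[s][a] -> immediate reward
GAMMA = 0.9                        # discount factor


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def q_values(v):
    """One-step lookahead values Q(s, a) given state values v."""
    return [[REWARD[s][a] + GAMMA * v[NEXT_STATE[s][a]] for a in (0, 1)]
            for s in (0, 1)]


def forward(beta, iters=300):
    """Forward problem: map the behavior parameter beta to a policy.

    Small beta gives near-random behavior; large beta approaches the
    reward-maximizing policy.
    """
    v = [0.0, 0.0]
    for _ in range(iters):
        q = q_values(v)
        # Soft Bellman backup: V(s) = (1/beta) * log sum_a exp(beta * Q(s, a)).
        v = [logsumexp([beta * qa for qa in q[s]]) / beta for s in (0, 1)]
    q = q_values(v)
    policy = []
    for s in (0, 1):
        z = logsumexp([beta * qa for qa in q[s]])
        policy.append([math.exp(beta * qa - z) for qa in q[s]])
    return policy


def log_likelihood(beta, demo_freq):
    """Frequency-weighted log-likelihood of demonstrations under beta."""
    policy = forward(beta)
    return sum(w * math.log(policy[s][a]) for (s, a), w in demo_freq.items())


def inverse(demo_freq, grid):
    """Inverse problem: pick the beta on the grid that best explains behavior."""
    return max(grid, key=lambda beta: log_likelihood(beta, demo_freq))


# Demonstrations summarized as state-action frequencies. Here they are
# generated by an agent with beta = 1.0 that visits both states equally often.
true_policy = forward(1.0)
demo_freq = {(s, a): 0.5 * true_policy[s][a] for s in (0, 1) for a in (0, 1)}

grid = [k / 10 for k in range(1, 51)]  # candidate values 0.1, 0.2, ..., 5.0
beta_hat = inverse(demo_freq, grid)
print("recovered beta:", beta_hat)
```

The recovered `beta` is itself the interpretable description in this toy setting: a low value describes an agent that often takes suboptimal actions, a high value one that is close to optimal. The framework in the abstract covers far richer parameterizations (for example biased beliefs and imperfect knowledge of the environment) that this one-parameter sketch does not attempt to model.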