Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization
For researchers in IRL and interactive learning, this work addresses the limitation of passive observation in IRL by enabling active interaction, though the approach is incremental.
This paper introduces interactive IRL (IIRL), in which a learner actively interacts with an expert to infer the expert's reward function, and formulates it as a stochastic bi-level optimization problem. The proposed algorithm, BISIRL, provably converges and outperforms baselines in interactive scenarios.
Inverse reinforcement learning (IRL) learns a reward function and a corresponding policy that best fit an expert's demonstration data. However, in the current IRL setting, the learner is isolated from the expert and can only passively observe the expert's demonstrations. This limits the applicability of IRL in interactive settings, where the learner actively interacts with the expert and must infer the expert's reward function from those interactions. To bridge this gap, this paper studies interactive IRL (IIRL), where the learner, while interacting with the expert, aims to learn both the expert's reward function and a policy for interacting with the expert. We formulate IIRL as a stochastic bi-level optimization problem in which the lower level learns a reward function that explains the expert's behavior, and the upper level learns a policy for interacting with the expert. We develop a double-loop algorithm, Bi-level Interactive Scenarios Inverse Reinforcement Learning (BISIRL), which solves the lower-level problem in the inner loop and the upper-level problem in the outer loop. We prove that BISIRL converges and validate the algorithm through extensive experiments.
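To make the double-loop structure concrete, here is a minimal sketch of a generic stochastic bi-level solver. It is not BISIRL itself: the functions `double_loop_bilevel`, `toy_grad_g_y`, and `toy_hypergrad` are hypothetical, and the quadratic objectives are toy stand-ins rather than IRL losses. The upper-level variable `x` loosely plays the role of the learner's policy parameters and the lower-level variable `y` the reward parameters. The inner loop runs noisy gradient descent on the lower-level objective g(x, y) for the current `x`; the outer loop then takes one step on the upper-level objective using an implicit-function hypergradient evaluated at the approximate lower-level solution.

```python
import random

def double_loop_bilevel(grad_g_y, hypergrad, x0, y0, outer_iters=200,
                        inner_iters=20, alpha=0.05, beta=0.2, seed=0):
    """Hypothetical generic double-loop stochastic bi-level solver (not BISIRL).

    Inner loop: approximately solve min_y g(x, y) for the current x.
    Outer loop: one stochastic hypergradient step on the upper level.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(outer_iters):
        # Inner loop: stochastic gradient descent on the lower level,
        # warm-started from the previous y.
        for _ in range(inner_iters):
            y -= beta * grad_g_y(x, y, rng)
        # Outer loop: hypergradient step on the upper level at the approximate y.
        x -= alpha * hypergrad(x, y, rng)
    return x, y

# Toy problem (assumed for illustration only).
A, B, LAM, SIGMA = 2.0, 3.0, 0.5, 0.01

def toy_grad_g_y(x, y, rng):
    # Lower level g(x, y) = 0.5 * (y - A*x)^2, so y*(x) = A*x.
    return (y - A * x) + rng.gauss(0.0, SIGMA)

def toy_hypergrad(x, y, rng):
    # Upper level f(x, y) = 0.5*(y - B)^2 + 0.5*LAM*x^2.
    # Implicit-function hypergradient:
    #   df/dx - d2g/dxdy * (d2g/dy2)^(-1) * df/dy = LAM*x + A*(y - B)
    return LAM * x + A * (y - B) + rng.gauss(0.0, SIGMA)

x_hat, y_hat = double_loop_bilevel(toy_grad_g_y, toy_hypergrad, x0=0.0, y0=0.0)
x_star = A * B / (A**2 + LAM)  # closed-form upper-level optimum for the toy problem
print(round(x_hat, 3), round(x_star, 3), round(y_hat, 3))
```

Warm-starting the inner loop from the previous `y` keeps each inner solve cheap, since `y*(x)` changes little between consecutive outer steps. This is a common design choice in double-loop bi-level methods and is assumed here rather than taken from the paper.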