LG SY · Jan 5, 2023

Data-Driven Inverse Reinforcement Learning for Expert-Learner Zero-Sum Games

Wenqian Xue, Bosen Lian, Jialu Fan, Tianyou Chai, Frank L. Lewis

arXiv:2301.01997v12.01 citations · h-index: 127

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of making IRL robust in adversarial environments, with robotics and autonomous systems as target applications. It is an incremental extension of existing IRL methods.

The paper tackles the problem of inverse reinforcement learning (IRL) in adversarial settings by formulating it as a zero-sum game between an expert and a learner, where the learner aims to reconstruct the expert's cost function while rejecting non-cooperative disturbances. It develops a framework and off-policy algorithm that does not require knowledge of agent dynamics, with simulation experiments demonstrating effectiveness.
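For orientation, the zero-sum setting can be written down in its standard linear-quadratic form. This is a sketch under assumptions, not the paper's own notation: the linear dynamics, the matrices A, B, D, Q, R, the attenuation level \gamma, and the quadratic value function x^\top P x are all assumed here for illustration.

```latex
% Assumed linear dynamics with control u and adversarial disturbance w
\dot{x} = A x + B u + D w

% Zero-sum value: the controller minimizes, the disturbance maximizes
V(x_0) = \min_{u} \max_{w} \int_0^{\infty}
  \left( x^\top Q x + u^\top R u - \gamma^2 w^\top w \right) dt

% With V(x) = x^\top P x, the saddle point satisfies the game algebraic Riccati equation
A^\top P + P A + Q - P B R^{-1} B^\top P + \gamma^{-2} P D D^\top P = 0

% Saddle-point policies
u^* = -R^{-1} B^\top P x, \qquad w^* = \gamma^{-2} D^\top P x
```

In the IRL reading of this setup, the state weight Q is the unknown part of the expert's cost that the learner tries to reconstruct from observed expert states and controls.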

In this paper, we formulate inverse reinforcement learning (IRL) as an expert-learner interaction whereby the optimal performance intent of an expert or target agent is unknown to a learner agent. The learner observes the states and controls of the expert and hence seeks to reconstruct the expert's cost function intent and thus mimics the expert's optimal response. Next, we add non-cooperative disturbances that seek to disrupt the learning and stability of the learner agent. This leads to the formulation of a new interaction we call zero-sum game IRL. We develop a framework to solve the zero-sum game IRL problem that is a modified extension of RL policy iteration (PI) to allow unknown expert performance intentions to be computed and non-cooperative disturbances to be rejected. The framework has two parts: a value function and control action update based on an extension of PI, and a cost function update based on standard inverse optimal control. We then develop an off-policy IRL algorithm that does not require knowledge of the expert and learner agent dynamics and performs single-loop learning. Rigorous proofs and analyses are given. Finally, simulation experiments are presented to show the effectiveness of the new approach.
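The first part of the framework, the value function and control action update, can be illustrated with a minimal model-based sketch. It is not the paper's algorithm: the paper's method is off-policy and needs no dynamics, whereas this sketch assumes a known linear system, a quadratic cost, and a known state weight Q, and it omits the inverse-optimal-control cost update entirely. The function names, the matrices, and the value of gamma are all hypothetical.

```python
import numpy as np


def solve_lyapunov(Ac, M):
    """Solve Ac' P + P Ac + M = 0 for P using a Kronecker-sum linear system."""
    n = Ac.shape[0]
    I = np.eye(n)
    coeff = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    return np.linalg.solve(coeff, -M.reshape(-1)).reshape(n, n)


def zero_sum_policy_iteration(A, B, D, Q, R, gamma, iters=30):
    """Policy iteration for an assumed linear-quadratic zero-sum game.

    Control is u = -K x and disturbance is w = L x. A is assumed Hurwitz,
    so the zero gains used for initialization are stabilizing.
    """
    n = A.shape[0]
    K = np.zeros((B.shape[1], n))
    L = np.zeros((D.shape[1], n))
    for _ in range(iters):
        # Policy evaluation: value matrix of the current (K, L) pair.
        Ac = A - B @ K + D @ L
        M = Q + K.T @ R @ K - gamma**2 * (L.T @ L)
        P = solve_lyapunov(Ac, M)
        # Policy improvement for both players.
        K = np.linalg.solve(R, B.T @ P)
        L = (D.T @ P) / gamma**2
    return P, K, L


def game_are_residual(A, B, D, Q, R, gamma, P):
    """Left-hand side of the game algebraic Riccati equation; zero at the solution."""
    return (A.T @ P + P @ A + Q
            - P @ B @ np.linalg.solve(R, B.T @ P)
            + (P @ D @ D.T @ P) / gamma**2)


# Hypothetical example system (not taken from the paper).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.0], [0.5]])
Q = np.eye(2)
R = np.eye(1)
gamma = 5.0

P, K, L = zero_sum_policy_iteration(A, B, D, Q, R, gamma)
residual = np.max(np.abs(game_are_residual(A, B, D, Q, R, gamma, P)))
print("max Riccati residual:", residual)
```

In an IRL setting, a loop like this would sit inside an outer update that adjusts Q until the learner's gain K matches the gain inferred from the expert's observed controls; that outer step is deliberately left out here because its exact form is not given in the abstract.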
