LG AI · Nov 11, 2024

Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother, Irina Rish, Glen Berseth, Sanjiban Choudhury

arXiv:2411.07007v2 · 15.79 citations · h-index: 8 · Has Code · ICLR

Originality: Incremental advance

AI Analysis

This paper addresses efficient and stable IRL for agents that replicate expert demonstrations, particularly in state-only settings where behavior cloning fails. The contribution appears incremental, since it builds on existing actor-critic algorithms.

The paper tackles the computational expense and instability of adversarial inverse reinforcement learning (IRL) by proposing a non-adversarial method that directly optimizes policies using successor feature matching, eliminating the need for reward function learning. It demonstrates the ability to learn from as few as one expert demonstration and achieves improved performance on control tasks.

In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures. This game-solving approach is both computationally expensive and difficult to stabilize. In this work, we propose a novel approach to IRL by direct policy optimization: exploiting a linear factorization of the return as the inner product of successor features and a reward vector, we design an IRL algorithm by policy gradient descent on the gap between the learner and expert features. Our non-adversarial method does not require learning a reward function and can be solved seamlessly with existing actor-critic RL algorithms. Remarkably, our approach works in state-only settings without expert action labels, a setting which behavior cloning (BC) cannot solve. Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks.
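The linear factorization mentioned in the abstract can be illustrated numerically. The sketch below is not the authors' implementation: it assumes a hypothetical 3-dimensional state feature map, random stand-in trajectories, and a hypothetical reward vector `w`, and it estimates successor features by a plain discounted sum over a single trajectory rather than with a learned network. It checks that the discounted return equals the inner product of successor features and `w` when rewards are linear in the features, and it computes a squared gap between learner and expert features of the kind the abstract describes descending on.

```python
import numpy as np

def successor_features(features, gamma):
    """Discounted sum of per-step feature vectors along one trajectory."""
    features = np.asarray(features, dtype=float)
    discounts = gamma ** np.arange(len(features))
    return discounts @ features  # shape: (feature_dim,)

def discounted_return(rewards, gamma):
    """Discounted sum of scalar rewards along one trajectory."""
    rewards = np.asarray(rewards, dtype=float)
    return float((gamma ** np.arange(len(rewards))) @ rewards)

rng = np.random.default_rng(0)
gamma = 0.9

# Assumptions for illustration only: random state features and reward vector.
w = rng.normal(size=3)                  # hypothetical reward vector
expert_phi = rng.normal(size=(50, 3))   # state features along an expert trajectory
learner_phi = rng.normal(size=(50, 3))  # state features along a learner trajectory

psi_expert = successor_features(expert_phi, gamma)
psi_learner = successor_features(learner_phi, gamma)

# With rewards linear in the features, r_t = phi_t . w,
# the return factorizes as psi . w.
ret_expert = discounted_return(expert_phi @ w, gamma)
ret_learner = discounted_return(learner_phi @ w, gamma)
assert np.isclose(ret_expert, psi_expert @ w)
assert np.isclose(ret_learner, psi_learner @ w)

# Feature-matching objective: no reward vector is needed to evaluate it.
gap = psi_learner - psi_expert
loss = 0.5 * float(gap @ gap)

# The return difference under any w is gap . w, so by Cauchy-Schwarz
# it is bounded by ||gap|| * ||w||.
return_diff = ret_learner - ret_expert
```

Because the features here depend only on states, the gap can be computed from state-only demonstrations, which is consistent with the setting the abstract highlights. Driving `loss` toward zero shrinks the return difference for every reward vector of bounded norm, which is why no explicit reward model has to be learned in this simplified picture.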
