LG AI · Nov 10, 2025

Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization

arXiv:2511.07288v14.1

Originality: Incremental advance

AI Analysis

This addresses sample inefficiency for imitation learning practitioners, but the contribution is incremental because it builds on existing adversarial methods.

The paper tackles sample inefficiency in imitation learning by introducing an off-policy adversarial algorithm with stabilization techniques, reducing the number of samples needed to match expert behavior.

Learning complex policies with Reinforcement Learning (RL) is often hindered by instability and slow convergence, a problem exacerbated by the difficulty of reward engineering. Imitation Learning (IL) from expert demonstrations bypasses this reliance on rewards. However, state-of-the-art IL methods, exemplified by Generative Adversarial Imitation Learning (GAIL) (Ho et al.), suffer from severe sample inefficiency. This is a direct consequence of the on-policy algorithms they are built on, such as TRPO (Schulman et al.). In this work, we introduce an adversarial imitation learning algorithm that incorporates off-policy learning to improve sample efficiency. By combining an off-policy framework with auxiliary techniques, specifically double-Q-network-based stabilization and value learning without reward function inference, we demonstrate a reduction in the samples required to robustly match expert behavior.
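The abstract does not spell out its double-Q stabilization, so the following is only a minimal sketch of one widely used form, the clipped double-Q target popularized by TD3 (Fujimoto et al.), and not necessarily the paper's formulation. Two target critics score the next state, and the smaller estimate is bootstrapped, which curbs the overestimation that destabilizes off-policy value learning. The function name `clipped_double_q_target` is hypothetical, and the per-step `rewards` input is an assumed placeholder for whatever learning signal the method uses; the paper states that it avoids inferring an explicit reward function.

```python
from __future__ import annotations

from typing import Sequence


def clipped_double_q_target(
    rewards: Sequence[float],   # hypothetical per-step learning signal (placeholder)
    dones: Sequence[bool],      # True where the transition ends an episode
    q1_next: Sequence[float],   # target critic 1 evaluated at (s', a')
    q2_next: Sequence[float],   # target critic 2 evaluated at (s', a')
    gamma: float = 0.99,
) -> list[float]:
    """Return y = r + gamma * (1 - done) * min(Q1', Q2') for each transition."""
    if not (len(rewards) == len(dones) == len(q1_next) == len(q2_next)):
        raise ValueError("all inputs must have the same length")
    targets = []
    for r, done, q1, q2 in zip(rewards, dones, q1_next, q2_next):
        # Bootstrapping from the smaller of the two critic estimates
        # counteracts the upward bias of a single bootstrapped Q estimate.
        bootstrap = 0.0 if done else gamma * min(q1, q2)
        targets.append(r + bootstrap)
    return targets


# Usage: the first transition bootstraps from min(10.0, 8.0) = 8.0;
# the second is terminal, so only its signal value remains.
print(clipped_double_q_target([1.0, 0.5], [False, True], [10.0, 3.0], [8.0, 4.0], gamma=0.5))
# → [5.0, 0.5]
```

In a full off-policy loop, these targets would be regressed onto by both online critics using minibatches drawn from a replay buffer. Reusing past transitions this way, instead of discarding them after each on-policy update as TRPO-based GAIL does, is the source of the sample-efficiency gain the abstract describes.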

