Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation
This addresses the problem of inefficient theoretical guarantees in adversarial imitation learning for AI researchers, offering a practical and provably efficient solution.
The paper tackles the gap between theory and practice in adversarial imitation learning by introducing OPT-AIL, a method with provable efficiency under general function approximation, achieving polynomial sample and interaction complexities and outperforming prior deep AIL methods in challenging tasks.
As a prominent category of imitation learning methods, adversarial imitation learning (AIL) has garnered significant practical success powered by neural network approximation. However, existing theoretical studies on AIL are primarily limited to simplified scenarios such as tabular and linear function approximation and involve complex algorithmic designs that hinder practical implementation, highlighting a gap between theory and practice. In this paper, we explore the theoretical underpinnings of online AIL with general function approximation. We introduce a new method called optimization-based AIL (OPT-AIL), which centers on performing online optimization for reward functions and optimism-regularized Bellman error minimization for Q-value functions. Theoretically, we prove that OPT-AIL achieves polynomial expert sample complexity and interaction complexity for learning near-expert policies. To our best knowledge, OPT-AIL is the first provably efficient AIL method with general function approximation. Practically, OPT-AIL only requires the approximate optimization of two objectives, thereby facilitating practical implementation. Empirical studies demonstrate that OPT-AIL outperforms previous state-of-the-art deep AIL methods in several challenging tasks.