LG | AI | Nov 26, 2025

Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance

arXiv:2511.21356v1 | h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses the sparse reward problem in reinforcement learning for domains such as imperfect-information games. The contribution is incremental because it builds on existing AIRL methods.

Adversarial Inverse Reinforcement Learning (AIRL) struggles with sparse, delayed rewards in complex settings such as poker. The paper proposes Hybrid-AIRL (H-AIRL), which adds supervised expert guidance and regularization to AIRL and achieves higher sample efficiency and more stable learning than the original method.

Adversarial Inverse Reinforcement Learning (AIRL) has shown promise in addressing the sparse reward problem in reinforcement learning (RL) by inferring dense reward functions from expert demonstrations. However, its performance in highly complex, imperfect-information settings remains largely unexplored. To explore this gap, we evaluate AIRL in the context of Heads-Up Limit Hold'em (HULHE) poker, a domain characterized by sparse, delayed rewards and significant uncertainty. In this setting, we find that AIRL struggles to infer a sufficiently informative reward function. To overcome this limitation, we contribute Hybrid-AIRL (H-AIRL), an extension that enhances reward inference and policy learning by incorporating a supervised loss derived from expert data and a stochastic regularization mechanism. We evaluate H-AIRL on a carefully selected set of Gymnasium benchmarks and the HULHE poker setting. Additionally, we analyze the learned reward function through visualization to gain deeper insights into the learning process. Our experimental results show that H-AIRL achieves higher sample efficiency and more stable learning compared to AIRL. This highlights the benefits of incorporating supervised signals into inverse RL and establishes H-AIRL as a promising framework for tackling challenging, real-world settings.
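
To make the idea of "incorporating a supervised loss derived from expert data" concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The reward recovery follows the standard AIRL discriminator form, D = exp(f) / (exp(f) + pi). The supervised term is assumed here to be a behavior-cloning cross-entropy on expert actions, and the names `bc_loss`, `hybrid_policy_loss`, and `bc_weight` are hypothetical. The abstract does not specify the stochastic regularization mechanism, so it is left out.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)


def airl_reward(f_values, log_pi):
    # Standard AIRL discriminator: D = exp(f) / (exp(f) + pi),
    # so the discriminator logit is f - log pi and the recovered
    # reward log D - log(1 - D) equals that logit.
    logits = f_values - log_pi
    return log_sigmoid(logits) - log_sigmoid(-logits)


def bc_loss(policy_logits, expert_actions):
    # Assumed supervised term: mean negative log-likelihood of the
    # expert's actions under the current policy (behavior cloning).
    shifted = policy_logits - policy_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(expert_actions)), expert_actions].mean()


def hybrid_policy_loss(rl_loss, policy_logits, expert_actions, bc_weight=0.5):
    # Hypothetical combination: an RL surrogate loss computed from the
    # AIRL reward, plus a weighted supervised loss on expert data.
    return rl_loss + bc_weight * bc_loss(policy_logits, expert_actions)
```

In this sketch `airl_reward(f, log_pi)` reduces to `f - log_pi`, and `bc_weight` trades off the adversarially learned reward against direct imitation of the expert. How H-AIRL actually weights or schedules the two signals is described in the paper itself, not in the abstract.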

