LG · AI · ML · Jun 8, 2025

Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning

arXiv:2506.07040v2 · 4 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses robust policy learning in uncertain environments and is aimed at reinforcement learning practitioners. It is an incremental advance backed by specific theoretical guarantees.

The paper tackles robust average-reward reinforcement learning under uncertainty sets by analyzing Q-learning and actor-critic methods, achieving sample complexities of Õ(ε⁻²) for learning ε-optimal robust policies.

We present a non-asymptotic convergence analysis of $Q$-learning and actor-critic algorithms for robust average-reward Markov Decision Processes (MDPs) under contamination, total-variation (TV) distance, and Wasserstein uncertainty sets. A key ingredient of our analysis is showing that the optimal robust $Q$ operator is a strict contraction with respect to a carefully designed semi-norm (with constant functions quotiented out). This property enables a stochastic approximation update that learns the optimal robust $Q$-function using $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples. We also provide an efficient routine for robust $Q$-function estimation, which in turn facilitates robust critic estimation. Building on this, we introduce an actor-critic algorithm that learns an $\epsilon$-optimal robust policy within $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples. We provide numerical simulations to evaluate the performance of our algorithms.
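
To make the abstract's ingredients concrete, here is a minimal Python sketch for the contamination uncertainty set only. It is not the paper's algorithm. The span semi-norm `max - min` is used as a standard stand-in for a semi-norm that ignores constant shifts (the paper's own semi-norm is not given in this summary), and the sample-based update is a generic relative (RVI-style) Q-learning rule. All names (`span`, `robust_backup`, `relative_q_learning`, `P`, `R`, `delta`), the reference entry `Q[0, 0]`, and the step-size schedule are assumptions made for illustration.

```python
import numpy as np

def span(x):
    """Span semi-norm max(x) - min(x); it is zero exactly on constant arrays."""
    return float(np.max(x) - np.min(x))

def robust_backup(Q, P, R, delta):
    """Robust backup (without the gain term) under a contamination set.

    P: nominal kernel, shape (S, A, S). R: rewards, shape (S, A).
    The assumed uncertainty set is {(1 - delta) * P[s, a] + delta * q},
    with q any distribution, so the adversary puts the contaminated
    mass delta on the lowest-value state.
    """
    V = Q.max(axis=1)                                  # greedy values, shape (S,)
    worst = (1.0 - delta) * (P @ V) + delta * V.min()  # shape (S, A)
    return R + worst

def relative_q_learning(P, R, delta, steps=20000, seed=0):
    """Sample-based relative Q-learning sketch with a generative model."""
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    for t in range(steps):
        s, a = rng.integers(S), rng.integers(A)   # sample a state-action pair
        s_next = rng.choice(S, p=P[s, a])         # one draw from the nominal kernel
        V = Q.max(axis=1)
        # Unbiased sample of the robust backup, minus a reference entry
        # (Q[0, 0], an assumed choice) that pins down the constant shift.
        target = R[s, a] + (1.0 - delta) * V[s_next] + delta * V.min() - Q[0, 0]
        alpha = 10.0 / (t + 100.0)                # assumed decaying step size
        Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

On a small random MDP, one can check that `robust_backup(Q + c, P, R, delta)` equals `robust_backup(Q, P, R, delta) + c` for any constant `c`, and that `span` of the difference of two backups never exceeds `span` of the difference of their inputs. That only shows non-expansiveness in the span semi-norm; the strict contraction claimed in the abstract relies on the paper's own semi-norm and assumptions.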

