LGAIMay 21, 2025

Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm

arXiv:2505.15138v15 citationsh-index: 9
Originality Highly original
AI Analysis

This work addresses the problem of average reward Constrained Markov Decision Processes for researchers and practitioners in the field of reinforcement learning, providing an incremental improvement over existing methods.

The authors tackled the problem of average reward Constrained Markov Decision Processes, achieving global convergence and constraint violation rates of $ ilde{mathcal{O}}(1/sqrt{T})$ when the mixing time is known, and $ ilde{mathcal{O}}(1/T^{0.5-ε})$ otherwise. Their algorithm achieves a high convergence rate, with rates that match the theoretical lower bound.

This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) with general parametrization. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of $\tilde{\mathcal{O}}(1/\sqrt{T})$ over a horizon of length $T$ when the mixing time, $τ_{\mathrm{mix}}$, is known to the learner. In absence of knowledge of $τ_{\mathrm{mix}}$, the achievable rates change to $\tilde{\mathcal{O}}(1/T^{0.5-ε})$ provided that $T \geq \tilde{\mathcal{O}}\left(τ_{\mathrm{mix}}^{2/ε}\right)$. Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes