LG AIMay 21, 2025

Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm

Yang Xu, Swetha Ganesh, Washim Uddin Mondal, Qinbo Bai, Vaneet Aggarwal

arXiv:2505.15138v113.05 citationsh-index: 9

Originality Highly original

AI Analysis

This work addresses the problem of average reward Constrained Markov Decision Processes for researchers and practitioners in the field of reinforcement learning, providing an incremental improvement over existing methods.

The authors tackled the problem of average reward Constrained Markov Decision Processes, achieving global convergence and constraint violation rates of $ ilde{mathcal{O}}(1/sqrt{T})$ when the mixing time is known, and $ ilde{mathcal{O}}(1/T^{0.5-ε})$ otherwise. Their algorithm achieves a high convergence rate, with rates that match the theoretical lower bound.

This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) with general parametrization. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of $\tilde{\mathcal{O}}(1/\sqrt{T})$ over a horizon of length $T$ when the mixing time, $τ_{\mathrm{mix}}$, is known to the learner. In absence of knowledge of $τ_{\mathrm{mix}}$, the achievable rates change to $\tilde{\mathcal{O}}(1/T^{0.5-ε})$ provided that $T \geq \tilde{\mathcal{O}}\left(τ_{\mathrm{mix}}^{2/ε}\right)$. Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.

View on arXiv PDF

Similar