LG · Oct 25, 2023

Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms

arXiv:2310.16363v4 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

The paper provides a theoretical foundation for safe reinforcement learning under constraints. The contribution is incremental: it extends existing actor-critic methods to the constrained setting and supplies a non-asymptotic analysis for them.

The paper analyzes actor-critic algorithms for constrained Markov decision processes with inequality constraints in a non-i.i.d. (Markovian) setting. It proves that both Constrained Actor Critic and Constrained Natural Actor Critic reach a first-order stationary point with a sample complexity of $\tilde{\mathcal{O}}(ε^{-2.5})$, and it validates the results on three Safety-Gym environments.
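For orientation, the Lagrangian formulation behind this summary can be written in the standard average-cost form below. This is a sketch in our own notation rather than a transcription of the paper: the cost functions $c_k$, long-run averages $J_k$, thresholds $\alpha_k$ and number of constraints $K$ are assumed symbols, and the paper's sign conventions may differ. Only $L(\theta,\gamma)$ and the stationarity criterion come from the abstract.

$$
\begin{aligned}
J_k(\theta) &= \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{T-1} c_k(s_t,a_t)\Big], \qquad k = 0,1,\dots,K,\\
&\min_{\theta}\; J_0(\theta) \quad \text{s.t.} \quad J_k(\theta) \le \alpha_k, \qquad k = 1,\dots,K,\\
L(\theta,\gamma) &= J_0(\theta) + \sum_{k=1}^{K} \gamma_k \big(J_k(\theta) - \alpha_k\big), \qquad \gamma_k \ge 0,\\
&\min_{\theta}\; \max_{\gamma \ge 0}\; L(\theta,\gamma), \qquad \text{target: } \big\Vert \nabla L(\theta,\gamma) \big\Vert_2^2 \le \varepsilon .
\end{aligned}
$$

Under this reading, $\theta$ is minimized and the multipliers $\gamma$ are maximized, which is why constrained actor-critic schemes of this kind typically take gradient-descent steps in $\theta$ and projected gradient-ascent steps in $\gamma$.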

Actor-critic methods have been applied to a wide range of reinforcement learning tasks, especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d. (Markovian) setting. We consider the long-run average cost criterion, where both the objective and the constraint functions are suitable policy-dependent long-run averages of certain prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that these algorithms are guaranteed to find a first-order stationary point (i.e., $\Vert \nabla L(θ,γ)\Vert_2^2 \leq ε$) of the performance (Lagrange) function $L(θ,γ)$, with a sample complexity of $\tilde{\mathcal{O}}(ε^{-2.5})$ for both the Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) algorithms. We also report experimental results on three different Safety-Gym environments.
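The abstract does not spell out the update rules, so the following is only a minimal sketch of a generic three-timescale constrained actor-critic loop in the spirit it describes, not the authors' C-AC algorithm. It assumes a hypothetical two-state toy C-MDP with one constraint, a tabular (one-hot) critic, a softmax policy, and illustrative polynomial step sizes; the names `constrained_actor_critic`, `C0`, `C1`, `ALPHA` and the step-size exponents are ours. Samples come from a single Markovian trajectory rather than i.i.d. draws, the critic runs on the fastest timescale, the actor on an intermediate one, and the Lagrange multiplier on the slowest.

```python
import math
import random

# Hypothetical toy C-MDP (not from the paper): 2 states, 2 actions.
# Action 1 is free under the objective cost c0 but costly under the
# constraint cost c1, so the constraint J1 <= ALPHA binds.
N_S, N_A = 2, 2
C0 = [[1.0, 0.0], [1.0, 0.0]]  # objective cost c0(s, a)
C1 = [[0.0, 1.0], [0.0, 1.0]]  # constraint cost c1(s, a)
ALPHA = 0.3                    # threshold on the long-run average of c1


def next_state(s, a, rng):
    """Markovian transition: move to state `a` w.p. 0.8, else uniformly."""
    return a if rng.random() < 0.8 else rng.randrange(N_S)


def policy(theta, s):
    """Softmax (Gibbs) policy over actions in state s."""
    m = max(theta[s])
    exps = [math.exp(x - m) for x in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def constrained_actor_critic(steps=20_000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_A for _ in range(N_S)]  # actor parameters
    v = [0.0] * N_S   # tabular critic: differential value of Lagrangian cost
    avg_l = 0.0       # running estimate of the average Lagrangian cost
    avg_c1 = 0.0      # running estimate of the average constraint cost
    gamma = 0.0       # Lagrange multiplier, projected onto [0, inf)
    s = 0
    for t in range(1, steps + 1):
        # Illustrative step sizes: critic >> actor >> multiplier.
        a_c, a_a, a_m = 0.5 / t**0.55, 0.5 / t**0.75, 0.5 / t**0.95
        probs = policy(theta, s)
        a = rng.choices(range(N_A), weights=probs)[0]
        s_next = next_state(s, a, rng)
        # One-step Lagrangian cost: c0 + gamma * (c1 - ALPHA).
        cost = C0[s][a] + gamma * (C1[s][a] - ALPHA)
        # Critic (fastest): average-cost TD(0) error and updates.
        delta = cost - avg_l + v[s_next] - v[s]
        avg_l += a_c * (cost - avg_l)
        v[s] += a_c * delta
        avg_c1 += a_c * (C1[s][a] - avg_c1)
        # Actor (intermediate): descend the Lagrangian along delta * grad log pi.
        for b in range(N_A):
            grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] -= a_a * delta * grad_log_pi
        # Multiplier (slowest): projected ascent on the estimated violation.
        gamma = max(0.0, gamma + a_m * (avg_c1 - ALPHA))
        s = s_next
    return theta, gamma, avg_c1
```

Calling `constrained_actor_critic(steps=20_000, seed=1)` returns the actor parameters, the multiplier and the running constraint-cost estimate. The ordering of step sizes, with the actor-to-critic and multiplier-to-actor ratios tending to zero, is what lets each slower iterate treat the faster ones as approximately converged. A natural-gradient variant in the spirit of C-NAC would typically precondition the actor step with an estimate of the Fisher information matrix, which this sketch omits.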
