LG ML Feb 28, 2022

Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation

arXiv:2202.13863v1
1 citation
Originality: Highly original
AI Analysis

The paper provides a theoretical foundation for efficient actor-critic methods in reinforcement learning, with possible application to settings such as multi-agent RL. It is incremental in the sense that it builds on existing primal-dual frameworks.

The paper analyzes the convergence of actor-critic algorithms with nonlinear function approximation in a nonconvex-nonconcave primal-dual setting. It proves a convergence rate of O(√(ln(N d G²)/N)) under Markovian sampling, where N is the number of iterations, d is the dimension of the gradient, and G is the element-wise maximum of the gradient, and it supports the theory with experiments on OpenAI Gym continuous control tasks.
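To get a feel for how the stated rate behaves, the short sketch below evaluates √(ln(N d G²)/N) for a few iteration counts. The function name `rate_bound` and the example values of d and G are assumptions chosen for illustration, and the hidden constant in the O(·) notation is ignored, so only the trend in N is meaningful.

```python
import math


def rate_bound(num_iters, dim, grad_max):
    """Evaluate sqrt(ln(N * d * G^2) / N), ignoring the constant hidden by O(.).

    num_iters is N, dim is d, grad_max is G (hypothetical parameter names).
    """
    arg = num_iters * dim * grad_max ** 2
    if arg <= 1:
        # The logarithm must be positive for the square root to be defined.
        raise ValueError("N * d * G^2 must exceed 1")
    return math.sqrt(math.log(arg) / num_iters)


# Illustrative values only: d = 10 and G = 1.0 are not taken from the paper.
for n in (10**2, 10**4, 10**6):
    print(n, rate_bound(n, dim=10, grad_max=1.0))
```

The logarithmic factor grows slowly, so the bound shrinks at close to a 1/√N pace as N increases.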

We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation. Stochastic gradient descent ascent is applied with an adaptive proximal term for robust learning rates. We show the first efficient convergence result with primal-dual actor-critic with a convergence rate of $\mathcal{O}\left(\sqrt{\frac{\ln \left(N d G^2 \right)}{N}}\right)$ under Markovian sampling, where $G$ is the element-wise maximum of the gradient, $N$ is the number of iterations, and $d$ is the dimension of the gradient. Our result is presented with only the Polyak-Łojasiewicz condition for the dual variables, which is easy to verify and applicable to a wide range of reinforcement learning (RL) scenarios. The algorithm and analysis are general enough to be applied to other RL settings, like multi-agent RL. Empirical results on OpenAI Gym continuous control tasks corroborate our theoretical findings.
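The abstract's core loop, stochastic gradient descent on the primal variables and ascent on the dual variables with an adaptive step size, can be illustrated on a toy saddle-point problem. The sketch below is not the paper's algorithm. It assumes a small quadratic objective f(x, y) = yᵀ(Ax − b) − ½‖y‖², which is strongly concave in y and therefore satisfies the Polyak-Łojasiewicz condition in the dual variable. It replaces the adaptive proximal term with a single AdaGrad-norm style step size and uses i.i.d. Gaussian gradient noise rather than Markovian samples. Unlike the paper's setting, this toy problem is convex in x. The names `sgda_adagrad_norm`, `eta`, and `sigma` are hypothetical.

```python
import numpy as np


def sgda_adagrad_norm(A, b, eta=0.1, sigma=0.1, num_iters=5000, seed=0):
    """Stochastic gradient descent ascent on the toy saddle problem

        min_x max_y  f(x, y) = y^T (A x - b) - 0.5 * ||y||^2

    using one shared AdaGrad-norm step size (an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])  # primal variable (plays the actor's role)
    y = np.zeros(A.shape[0])  # dual variable (plays the critic's role)
    accum = 1e-12             # running sum of squared gradient norms
    for _ in range(num_iters):
        # Noisy gradients; i.i.d. Gaussian noise stands in for sampling error.
        grad_x = A.T @ y + sigma * rng.normal(size=x.shape)
        grad_y = A @ x - b - y + sigma * rng.normal(size=y.shape)
        # The step size shrinks as squared gradient norms accumulate.
        accum += grad_x @ grad_x + grad_y @ grad_y
        step = eta / np.sqrt(accum)
        x = x - step * grad_x  # descent on the primal variable
        y = y + step * grad_y  # ascent on the dual variable
    return x, y


A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x_final, y_final = sgda_adagrad_norm(A, b)

# For this objective the inner maximum is 0.5 * ||A x - b||^2, so the
# residual ||A x - b|| measures how far x is from the saddle point.
initial_residual = np.linalg.norm(b)  # residual at the starting point x = 0
final_residual = np.linalg.norm(A @ x_final - b)
print(initial_residual, final_residual)
```

Because the step size is divided by the square root of the accumulated squared gradient norms, large early gradients automatically shrink later steps, which is the kind of learning-rate robustness the abstract attributes to its adaptive term.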

