Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation
This work provides a theoretical foundation for efficient actor-critic methods in reinforcement learning and is applicable to settings such as multi-agent RL. The contribution is incremental in that it builds on existing primal-dual frameworks.
The paper tackles the convergence of actor-critic algorithms with nonlinear function approximation in a nonconvex-nonconcave primal-dual setting. It establishes a provably efficient convergence rate of O(√(ln(N d G²)/N)) under Markovian sampling and validates the result empirically on OpenAI Gym tasks.
We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation. Stochastic gradient descent ascent is applied with an adaptive proximal term for robust learning rates. We establish the first efficient convergence result for primal-dual actor-critic, with a convergence rate of $\mathcal{O}\left(\sqrt{\frac{\ln \left(N d G^2 \right)}{N}}\right)$ under Markovian sampling, where $G$ is the element-wise maximum of the gradient, $N$ is the number of iterations, and $d$ is the dimension of the gradient. Our result requires only the Polyak-Łojasiewicz condition on the dual variables, which is easy to verify and applicable to a wide range of reinforcement learning (RL) scenarios. The algorithm and analysis are general enough to be applied to other RL settings, such as multi-agent RL. Empirical results on OpenAI Gym continuous control tasks corroborate our theoretical findings.
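To make the ingredients above concrete, the following is a minimal sketch of stochastic gradient descent ascent with adaptive step sizes on a toy saddle-point problem. It is not the paper's algorithm. The objective f(x, y) = 0.5x² + xy − 0.5y², the AdaGrad-style scaling standing in for the adaptive proximal term, the i.i.d. Gaussian gradient noise (rather than Markovian sampling), and all names and hyperparameters are assumptions chosen for illustration. The toy objective is strongly concave in y, so it satisfies the Polyak-Łojasiewicz condition in the dual variable. The helper `rate_bound` evaluates the stated rate with constants omitted.

```python
import numpy as np


def grad(x, y):
    """Gradients of the toy objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2."""
    gx = x + y  # df/dx, used for descent on the primal variable
    gy = x - y  # df/dy, used for ascent on the dual variable
    return gx, gy


def adaptive_sgda(n_iters=5000, lr=0.5, noise=0.1, eps=1e-8, seed=0):
    """Descent-ascent with AdaGrad-style step sizes (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    x, y = 2.0, -1.5    # arbitrary starting point
    vx, vy = 0.0, 0.0   # running sums of squared stochastic gradients
    for _ in range(n_iters):
        gx, gy = grad(x, y)
        # Add i.i.d. Gaussian noise to mimic stochastic gradient estimates.
        gx += noise * rng.standard_normal()
        gy += noise * rng.standard_normal()
        vx += gx ** 2
        vy += gy ** 2
        x -= lr * gx / np.sqrt(vx + eps)  # primal step: descend
        y += lr * gy / np.sqrt(vy + eps)  # dual step: ascend
    return x, y


def rate_bound(N, d, G):
    """sqrt(ln(N * d * G^2) / N) with constants omitted; needs N * d * G^2 > 1."""
    return np.sqrt(np.log(N * d * G ** 2) / N)


if __name__ == "__main__":
    x, y = adaptive_sgda()
    print("final iterate:", x, y)  # should lie near the saddle point (0, 0)
    for N in (100, 10_000, 1_000_000):
        print(N, rate_bound(N, d=10, G=1.0))
```

In this sketch the step size shrinks automatically as squared gradients accumulate, so no hand-tuned decay schedule is needed. The printed bound values decrease with N, reflecting the logarithmic factor over N in the rate.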