LG, Jun 18, 2023

On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

arXiv:2306.10486v110.78 citations, h-index: 43

Originality Highly original

AI Analysis

This paper provides theoretical guarantees for neural network-based actor-critic methods, addressing a foundational problem in reinforcement learning theory.

The paper addresses the gap in theory for actor-critic algorithms with neural network parametrization by proposing NAC2L, and establishes a sample complexity of $\tilde{\mathcal{O}}\left(\frac{1}{ε^{4}(1-γ)^{4}}\right)$ that holds for countable state spaces without requiring a linear or low-rank MDP structure.

Actor-critic algorithms have shown remarkable success in solving state-of-the-art decision-making problems. However, despite their empirical effectiveness, their theoretical underpinnings remain relatively unexplored, especially with neural network parametrization. In this paper, we delve into the study of a natural actor-critic algorithm that utilizes neural networks to represent the critic. Our aim is to establish sample complexity guarantees for this algorithm, achieving a deeper understanding of its performance characteristics. To achieve that, we propose a Natural Actor-Critic algorithm with 2-Layer critic parametrization (NAC2L). Our approach involves estimating the $Q$-function in each iteration through a convex optimization problem. We establish that our proposed approach attains a sample complexity of $\tilde{\mathcal{O}}\left(\frac{1}{ε^{4}(1-γ)^{4}}\right)$. In contrast, the existing sample complexity results in the literature only hold for a tabular or linear MDP. Our result, on the other hand, holds for countable state spaces and does not require a linear or low-rank structure on the MDP.
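To make the stated rate concrete, the following minimal sketch evaluates only the leading-order term $\frac{1}{ε^{4}(1-γ)^{4}}$ of the bound. It is an illustration of how the rate scales, not part of NAC2L itself: the function names `leading_order_samples` and `scaling_factor` are hypothetical, and the constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$ are dropped, so the returned values are relative magnitudes rather than actual sample counts.

```python
def leading_order_samples(epsilon: float, gamma: float) -> float:
    """Leading-order term 1 / (epsilon^4 * (1 - gamma)^4) of the stated bound.

    Assumption: constants and log factors hidden by the O-tilde notation are
    ignored, so this is only meaningful for comparing settings to each other.
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (epsilon ** 4 * (1.0 - gamma) ** 4)


def scaling_factor(eps_from: float, eps_to: float,
                   gamma_from: float, gamma_to: float) -> float:
    """Multiplicative change in the leading-order term between two settings."""
    return (leading_order_samples(eps_to, gamma_to)
            / leading_order_samples(eps_from, gamma_from))


if __name__ == "__main__":
    # Halving the target accuracy epsilon at a fixed discount factor.
    print(scaling_factor(0.1, 0.05, 0.9, 0.9))
    # Halving the effective horizon gap (1 - gamma) at a fixed epsilon.
    print(scaling_factor(0.1, 0.1, 0.9, 0.95))
```

Because both $ε$ and $(1-γ)$ enter with a fourth power, halving either one multiplies the leading-order term by $2^{4} = 16$.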
