LGMLOct 11, 2024

On the Convergence of Single-Timescale Actor-Critic

arXiv:2410.08868v24 citationsh-index: 27
Originality Incremental advance
AI Analysis

This provides a theoretical foundation for actor-critic methods in reinforcement learning, offering improved convergence guarantees for practitioners, though it is incremental relative to existing stationary policy results.

The paper tackles the global convergence analysis of single-timescale actor-critic algorithms for infinite-horizon discounted MDPs, establishing convergence to an ε-close globally optimal policy with a sample complexity of O(ε^{-3}), improving upon prior complexities of O(ε^{-4}) for global optimality.

We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for the infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. To this end, we introduce an elegant analytical framework for handling complex, coupled recursions inherent in the algorithm. Leveraging this framework, we establish that the algorithm converges to an $ε$-close \textbf{globally optimal} policy with a sample complexity of \( O(ε^{-3}) \). This significantly improves upon the existing complexity of $O(ε^{-2})$ to achieve $ε$-close \textbf{stationary policy}, which is equivalent to the complexity of $O(ε^{-4})$ to achieve $ε$-close \textbf{globally optimal} policy using gradient domination lemma. Furthermore, we demonstrate that to achieve this improvement, the step sizes for both the actor and critic must decay as \( O(k^{-\frac{2}{3}}) \) with iteration $k$, diverging from the conventional \( O(k^{-\frac{1}{2}}) \) rates commonly used in (non)convex optimization.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes