LG OC ML | Jul 14, 2019

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang

arXiv:1907.06246v1 | 41 citations

Originality: Incremental advance

AI Analysis

The paper provides foundational insights for understanding bilevel optimization in RL, though the advance is incremental because the analysis is limited to a specific, simplified case.

The paper addresses the poorly understood instability of actor-critic algorithms in reinforcement learning by analyzing their convergence in the linear quadratic regulator setting, proving that they converge to a globally optimal policy and action-value function at a linear rate.

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
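To make the alternating actor and critic structure concrete, here is a minimal sketch on a scalar LQR problem. It is not the sample-based algorithm analyzed in the paper: it assumes the model (a, b, q, r) is known, replaces the critic's estimation step with an exact policy evaluation, and uses a plain gradient-style actor step. The function names, the step size eta, and the numerical values are all illustrative assumptions.

```python
import math


def critic_value(k, a, b, q, r):
    """Exact policy evaluation for u = -k * x (idealized critic).

    Solves the scalar Lyapunov equation
        P = q + r*k^2 + (a - b*k)^2 * P,
    which requires a stabilizing gain, i.e. |a - b*k| < 1.
    """
    closed_loop = a - b * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("gain k does not stabilize the system")
    return (q + r * k * k) / (1.0 - closed_loop * closed_loop)


def optimal_gain(a, b, q, r):
    """Optimal gain from the scalar discrete-time Riccati equation."""
    # Riccati equation rearranged as b^2 P^2 + c P - q r = 0.
    c = r - a * a * r - q * b * b
    p_star = (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)
    return a * b * p_star / (r + b * b * p_star)


def run_actor_critic_sketch(k0, a, b, q, r, eta=0.1, iters=200):
    """Alternate an exact critic step with a gradient-style actor step."""
    k_star = optimal_gain(a, b, q, r)
    k = k0
    errors = [abs(k - k_star)]
    for _ in range(iters):
        p = critic_value(k, a, b, q, r)           # critic: evaluate policy
        direction = (r + b * b * p) * k - a * b * p  # proportional to dP/dk
        k = k - eta * direction                    # actor: improve policy
        errors.append(abs(k - k_star))
    return k, errors


if __name__ == "__main__":
    # Hypothetical problem: open-loop unstable (a > 1), stabilizing start.
    k_final, errs = run_actor_critic_sketch(k0=1.0, a=1.2, b=1.0, q=1.0, r=1.0)
    print("final gain:", k_final)
    print("first error ratios:", [errs[i + 1] / errs[i] for i in range(5)])
```

In this toy setting the error ratios printed at the end stay below one and roughly constant, which is what a linear (geometric) rate looks like. The initial gain must be stabilizing, since the policy evaluation is undefined otherwise.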
