OC · LG · Jan 31, 2022

Single Time-scale Actor-critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees

arXiv:2202.00048v2 · 20 citations
Originality: Incremental advance
AI Analysis

This gives control-theory and reinforcement-learning practitioners a method for solving LQR with improved sample efficiency and convergence guarantees, though the contribution appears incremental because it builds on existing actor-critic and LSTD techniques.

The paper tackles the linear quadratic regulator (LQR) problem with a single time-scale actor-critic algorithm, proves convergence with sample complexity $\mathcal{O}(\varepsilon^{-1} \log(\varepsilon^{-1})^2)$, and validates the result numerically.
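The abstract gives no constants, so only the shape of the bound can be illustrated. The small sketch below evaluates the leading term $\varepsilon^{-1}\log(\varepsilon^{-1})^2$ with constants dropped; the function name `sample_bound` is hypothetical. Tightening $\varepsilon$ tenfold, from $10^{-2}$ to $10^{-3}$, multiplies the term by $10 \cdot (3/2)^2 = 22.5$ (independent of the log base), compared with $10\times$ for a pure $\varepsilon^{-1}$ rate.

```python
import math


def sample_bound(eps):
    """Leading term eps^-1 * log(eps^-1)^2 of the stated bound; constants omitted."""
    return (1.0 / eps) * math.log(1.0 / eps) ** 2


for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps={eps:g}: ~{sample_bound(eps):,.0f} samples (up to constants)")

# Growth factor when eps shrinks tenfold: 10 * (ln 1000 / ln 100)^2 = 10 * 1.5^2
print("ratio:", sample_bound(1e-3) / sample_bound(1e-2))
```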

We propose a single time-scale actor-critic algorithm to solve the linear quadratic regulator (LQR) problem. A least-squares temporal difference (LSTD) method is applied to the critic and a natural policy gradient method is used for the actor. We give a proof of convergence with sample complexity $\mathcal{O}(\varepsilon^{-1} \log(\varepsilon^{-1})^2)$. The method used in the proof is applicable to general single time-scale bilevel optimization problems. We also numerically validate our theoretical convergence results.
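The abstract pairs an LSTD critic with a natural policy gradient actor, both updated on the same time scale. The NumPy sketch below illustrates that pairing under simplifying assumptions that are not the paper's setup: noise-free dynamics $x' = Ax + Bu$, a discount factor $\gamma$, an i.i.d. Gaussian start state for every sampled transition, an LSTD-Q critic on quadratic features, and constant step sizes `alpha` (critic) and `beta` (actor) of the same order. The critic estimates the matrix $\Theta$ of $Q_K(x,u) = [x;u]^\top \Theta [x;u]$, and the actor moves along $E_K = \Theta_{uu}K - \Theta_{ux}$, the LQR natural-gradient direction of Fazel et al. (2018) adapted here to a discounted cost. All function names, step sizes, and the example system are hypothetical.

```python
import numpy as np


def quad_features(z):
    """Monomials z_i * z_j (i <= j): a basis for quadratic forms z^T Theta z."""
    i, j = np.triu_indices(len(z))
    return z[i] * z[j]


def theta_to_matrix(theta, d):
    """Map feature weights to the symmetric Theta with z^T Theta z = phi(z)^T theta."""
    i, j = np.triu_indices(d)
    M = np.zeros((d, d))
    M[i, j] = theta
    return (M + M.T) / 2  # diagonal kept, each off-diagonal weight split in half


def lstdq(A, B, Q, R, K, gamma, n_samples, sigma, rng):
    """Critic: LSTD-Q estimate of Theta for the deterministic policy u = -K x."""
    n, m = B.shape
    p = (n + m) * (n + m + 1) // 2
    M = np.zeros((p, p))
    b = np.zeros(p)
    for _ in range(n_samples):
        x = rng.standard_normal(n)                    # i.i.d. start state (assumption)
        u = -K @ x + sigma * rng.standard_normal(m)   # exploratory behaviour action
        cost = x @ Q @ x + u @ R @ u
        x_next = A @ x + B @ u                        # noise-free dynamics (assumption)
        phi = quad_features(np.concatenate([x, u]))
        phi_next = quad_features(np.concatenate([x_next, -K @ x_next]))  # target-policy action
        M += np.outer(phi, phi - gamma * phi_next)
        b += cost * phi
    return theta_to_matrix(np.linalg.solve(M, b), n + m)


def natural_gradient(Theta, K, n):
    """Actor direction E_K = Theta_uu K - Theta_ux (zero at the optimal gain)."""
    return Theta[n:, n:] @ K - Theta[n:, :n]


def actor_critic(A, B, Q, R, gamma=0.9, iters=200, alpha=0.5, beta=0.2,
                 sigma=1.0, n_samples=100, seed=0):
    """Single time-scale loop: one critic step and one actor step per iteration."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    K = np.zeros((m, n))                              # assumes K = 0 is stabilizing
    Theta = np.zeros((n + m, n + m))
    for _ in range(iters):
        Theta_hat = lstdq(A, B, Q, R, K, gamma, n_samples, sigma, rng)
        Theta += alpha * (Theta_hat - Theta)          # critic tracks the current policy
        K = K - beta * natural_gradient(Theta, K, n)  # actor uses the lagged critic
    return K


def optimal_gain(A, B, Q, R, gamma, iters=2000):
    """Reference only: discounted Riccati value iteration for the optimal gain."""
    P = Q.copy()
    for _ in range(iters):
        K = gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
        P = Q + gamma * A.T @ P @ A - gamma * A.T @ P @ B @ K
    return K


A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
K = actor_critic(A, B, Q, R, gamma=0.9)
print("actor-critic gain:", K)
print("Riccati gain:     ", optimal_gain(A, B, Q, R, gamma=0.9))
```

Because the sketch's dynamics are noise-free, the Bellman equation holds exactly on every sampled transition, so each LSTD solve recovers $\Theta$ for the current gain; the critic still lags the actor through the `alpha` blend, which is what couples the two updates on one time scale. The Riccati iteration is included only to check the learned gain, not as part of the method.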

