OC · LG · Nov 1, 2022

Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems

arXiv:2211.00617v3 · 28 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses convergence issues in continuous-time stochastic control for applications like robotics or finance, representing an incremental improvement with novel geometric methods.

The paper tackles the global linear convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems, proposing geometry-aware gradient descents that achieve linear convergence to the optimal policy with robustness across action frequencies, as confirmed by numerical experiments.

We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. Contrary to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
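The covariance update described above can be illustrated with a minimal sketch. It assumes a toy objective tr(AΣ) − (τ/2) log det Σ rather than the paper's actual LQC cost, and a multiplicative Bures-Wasserstein-type step Σ ← (I − ηG) Σ (I − ηG)ᵀ, where G is the Euclidean gradient and any constant factor from the metric convention is absorbed into the step size η. The names `bw_step`, `toy_grad`, `a`, `tau`, and `eta` are hypothetical and chosen for illustration only; the paper's exact update and step-size conditions are not reproduced here.

```python
import numpy as np


def bw_step(sigma, grad, eta):
    """One multiplicative update Sigma <- (I - eta*G) Sigma (I - eta*G)^T.

    The congruence form keeps the iterate symmetric positive semidefinite,
    unlike a plain Euclidean step Sigma - eta*G.
    """
    m = np.eye(sigma.shape[0]) - eta * grad
    new_sigma = m @ sigma @ m.T
    return 0.5 * (new_sigma + new_sigma.T)  # symmetrise against round-off


def toy_grad(sigma, a, tau):
    """Euclidean gradient of the toy objective tr(A Sigma) - (tau/2) log det Sigma."""
    return a - 0.5 * tau * np.linalg.inv(sigma)


# Assumed toy data: a symmetric positive definite A and a regularisation weight tau.
a = np.array([[2.0, 0.5], [0.5, 1.0]])
tau = 1.0

sigma = np.eye(2)  # initial covariance
for _ in range(500):
    sigma = bw_step(sigma, toy_grad(sigma, a, tau), eta=0.1)

# The toy objective is minimised where the gradient vanishes: Sigma* = (tau/2) A^{-1}.
sigma_star = 0.5 * tau * np.linalg.inv(a)
```

In this toy setting the iterates stay positive definite and approach `sigma_star`, which mirrors, in a much simpler problem, the bounded-iterate behaviour the abstract attributes to geometry-aware updates.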
