SYLG, Mar 31, 2023

An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

arXiv:2303.17819v1
11 citations
h-index: 38
Originality: Incremental advance
AI Analysis

The method offers an efficient solution for control problems in robotics and engineering that require continuous-time optimization. The advance is incremental, since it builds on existing off-policy methods.

The paper tackles the continuous-time LQR problem with an off-policy reinforcement learning algorithm that uses only input-state data, collected under a specific persistently exciting input that serves as the exploration signal. With this data, the matrix equation solved at each iteration is guaranteed to have a unique solution, and the algorithm is proven to converge to the optimal control input.

In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. Finally, a method to determine a stabilizing policy to initialize the algorithm using only measured data is proposed.
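
The policy evaluation and policy improvement steps mentioned in the abstract can be illustrated with a minimal, model-based sketch. This is not the paper's data-driven algorithm: it is the classical Kleinman-style policy iteration for a scalar system, and it assumes the plant parameters `a` and `b` are known, which the paper's method does not require. The function names, the numerical values, and the initial gain `k0` are illustrative assumptions. In the scalar case the policy evaluation step reduces to a single division; in the matrix case it becomes a Lyapunov-type matrix equation, which the paper reformulates as a Sylvester-transpose equation built from measured data.

```python
import math


def kleinman_scalar(a, b, q, r, k0, iters=20):
    """Model-based policy iteration for dx/dt = a*x + b*u
    with cost integral of (q*x**2 + r*u**2) and policy u = -k*x.

    Hypothetical helper for illustration; assumes a and b are known.
    """
    if a - b * k0 >= 0:
        raise ValueError("k0 must be stabilizing: a - b*k0 < 0")
    k = k0
    history = []
    for _ in range(iters):
        a_cl = a - b * k  # closed-loop dynamics under the current policy
        # Policy evaluation: scalar Lyapunov equation 2*a_cl*p + q + r*k**2 = 0
        p = -(q + r * k * k) / (2.0 * a_cl)
        # Policy improvement: new feedback gain from the evaluated cost
        k = b * p / r
        history.append((p, k))
    return p, k, history


def riccati_scalar(a, b, q, r):
    """Closed-form stabilizing solution of 2*a*p - (b*p)**2 / r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Illustrative unstable plant (a > 0) with a stabilizing initial gain.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p, k, hist = kleinman_scalar(a, b, q, r, k0=3.0)
p_star = riccati_scalar(a, b, q, r)
print(f"iterated p = {p:.6f}, Riccati p* = {p_star:.6f}, gain k = {k:.6f}")
```

The iteration needs a stabilizing initial gain, which is why the abstract's final contribution, finding such a gain from measured data alone, matters in practice.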

