Revisiting LQR Control from the Perspective of Receding-Horizon Policy Gradient
This work offers a streamlined analysis of RHPG for linear control and estimation that may benefit researchers in control theory and reinforcement learning, though the contribution appears incremental because it revisits a classic problem with a new framework.
The paper tackles the discrete-time linear quadratic regulator (LQR) problem using a model-free learning framework called receding-horizon policy gradient (RHPG). It provides a sample complexity analysis for learning a stabilizing and near-optimal control policy without requiring a stabilizing policy for initialization.
In this paper, we revisit the discrete-time linear quadratic regulator (LQR) problem from the perspective of receding-horizon policy gradient (RHPG), a newly developed model-free learning framework for control applications. We provide a fine-grained sample complexity analysis for RHPG to learn a control policy that is both stabilizing and $\epsilon$-close to the optimal LQR solution, and our algorithm does not require knowing a stabilizing control policy for initialization. Combined with the recent application of RHPG in learning the Kalman filter, we demonstrate the general applicability of RHPG in linear control and estimation with streamlined analyses.
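To make the receding-horizon idea concrete, the following is a minimal sketch for a scalar system $x_{t+1} = a x_t + b u_t$ with policy $u_t = -k x_t$. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the system is scalar, the stage cost is evaluated exactly through a cost oracle rather than estimated from sampled rollouts, and the terminal cost weight is taken to be $q$. The names `stage_cost`, `rhpg_scalar`, and `riccati_gain`, along with the step size and iteration counts, are hypothetical choices made for this example. The sketch solves a sequence of one-step problems backward in time, each by gradient descent started from the zero gain, with the gradient obtained from two cost evaluations.

```python
def stage_cost(k, p_next, a, b, q, r):
    """One-step cost from a unit state under u = -k*x, plus cost-to-go weight p_next."""
    return q + r * k * k + p_next * (a - b * k) ** 2


def rhpg_scalar(a, b, q, r, horizon, step=0.1, iters=200, delta=1e-3):
    """Backward-in-time sequence of one-step policy-gradient problems (scalar sketch)."""
    p_next = q  # assumption: terminal cost weight equals q
    k = 0.0
    for _ in range(horizon):
        k = 0.0  # zero gain: no stabilizing initial policy is needed
        for _ in range(iters):
            # Two-point gradient estimate that uses only cost evaluations.
            grad = (stage_cost(k + delta, p_next, a, b, q, r)
                    - stage_cost(k - delta, p_next, a, b, q, r)) / (2 * delta)
            k -= step * grad
        # Cost-to-go weight induced by the learned gain at this stage.
        p_next = stage_cost(k, p_next, a, b, q, r)
    return k, p_next


def riccati_gain(a, b, q, r, iters=1000):
    """Model-based reference: scalar Riccati iteration, used only for comparison."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p), p


if __name__ == "__main__":
    a, b, q, r = 1.5, 1.0, 1.0, 1.0  # open-loop unstable since |a| > 1
    k, p = rhpg_scalar(a, b, q, r, horizon=30)
    k_star, p_star = riccati_gain(a, b, q, r)
    print("learned gain:", k, "reference gain:", k_star)
    print("closed-loop pole:", a - b * k)
```

In this example the open-loop system is unstable and every stage starts from the zero gain, yet each one-step problem is a strongly convex quadratic in $k$, so gradient descent converges for a sufficiently small step size. As the horizon grows, the first-stage gain approaches the infinite-horizon LQR gain, which is the mechanism that lets a receding-horizon scheme dispense with a stabilizing initial policy.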