Sample Complexity of Linear Quadratic Regulator Without Initial Stability
This work addresses the LQR problem with unknown dynamics and removes two restrictive assumptions of prior methods, namely two-point gradient estimates and a stable initial policy, which makes it more broadly applicable, though the contribution appears incremental.
The paper tackles the Linear Quadratic Regulator (LQR) problem with unknown dynamics by introducing a receding-horizon algorithm that avoids two-point gradient estimates and the need for a stable initial policy, achieving the same order of sample complexity as prior methods.
Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown dynamics. Unlike prior methods, our algorithm does not rely on two-point gradient estimates, yet it maintains the same order of sample complexity. It also removes the restrictive requirement of starting from a stable initial policy, which broadens its applicability. Beyond these improvements, we introduce a refined analysis of error propagation based on the contraction of the Riccati operator under the Riemannian distance. This refinement yields better sample complexity and improved convergence guarantees.
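The contraction property mentioned in the abstract can be checked numerically on a small example. The sketch below is not the paper's algorithm, which is model-free: it assumes known dynamics and applies the standard discrete-time Riccati operator to two different positive definite matrices, tracking how far apart they stay. The distance is assumed to be the usual affine-invariant Riemannian distance on positive definite matrices, delta(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F. The matrices A, B, Q, R and both starting points are illustrative choices, not values from the paper; A is deliberately open-loop unstable.

```python
import numpy as np

def riccati_operator(P, A, B, Q, R):
    """One application of the discrete-time Riccati operator to P."""
    G = R + B.T @ P @ B
    K = np.linalg.solve(G, B.T @ P @ A)      # gain induced by P
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
    return (P_next + P_next.T) / 2            # remove round-off asymmetry

def riemannian_distance(X, Y):
    """Affine-invariant distance between positive definite X and Y."""
    L = np.linalg.cholesky(X)                 # X = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ Y @ L_inv.T                   # same eigenvalues as X^{-1} Y
    eigvals = np.linalg.eigvalsh((M + M.T) / 2)
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))

# Illustrative system (assumed values): eigenvalues of A are 1.2 and 1.1.
A = np.array([[1.2, 0.5],
              [0.0, 1.1]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.eye(1)

# Two arbitrary positive definite starting points.
X = np.eye(2)
Y = np.array([[5.0, 1.0],
              [1.0, 3.0]])

distances = [riemannian_distance(X, Y)]
for _ in range(10):
    X = riccati_operator(X, A, B, Q, R)
    Y = riccati_operator(Y, A, B, Q, R)
    distances.append(riemannian_distance(X, Y))

for k, d in enumerate(distances):
    print(f"step {k:2d}: distance = {d:.3e}")
```

In this setting the printed distances should not increase from one step to the next, which is the mechanism an error-propagation analysis can exploit: a perturbation in one stage of the recursion is not amplified by later stages. How the paper turns this into a sample-complexity bound is not shown here.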