SY SY OC | Mar 15

Data-Enabled Policy and Value Iteration for Continuous-Time Linear Quadratic Output Feedback Control

arXiv:2603.14386 | 93.2 | h-index: 9
AI Analysis

This paper addresses output feedback control for continuous-time systems, offering control engineers a more efficient and numerically stable alternative to existing methods. The contribution is incremental: it builds on prior data-driven control techniques.

The paper tackles the continuous-time linear quadratic regulator problem with unmeasurable states and unknown dynamics by proposing data-driven policy and value iteration algorithms, achieving superior numerical stability, reduced data demand, and higher computational efficiency without needing prior system knowledge.

This paper proposes efficient policy iteration and value iteration algorithms for the continuous-time linear quadratic regulator problem with unmeasurable states and unknown system dynamics, from the perspective of direct data-driven control. Specifically, by re-examining the data characteristics of input-output filtered vectors and introducing QR decomposition, an improved substitute state construction method is presented that further eliminates redundant information, ensures a full row rank data matrix, and enables a complete parameterized representation of the feedback controller. Furthermore, the original problem is transformed into an equivalent linear quadratic regulator problem defined on the substitute state with a known input matrix, verifying the stabilizability and detectability of the transformed system. Consequently, model-free policy iteration and value iteration algorithms are designed that fully exploit the full row rank substitute state data matrix. The proposed algorithms offer distinct advantages: they avoid the need for prior knowledge of the system order or the calculation of signal derivatives and integrals; the iterative equations can be solved directly without relying on the traditional least-squares paradigm, guaranteeing feasibility in both single-output and multi-output settings; and they demonstrate superior numerical stability, reduced data demand, and higher computational efficiency. Moreover, the heuristic results regarding trajectory generation for continuous-time systems are discussed, circumventing potential failure modes associated with existing approaches.
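For background on the policy iteration named in the abstract, the following is a minimal sketch of the classical model-based (Kleinman-style) iteration for a scalar continuous-time LQR problem. It is not the paper's algorithm: the paper's methods are model-free and work with a substitute-state data matrix, whereas this sketch assumes the dynamics are known. The scalar system dx/dt = a*x + b*u, the cost weights q and r, and the function names are all hypothetical choices made for illustration.

```python
import math


def riccati_solution(a, b, q, r):
    """Closed-form positive root of the scalar Riccati equation
    2*a*p - (b*p)**2 / r + q = 0 (assumes b != 0, q > 0, r > 0)."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def policy_iteration(a, b, q, r, k0, iters=20):
    """Classical policy iteration for dx/dt = a*x + b*u with u = -k*x.

    Illustrative only: uses the known model (a, b), unlike the
    data-driven algorithms described in the abstract.
    """
    if a - b * k0 >= 0:
        raise ValueError("k0 must be stabilizing: a - b*k0 < 0")
    k = k0
    history = []
    for _ in range(iters):
        # Policy evaluation: scalar Lyapunov equation
        # 2*(a - b*k)*p + q + r*k**2 = 0, solved for p.
        p = (q + r * k * k) / (-2.0 * (a - b * k))
        # Policy improvement: new gain from the evaluated cost.
        k = b * p / r
        history.append(p)
    return p, k, history


if __name__ == "__main__":
    p, k, history = policy_iteration(1.0, 1.0, 1.0, 1.0, k0=3.0)
    print("iterated p:", p)
    print("closed-form p:", riccati_solution(1.0, 1.0, 1.0, 1.0))
```

Starting from a stabilizing gain, each evaluated cost p is no larger than the previous one and the sequence approaches the Riccati solution. Value iteration differs mainly in that it does not require a stabilizing initial gain.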
