Direct Data-Driven Linear Quadratic Tracking via Policy Optimization

arXiv:2605.155635.3

Predicted impact: top 86% in SY · last 90 days

Originality: Incremental advance

AI Analysis

For control engineers seeking real-time data-driven optimal tracking, this work provides a computationally efficient method that overcomes the dimensionality bottleneck of existing approaches.

This paper extends the Data-EnablEd Policy Optimization (DeePO) framework from LQR to LQT by introducing a reference-decoupled reformulation that maintains constant-dimension decision variables. The proposed offline and online algorithms achieve global linear convergence and linear decay of optimality gap up to a bias term inversely proportional to SNR, with superior tracking performance in simulations.

Direct data-driven optimal control provides an elegant end-to-end paradigm, yet its real-time applicability is often hindered by the growing dimensionality of online decision variables. Recent breakthroughs, notably Data-EnablEd Policy Optimization (DeePO), overcome this bottleneck for the Linear Quadratic Regulator (LQR) through sample-covariance parameterization; however, extending this paradigm to Linear Quadratic Tracking (LQT) poses a fundamental challenge. The core difficulty stems from the intricate coupling between time-varying references and the feedback-feedforward policy structure, which prevents a direct application of constant-dimension parameterization. We first introduce a reference-decoupled reformulation of LQT that naturally accommodates the covariance parameterization, guaranteeing a fixed dimension of decision variables independent of data horizon. This formulation is proven to be exactly equivalent to the indirect certainty-equivalence LQT solution. Leveraging this characterization, we develop offline and online DeePO algorithms. Theoretically, we prove global linear convergence for the offline algorithm using local gradient dominance and smoothness, and show that in the online setting the optimality gap decays linearly up to a bias term that scales inversely with the signal-to-noise ratio (SNR). Numerical simulations verify the theoretical results and illustrate the superior tracking performance of the proposed method.
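To make the "fixed dimension independent of data horizon" claim concrete, here is a minimal sketch of the sample-covariance parameterization in its LQR form, as it is usually presented for DeePO. It is not the paper's reference-decoupled LQT reformulation, which the abstract does not spell out. The system matrices `A`, `B`, the gain `K`, and all function names are hypothetical and chosen only for illustration; the data are assumed noise-free. With stacked data `D0 = [U0; X0]`, the sketch forms `Phi = D0 D0^T / T` and `X1bar = X1 D0^T / T`, solves `Phi V = [K; I]`, and recovers the closed-loop matrix as `X1bar V`.

```python
import random

# Hypothetical 2-state, 1-input system and feedback gain (illustration only).
A = [[0.9, 0.1], [0.0, 0.8]]
B = [[0.0], [1.0]]
K = [[-0.1, -0.3]]


def matmul(P, Q):
    """Plain-Python matrix product of row-major nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def solve(M, R):
    """Solve M X = R by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, M[i])) + list(map(float, R[i])) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def collect(T, seed):
    """Simulate x_{t+1} = A x_t + B u_t with random inputs (no noise)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    X0, U0, X1 = [[], []], [[]], [[], []]
    for _ in range(T):
        u = rng.gauss(0, 1)
        nxt = [A[i][0] * x[0] + A[i][1] * x[1] + B[i][0] * u for i in range(2)]
        for i in range(2):
            X0[i].append(x[i])
            X1[i].append(nxt[i])
        U0[0].append(u)
        x = nxt
    return X0, U0, X1


def demo(T, seed=0):
    """Return the decision variable V and the data-based closed-loop matrix."""
    X0, U0, X1 = collect(T, seed)
    D0 = U0 + X0                                   # stacked [U0; X0], (m+n) x T
    D0t = [list(col) for col in zip(*D0)]          # transpose, T x (m+n)
    Phi = [[v / T for v in row] for row in matmul(D0, D0t)]    # (m+n) x (m+n)
    X1bar = [[v / T for v in row] for row in matmul(X1, D0t)]  # n x (m+n)
    target = K + [[1.0, 0.0], [0.0, 1.0]]          # [K; I], (m+n) x n
    V = solve(Phi, target)                         # Phi V = [K; I]
    return V, matmul(X1bar, V)                     # X1bar V equals A + B K here
```

Calling `demo(T=20)` and `demo(T=200)` returns a `V` with three rows and two columns in both cases: its size depends only on the state and input dimensions, not on the number of samples `T`. In this noise-free setting the returned closed-loop matrix matches `A + B K` up to floating-point error, which is the certainty-equivalence relation the parameterization relies on.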
