15.7 SY Apr 16
Bridging Continuous-time LQR and Reinforcement Learning via Gradient Flow of the Bellman Error
Armin Gießler, Albertus Johannes Malan, Sören Hohmann
In this paper, we present a novel method for computing the optimal feedback gain of the infinite-horizon Linear Quadratic Regulator (LQR) problem via an ordinary differential equation. We introduce a novel continuous-time Bellman error, derived from the Hamilton-Jacobi-Bellman (HJB) equation, which quantifies the suboptimality of stabilizing policies and is parametrized in terms of the feedback gain. We analyze its properties, including its effective domain, smoothness, and coerciveness, and show the existence of a unique stationary point within the stability region. Furthermore, we derive a closed-form gradient expression of the Bellman error that induces a gradient flow. This flow converges to the optimal feedback gain and generates a unique trajectory that consists exclusively of stabilizing feedback policies. Additionally, this work draws connections between LQR theory and Reinforcement Learning (RL) by redefining the suboptimality of the Algebraic Riccati Equation (ARE) as a Bellman error, adapting a state-independent formulation, and leveraging Lyapunov equations to overcome the infinite-horizon challenge. We validate our method in simulation and compare it to the state of the art.
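To illustrate the general idea of reaching the LQR gain by following a gradient flow over stabilizing gains, here is a minimal Python sketch. It does not use the paper's Bellman error, whose closed-form gradient is not given in the abstract. As a stand-in it uses the well-known gradient of the LQR cost J(K) = trace(P_K) for u = -Kx, evaluated through two Lyapunov equations, and integrates the flow with explicit Euler steps. The system matrices, step size, and iteration count are assumptions chosen for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical example system (not from the paper); A is Hurwitz, so K = 0 is stabilizing.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])


def lqr_cost_gradient(K):
    """Gradient of J(K) = trace(P_K) for u = -K x; defined only for stabilizing K."""
    A_cl = A - B @ K
    if np.max(np.linalg.eigvals(A_cl).real) >= 0:
        raise ValueError("K is not stabilizing")
    # P solves A_cl^T P + P A_cl + Q + K^T R K = 0 (cost matrix of the policy K).
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    # Y solves A_cl Y + Y A_cl^T + I = 0 (unit initial-state covariance assumed).
    Y = solve_continuous_lyapunov(A_cl, -np.eye(A.shape[0]))
    return 2.0 * (R @ K - B.T @ P) @ Y


def gradient_flow(K0, step=0.05, n_steps=2000):
    """Explicit Euler discretization of dK/dt = -grad J(K)."""
    K = K0.copy()
    for _ in range(n_steps):
        K = K - step * lqr_cost_gradient(K)
    return K


K_flow = gradient_flow(np.zeros((1, 2)))
# Reference solution from the algebraic Riccati equation: K = R^{-1} B^T P.
K_are = np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R))
print("gradient flow gain:", K_flow)
print("Riccati gain:      ", K_are)
```

The stability check inside `lqr_cost_gradient` mirrors the property stated in the abstract that the trajectory stays within the set of stabilizing gains. In this discretized sketch that property depends on the step size being small enough.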
4.6 SY Apr 16
Data-driven Linear Quadratic Integral Control: A Convex Formulation and Policy Gradient Approach
Armin Gießler, Pol Jané-Soneira, Sören Hohmann
This paper studies the data-driven synthesis of linear quadratic integral (LQI) controllers for continuous-time systems. The objective is to achieve optimal state-feedback control with integral action for reference tracking using only measured data. To this end, we derive a data-driven closed-loop parameterization of the augmented dynamics that incorporates the integral state while relying solely on input-state-output measurements of the underlying system. Based on this parameterization, a data-driven convex optimization problem is formulated whose solution yields the optimal linear quadratic regulator (LQR) feedback gain for the augmented system without explicit knowledge of the system matrices. In addition, a policy gradient flow is derived to compute the optimal controller within the space of stabilizing gains. The proposed approach enables data-driven optimal tracking control while avoiding explicit state augmentation in the data collection phase. The effectiveness of the method is demonstrated through a numerical example involving a distributed generation unit (DGU) in a DC microgrid.
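For context, the model-based baseline that such a data-driven LQI design aims to reproduce can be sketched as follows. This is not the paper's data-driven parameterization. It assumes the system matrices are known, augments the state with the integrator dz/dt = r - Cx, and solves the LQR problem for the augmented system. The matrices, weights, and the helper `steady_state_output` are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical plant (not from the paper): dx/dt = A x + B u, y = C x.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n, m, p = 2, 1, 1

# Augmented state [x; z] with integral state dz/dt = r - C x.
A_aug = np.block([[A, np.zeros((n, p))], [-C, np.zeros((p, p))]])
B_aug = np.vstack([B, np.zeros((p, m))])
Q_aug = np.diag([1.0, 1.0, 10.0])  # assumed weights; last entry penalizes the integral state
R = np.array([[1.0]])

# LQR gain for the augmented system: u = -K_aug [x; z].
P = solve_continuous_are(A_aug, B_aug, Q_aug, R)
K_aug = np.linalg.solve(R, B_aug.T @ P)
A_cl = A_aug - B_aug @ K_aug


def steady_state_output(r):
    """Output reached for a constant reference r under the LQI feedback."""
    # Closed loop: d/dt [x; z] = A_cl [x; z] + E r, with E = [0; I].
    E = np.vstack([np.zeros((n, p)), np.eye(p)])
    xz_ss = -np.linalg.solve(A_cl, E @ np.atleast_1d(r))
    return C @ xz_ss[:n]


print("closed-loop poles:", np.linalg.eigvals(A_cl))
print("steady-state output for r = 2.5:", steady_state_output(2.5))
```

Because the last block row of the closed-loop matrix is [-C, 0], any stabilizing gain forces Cx = r at steady state, which is the integral-action property the abstract refers to.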
42.4 OC Apr 16
Towards Optimal Passive Feedback Control of LTI Systems under LQR Performance
Armin Gießler, Pol Jané-Soneira, Sören Hohmann
We study state-feedback design for continuous-time LTI systems with a control input and an external input-output pair. Our objective is to determine feedback gains that render the closed-loop system (strictly) passive with respect to the external port while minimizing the standard LQR cost in the disturbance-free case. The resulting constrained optimization problem is intractable due to bilinear matrix inequalities. We analyze the set of passivating gains, showing it is unbounded, possibly nonconvex, path-connected, and contractible. We propose an indirect approach, in which the set of passivating feedback gains is inner-approximated by a compact, convex polytope. A projected gradient flow is employed to compute a gain within this polytope that minimizes the LQR cost. Numerical examples illustrate the effectiveness of the method.
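The projected-gradient idea can be sketched in Python as below. The sketch makes strong simplifying assumptions. The polytope is a plain box of gains chosen by hand, not an inner approximation derived from any passivity condition. The flow is replaced by projected gradient descent, its explicit Euler discretization. The system, the bounds `K_LOWER` and `K_UPPER`, and the step size are all hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical system (not from the paper), with feedback u = -K x.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Assumed box of admissible gains. Every K in it is stabilizing here, since the
# closed-loop polynomial is s^2 + (0.5 + k2) s + (1 + k1) with k1, k2 >= 0.
K_LOWER = np.array([[0.0, 0.0]])
K_UPPER = np.array([[0.2, 2.0]])


def lqr_cost_and_gradient(K):
    """LQR cost trace(P_K) and its gradient for a stabilizing gain K."""
    A_cl = A - B @ K
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    Y = solve_continuous_lyapunov(A_cl, -np.eye(A.shape[0]))
    return np.trace(P), 2.0 * (R @ K - B.T @ P) @ Y


def projected_gradient_descent(K0, step=0.05, n_steps=2000):
    """Gradient step followed by Euclidean projection onto the box (clipping)."""
    K = np.clip(K0, K_LOWER, K_UPPER)
    for _ in range(n_steps):
        _, grad = lqr_cost_and_gradient(K)
        K = np.clip(K - step * grad, K_LOWER, K_UPPER)
    return K


K0 = np.zeros((1, 2))
K_proj = projected_gradient_descent(K0)
print("projected gain:", K_proj)
print("cost at start / at result:",
      lqr_cost_and_gradient(K0)[0], lqr_cost_and_gradient(K_proj)[0])
```

For a box, the Euclidean projection reduces to element-wise clipping. A general polytope would require solving a small quadratic program at each step instead.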
19.0 OC Apr 30
Data-Driven Continuous-Time Linear Quadratic Regulator via Closed-Loop and Reinforcement Learning Parameterizations
Armin Gießler, Felix Thömmes, Sören Hohmann
This paper studies data-driven approaches to the continuous-time linear quadratic regulator (LQR) problem based on two existing parameterizations, namely a closed-loop (CL) parameterization from behavioral system theory and an integral reinforcement learning (IRL) parameterization. The CL parameterization characterizes the closed-loop system via a matrix that satisfies equality constraints. While this parameterization has been extensively studied for discrete-time systems, we adapt key results to the continuous-time setting and develop a policy iteration (PI) scheme, derive a data-driven continuous-time algebraic Riccati equation (CARE), and introduce an alternative convex problem formulation. The IRL parameterization utilizes off-policy data to perform policy evaluation, which is then used for PI or value iteration. Within the IRL framework, we derive a policy gradient flow and propose convex reformulations of the LQR problem. Finally, we provide a unified treatment of these parameterizations that enables a systematic understanding of existing approaches and clarifies their structural relationships.
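As background for the policy iteration schemes mentioned above, the classical model-based version (Kleinman's iteration) can be sketched in Python as follows. This is not the paper's data-driven CL or IRL formulation, which replaces the model-based policy evaluation with data. The sketch assumes known system matrices and a stabilizing initial gain, and the example system is hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov


def kleinman_policy_iteration(A, B, Q, R, K0, n_iter=20):
    """Model-based policy iteration for continuous-time LQR with u = -K x."""
    K = K0
    for _ in range(n_iter):
        A_cl = A - B @ K
        # Policy evaluation: A_cl^T P + P A_cl + Q + K^T R K = 0.
        P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
        # Policy improvement: K = R^{-1} B^T P.
        K = np.linalg.solve(R, B.T @ P)
    return K, P


# Hypothetical system (not from the paper); A is Hurwitz, so K0 = 0 is stabilizing.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K_pi, P_pi = kleinman_policy_iteration(A, B, Q, R, np.zeros((1, 2)))
P_are = solve_continuous_are(A, B, Q, R)
print("policy iteration gain:", K_pi)
print("max deviation from CARE solution:", np.max(np.abs(P_pi - P_are)))
```

Each pass solves one Lyapunov equation, which is the policy evaluation step that the data-driven parameterizations in the abstract carry out from measured trajectories rather than from A and B.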