SY · Apr 16

Bridging Continuous-time LQR and Reinforcement Learning via Gradient Flow of the Bellman Error

Armin Gießler, Albertus Johannes Malan, Sören Hohmann

arXiv:2506.09685 · 3.2 · h-index: 6

Predicted impact: top 92% in SY · last 90 days · Originality: Incremental advance

AI Analysis

For control theory and reinforcement learning researchers, this work bridges LQR and RL by recasting the suboptimality of the Algebraic Riccati Equation (ARE) as a Bellman error, offering a new perspective on optimal control.

This paper introduces a novel continuous-time Bellman error for the infinite-horizon LQR problem, derived from the HJB equation, and uses its gradient flow to compute the optimal feedback gain. The method is guaranteed to converge to the optimal policy, and every feedback gain along the trajectory is stabilizing.
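The paper's own Bellman error and its closed-form gradient are not reproduced on this page. As a hedged illustration of how ARE suboptimality can be measured for a fixed stabilizing gain, the following standard LQR identity (assuming Q ⪰ 0, R ≻ 0, (A, B) stabilizable and (A, Q^{1/2}) detectable) relates the Riccati residual at the cost matrix P_K of a gain K to the gap between K and the improved gain R^{-1}B^T P_K. It is a textbook relation, not the paper's definition.

```latex
% Plant, linear state feedback, and infinite-horizon quadratic cost
\dot{x} = A x + B u, \qquad u = -K x, \qquad
J = \int_0^\infty \left( x^\top Q x + u^\top R u \right) dt

% For a stabilizing K (A - BK Hurwitz), J = x_0^\top P_K x_0, where P_K solves the Lyapunov equation
(A - BK)^\top P_K + P_K (A - BK) + Q + K^\top R K = 0

% ARE for the optimal P^\ast, with optimal gain K^\ast = R^{-1} B^\top P^\ast
A^\top P + P A - P B R^{-1} B^\top P + Q = 0

% Rearranging the Lyapunov equation: A^\top P_K + P_K A + Q = K^\top B^\top P_K + P_K B K - K^\top R K.
% Substituting this into the ARE residual evaluated at P = P_K gives
A^\top P_K + P_K A - P_K B R^{-1} B^\top P_K + Q
  = -\left(K - R^{-1} B^\top P_K\right)^\top R \left(K - R^{-1} B^\top P_K\right) \preceq 0
```

Because R ≻ 0, the residual vanishes exactly when K = R^{-1}B^T P_K; then P_K solves the ARE with A - BK Hurwitz, so it is the unique stabilizing solution and K = K^*.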

In this paper, we present a novel method for computing the optimal feedback gain of the infinite-horizon Linear Quadratic Regulator (LQR) problem via an ordinary differential equation. We introduce a novel continuous-time Bellman error, derived from the Hamilton-Jacobi-Bellman (HJB) equation, which quantifies the suboptimality of stabilizing policies and is parametrized in terms of the feedback gain. We analyze its properties, including its effective domain, smoothness, and coerciveness, and show the existence of a unique stationary point within the stability region. Furthermore, we derive a closed-form expression for the gradient of the Bellman error, which induces a gradient flow. This flow converges to the optimal feedback gain and generates a unique trajectory consisting exclusively of stabilizing feedback policies. Additionally, this work advances connections between LQR theory and Reinforcement Learning (RL) by redefining the suboptimality of the Algebraic Riccati Equation (ARE) as a Bellman error, adopting a state-independent formulation, and leveraging Lyapunov equations to overcome the infinite-horizon challenge. We validate our method in simulation and compare it to the state of the art.
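A minimal numerical sketch of a gradient flow over stabilizing feedback gains, assuming NumPy. It does not implement the paper's Bellman error; it swaps in the standard policy-gradient cost J(K) = tr(P_K Σ0), whose gradient 2(RK - B^T P_K)Y_K is likewise evaluated through Lyapunov equations instead of infinite-horizon simulation. All function names, the forward-Euler step `dt`, the double-integrator example, and the initial gain `K0` are illustrative assumptions.

```python
import numpy as np


def solve_lyapunov(F, M):
    """Return symmetric X solving F^T X + X F + M = 0 (F Hurwitz; small n only)."""
    n = F.shape[0]
    eye = np.eye(n)
    # Column-major vec identity: vec(F^T X + X F) = (I kron F^T + F^T kron I) vec(X)
    lhs = np.kron(eye, F.T) + np.kron(F.T, eye)
    x = np.linalg.solve(lhs, -M.reshape(-1, order="F"))
    X = x.reshape(n, n, order="F")
    return 0.5 * (X + X.T)


def value_matrix(A, B, Q, R, K):
    """P_K: infinite-horizon cost matrix of u = -K x, from a Lyapunov equation."""
    Acl = A - B @ K
    return solve_lyapunov(Acl, Q + K.T @ R @ K)


def cost_gradient(A, B, Q, R, K, Sigma0):
    """Gradient of J(K) = tr(P_K Sigma0), namely 2 (R K - B^T P_K) Y_K."""
    Acl = A - B @ K
    P = value_matrix(A, B, Q, R, K)
    # Y_K solves Acl Y + Y Acl^T + Sigma0 = 0 (state Gramian under the policy)
    Y = solve_lyapunov(Acl.T, Sigma0)
    return 2.0 * (R @ K - B.T @ P) @ Y


def riccati_residual(A, B, Q, R, K):
    """ARE residual at P_K; zero exactly when K is the optimal gain."""
    P = value_matrix(A, B, Q, R, K)
    return A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q


def gradient_flow(A, B, Q, R, K0, Sigma0, dt=0.01, steps=4000):
    """Forward-Euler integration of dK/dt = -grad J(K) from a stabilizing K0."""
    K = K0.copy()
    max_real_eig = -np.inf  # largest closed-loop eigenvalue real part seen
    for _ in range(steps):
        K = K - dt * cost_gradient(A, B, Q, R, K, Sigma0)
        max_real_eig = max(max_real_eig, np.linalg.eigvals(A - B @ K).real.max())
    return K, max_real_eig


# Hypothetical example: double integrator with Q = I, R = 1, Sigma0 = I.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # stabilizing initial gain
K, max_real_eig = gradient_flow(A, B, Q, R, K0, np.eye(2))
print(K)  # close to [[1, sqrt(3)]], the ARE-optimal gain for this example
```

Forward Euler is used only for brevity. With Q ≻ 0 and Σ0 ≻ 0, J(K) grows without bound toward the boundary of the stabilizing set, so a small `dt` keeps every iterate stabilizing here (`max_real_eig < 0`); an adaptive ODE solver would be the more robust choice in general.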

