LG SY | Aug 31, 2024

Lyapunov Neural ODE State-Feedback Control Policies

Joshua Hang Sai Ip, Georgios Makrygiorgos, Ali Mesbah

arXiv:2409.00393v3 | 4.6 | h-index: 7

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of ensuring stability in neural control policies for continuous-time systems, with applications such as plasma medicine. Its contribution is incremental: it connects neural ordinary differential equations (NODEs) with stability guarantees.

The paper tackles the stabilization of constrained nonlinear systems in continuous-time optimal control by proposing Lyapunov-NODE control (L-NODEC). L-NODEC uses a Lyapunov loss to learn state-feedback neural policies with exponential stability guarantees, and it reduces the inference time needed to reach the target state.

Deep neural networks are increasingly used as an effective parameterization of control policies in various learning-based control paradigms. For continuous-time optimal control problems (OCPs), which are central to many decision-making tasks, control policy learning can be cast as a neural ordinary differential equation (NODE) problem wherein state and control constraints are naturally accommodated. This paper presents a NODE approach to solving continuous-time OCPs for the case of stabilizing a known constrained nonlinear system around a target state. The approach, termed Lyapunov-NODE control (L-NODEC), uses a novel Lyapunov loss formulation that incorporates an exponentially-stabilizing control Lyapunov function to learn a state-feedback neural control policy, bridging the gap of solving continuous-time OCPs via NODEs with stability guarantees. The proposed Lyapunov loss allows L-NODEC to guarantee exponential stability of the controlled system, as well as its adversarial robustness to perturbations to the initial state. The performance of L-NODEC is illustrated in two problems, including a dose delivery problem in plasma medicine. In both cases, L-NODEC effectively stabilizes the controlled system around the target state despite perturbations to the initial state and reduces the inference time necessary to reach the target.
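The idea of a Lyapunov loss can be illustrated with a minimal sketch. The abstract does not give the exact loss used by L-NODEC, so everything below is an assumption made for illustration: a hypothetical scalar system x' = x + u, a linear feedback gain standing in for the neural policy, a quadratic Lyapunov candidate V(x) = (x - x*)^2, and a hinge penalty on violations of the exponential-stability condition dV/dt + kappa * V <= 0 accumulated along an Euler-integrated trajectory.

```python
# Illustrative sketch only. The system, policy, Lyapunov candidate, and hinge
# penalty are assumptions for this example, not the formulation from the paper.

def dynamics(x, u):
    # Hypothetical scalar system, unstable without control: x' = x + u.
    return x + u

def policy(x, gain, target=0.0):
    # Linear state feedback standing in for a neural control policy.
    return -gain * (x - target)

def lyapunov(x, target=0.0):
    # Quadratic Lyapunov candidate V(x) = (x - x*)^2.
    return (x - target) ** 2

def lyapunov_rate(x, f, target=0.0):
    # Time derivative along the flow: dV/dt = dV/dx * f = 2 (x - x*) f.
    return 2.0 * (x - target) * f

def lyapunov_loss(gain, x0, target=0.0, kappa=1.0, dt=0.01, steps=500):
    # Average hinge penalty on the condition dV/dt + kappa * V <= 0,
    # evaluated along a forward-Euler rollout of the closed loop.
    x = x0
    total = 0.0
    for _ in range(steps):
        f = dynamics(x, policy(x, gain, target))
        violation = lyapunov_rate(x, f, target) + kappa * lyapunov(x, target)
        total += max(0.0, violation)
        x = x + dt * f  # forward Euler step
    return total / steps

if __name__ == "__main__":
    # A gain of 2.0 satisfies the condition everywhere for this toy system,
    # while a gain of 0.5 violates it.
    print(lyapunov_loss(2.0, x0=1.0))
    print(lyapunov_loss(0.5, x0=1.0))
```

In this toy setting, dV/dt + kappa * V equals (2(1 - gain) + kappa) * x^2 when the target is 0, so the loss vanishes whenever gain >= 1 + kappa / 2. In a learned setting the gain would be replaced by network parameters, and the loss would be minimized by gradient-based training.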
