SYSep 16, 2024
Safe and Stable Closed-Loop Learning for Neural-Network-Supported Model Predictive ControlSebastian Hirt, Maik Pfefferkorn, Rolf Findeisen
Safe learning of control policies remains challenging, both in optimal control and reinforcement learning. In this article, we consider safe learning of parametrized predictive controllers that operate with incomplete information about the underlying process. To this end, we employ Bayesian optimization for learning the best parameters from closed-loop data. Our method focuses on the system's overall long-term performance in closed-loop while keeping it safe and stable. Specifically, we parametrize the stage cost function of an MPC using a feedforward neural network. This allows for a high degree of flexibility, enabling the system to achieve a better closed-loop performance with respect to a superordinate measure. However, this flexibility also necessitates safety measures, especially with respect to closed-loop stability. To this end, we explicitly incorporated stability information in the Bayesian-optimization-based learning procedure, thereby achieving rigorous probabilistic safety guarantees. The proposed approach is illustrated using a numeric example.
23.1ROMay 14
SOCC-ICP: Semantics-Assisted Odometry based on Occupancy Grids and ICPJohannes Scherer, Sebastian Hirt, Henri Meeß
Reliable pose estimation in previously unseen environments is a fundamental capability of autonomous systems. Existing LiDAR odometry methods typically employ point-, surfel-, or NDT-based map representations, which are distinct from the semantic occupancy grids commonly used for downstream tasks such as motion planning. We introduce SOCC-ICP, a semantics-assisted odometry framework that jointly performs Semantic OCCupancy grid mapping and LiDAR scan alignment. Each map voxel encodes geometric and semantic statistics, enabling adaptive point-to-point or point-to-plane ICP based on local planarity. Further, the occupancy grid naturally filters dynamic objects through raycasting-based free-space updates. Across diverse evaluation scenarios, SOCC-ICP achieves performance competitive with state-of-the-art LiDAR odometry and remains robust in geometrically degenerate environments, even in the absence of semantic cues. When semantic labels are available, integrating them into map construction, downsampling, and correspondence weighting yields further accuracy gains. By unifying odometry and semantic occupancy grid mapping within a single representation, SOCC-ICP eliminates redundant map structures and directly provides a map suitable for downstream robotic applications.
21.8LGMar 25
Efficient Controller Learning from Human Preferences and Numerical Data Via Multi-Modal Surrogate ModelsLukas Theiner, Maik Pfefferkorn, Yongpeng Zhao et al.
Tuning control policies manually to meet high-level objectives is often time-consuming. Bayesian optimization provides a data-efficient framework for automating this process using numerical evaluations of an objective function. However, many systems, particularly those involving humans, require optimization based on subjective criteria. Preferential Bayesian optimization addresses this by learning from pairwise comparisons instead of quantitative measurements, but relying solely on preference data can be inefficient. We propose a multi-fidelity, multi-modal Bayesian optimization framework that integrates low-fidelity numerical data with high-fidelity human preferences. Our approach employs Gaussian process surrogate models with both hierarchical, autoregressive and non-hierarchical, coregionalization-based structures, enabling efficient learning from mixed-modality data. We illustrate the framework by tuning an autonomous vehicle's trajectory planner, showing that combining numerical and preference data significantly reduces the need for experiments involving the human decision maker while effectively adapting driving style to individual preferences.
LGDec 12, 2025
High-Dimensional Surrogate Modeling for Closed-Loop Learning of Neural-Network-Parameterized Model Predictive ControlSebastian Hirt, Valentinus Suwanto, Hendrik Alsmeier et al.
Learning controller parameters from closed-loop data has been shown to improve closed-loop performance. Bayesian optimization, a widely used black-box and sample-efficient learning method, constructs a probabilistic surrogate of the closed-loop performance from few experiments and uses it to select informative controller parameters. However, it typically struggles with dense high-dimensional controller parameterizations, as they may appear, for example, in tuning model predictive controllers, because standard surrogate models fail to capture the structure of such spaces. This work suggests that the use of Bayesian neural networks as surrogate models may help to mitigate this limitation. Through a comparison between Gaussian processes with Matern kernels, finite-width Bayesian neural networks, and infinite-width Bayesian neural networks on a cart-pole task, we find that Bayesian neural network surrogate models achieve faster and more reliable convergence of the closed-loop cost and enable successful optimization of parameterizations with hundreds of dimensions. Infinite-width Bayesian neural networks also maintain performance in settings with more than one thousand parameters, whereas Matern-kernel Gaussian processes rapidly lose effectiveness. These results indicate that Bayesian neural network surrogate models may be suitable for learning dense high-dimensional controller parameterizations and offer practical guidance for selecting surrogate models in learning-based controller design.
SYApr 18, 2024
Stability-informed Bayesian Optimization for MPC Cost Function LearningSebastian Hirt, Maik Pfefferkorn, Ali Mesbah et al.
Designing predictive controllers towards optimal closed-loop performance while maintaining safety and stability is challenging. This work explores closed-loop learning for predictive control parameters under imperfect information while considering closed-loop stability. We employ constrained Bayesian optimization to learn a model predictive controller's (MPC) cost function parametrized as a feedforward neural network, optimizing closed-loop behavior as well as minimizing model-plant mismatch. Doing so offers a high degree of freedom and, thus, the opportunity for efficient and global optimization towards the desired and optimal closed-loop behavior. We extend this framework by stability constraints on the learned controller parameters, exploiting the optimal value function of the underlying MPC as a Lyapunov candidate. The effectiveness of the proposed approach is underlined in simulations, highlighting its performance and safety capabilities.
SYApr 9, 2024
Learning Model Predictive Control Parameters via Bayesian Optimization for Battery Fast ChargingSebastian Hirt, Andreas Höhl, Joachim Schaeffer et al.
Tuning parameters in model predictive control (MPC) presents significant challenges, particularly when there is a notable discrepancy between the controller's predictions and the actual behavior of the closed-loop plant. This mismatch may stem from factors like substantial model-plant differences, limited prediction horizons that do not cover the entire time of interest, or unforeseen system disturbances. Such mismatches can jeopardize both performance and safety, including constraint satisfaction. Traditional methods address this issue by modifying the finite horizon cost function to better reflect the overall operational cost, learning parts of the prediction model from data, or implementing robust MPC strategies, which might be either computationally intensive or overly cautious. As an alternative, directly optimizing or learning the controller parameters to enhance closed-loop performance has been proposed. We apply Bayesian optimization for efficient learning of unknown model parameters and parameterized constraint backoff terms, aiming to improve closed-loop performance of battery fast charging. This approach establishes a hierarchical control framework where Bayesian optimization directly fine-tunes closed-loop behavior towards a global and long-term objective, while MPC handles lower-level, short-term control tasks. For lithium-ion battery fast charging, we show that the learning approach not only ensures safe operation but also maximizes closed-loop performance. This includes maintaining the battery's operation below its maximum terminal voltage and reducing charging times, all achieved using a standard nominal MPC model with a short horizon and notable initial model-plant mismatch.
SYDec 3, 2024
Time-Series-Informed Closed-loop Learning for Sequential Decision Making and ControlSebastian Hirt, Lukas Theiner, Rolf Findeisen
Closed-loop performance of sequential decision making algorithms, such as model predictive control, depends strongly on the choice of controller parameters. Bayesian optimization allows learning of parameters from closed-loop experiments, but standard Bayesian optimization treats this as a black-box problem and ignores the temporal structure of closed-loop trajectories, leading to slow convergence and inefficient use of experimental resources. We propose a time-series-informed multi-fidelity Bayesian optimization framework that aligns the fidelity dimension with closed-loop time, enabling intermediate performance evaluations within a closed-loop experiment to be incorporated as lower-fidelity observations. Additionally, we derive probabilistic early stopping criteria to terminate unpromising closed-loop experiments based on the surrogate model's posterior belief, avoiding full episodes for poor parameterizations and thereby reducing resource usage. Simulation results on a nonlinear control benchmark demonstrate that, compared to standard black-box Bayesian optimization approaches, the proposed method achieves comparable closed-loop performance with roughly half the experimental resources, and yields better final performance when using the same resource budget, highlighting the value of exploiting temporal structure for sample-efficient closed-loop controller tuning.
SYAug 18, 2025
A Hierarchical Surrogate Model for Efficient Multi-Task Parameter Learning in Closed-Loop ControlSebastian Hirt, Lukas Theiner, Maik Pfefferkorn et al.
Many control problems require repeated tuning and adaptation of controllers across distinct closed-loop tasks, where data efficiency and adaptability are critical. We propose a hierarchical Bayesian optimization (BO) framework that is tailored to efficient controller parameter learning in sequential decision-making and control scenarios for distinct tasks. Instead of treating the closed-loop cost as a black-box, our method exploits structural knowledge of the underlying problem, consisting of a dynamical system, a control law, and an associated closed-loop cost function. We construct a hierarchical surrogate model using Gaussian processes that capture the closed-loop state evolution under different parameterizations, while the task-specific weighting and accumulation into the closed-loop cost are computed exactly via known closed-form expressions. This allows knowledge transfer and enhanced data efficiency between different closed-loop tasks. The proposed framework retains sublinear regret guarantees on par with standard black-box BO, while enabling multi-task or transfer learning. Simulation experiments with model predictive control demonstrate substantial benefits in both sample efficiency and adaptability when compared to purely black-box BO approaches.
SYMar 19, 2025
Exploiting Prior Knowledge in Preferential Learning of Individualized Autonomous Vehicle Driving StylesLukas Theiner, Sebastian Hirt, Alexander Steinke et al.
Trajectory planning for automated vehicles commonly employs optimization over a moving horizon - Model Predictive Control - where the cost function critically influences the resulting driving style. However, finding a suitable cost function that results in a driving style preferred by passengers remains an ongoing challenge. We employ preferential Bayesian optimization to learn the cost function by iteratively querying a passenger's preference. Due to increasing dimensionality of the parameter space, preference learning approaches might struggle to find a suitable optimum with a limited number of experiments and expose the passenger to discomfort when exploring the parameter space. We address these challenges by incorporating prior knowledge into the preferential Bayesian optimization framework. Our method constructs a virtual decision maker from real-world human driving data to guide parameter sampling. In a simulation experiment, we achieve faster convergence of the prior-knowledge-informed learning procedure compared to existing preferential Bayesian optimization approaches and reduce the number of inadequate driving styles sampled.