Melanie N. Zeilinger

h-index 37

49 papers

6,384 citations

Novelty 51%

AI Score 58

Ranked #4,335 of 194,257 authors (top 2%) · #5 in SY (top 1%)

49 Papers

2.5 · SY · Jul 12

Stochastic MPC with Online-optimized Policies and Closed-loop Guarantees

Marcell Bartos, Alexandre Didier, Jerome Sieber et al.

This paper proposes a stochastic model predictive control method for linear systems affected by additive Gaussian disturbances that optimizes over disturbance feedback matrices online. Closed-loop satisfaction of probabilistic constraints and recursive feasibility of the underlying convex optimization problem are guaranteed. Optimization over feedback policies online increases performance and reduces conservatism compared to fixed-feedback approaches. The central mechanism is a finitely determined maximal admissible set for probabilistic constraints, together with the reconditioning of the predicted probabilistic constraints on the current knowledge at every time step. The proposed method's applicability is demonstrated on a building temperature control example.

1.2 · SY · Jun 21, 2016

Plug-and-Play Model Predictive Control for Load Shaping and Voltage Control in Smart Grids

Caroline Le Floch, Somil Bansal, Claire J. Tomlin et al.

This paper presents a predictive controller for handling plug-and-play (P&P) charging requests of flexible loads in a distribution system. We define two types of flexible loads: (i) deferrable loads that have a fixed power profile but can be deferred in time and (ii) shapeable loads that have flexible power profiles but fixed energy requests, such as Plug-in Electric Vehicles (PEVs). The proposed method uses a hierarchical control scheme based on a model predictive control (MPC) formulation for minimizing the global system cost. The first stage computes a reachable reference that trades off deviation from the nominal voltage with the required generation control. The second stage uses a price-based objective to aggregate flexible loads and provide load shaping services, while satisfying system constraints and users' preferences at all times. It is shown that the proposed controller is recursively feasible under specific conditions, i.e., the flexible load demands are satisfied and bus voltages remain within the desired limits. Finally, the proposed scheme is illustrated on a 55-bus radial distribution network.

1.2 · SY · Jan 21, 2019

Recursively Feasible Stochastic Model Predictive Control using Indirect Feedback

Lukas Hewing, Kim P. Wabersich, Melanie N. Zeilinger

We present a stochastic model predictive control (MPC) method for linear discrete-time systems subject to possibly unbounded and correlated additive stochastic disturbance sequences. Chance constraints are treated in analogy to robust MPC using the concept of probabilistic reachable sets for constraint tightening. We introduce an initialization of each MPC iteration that is always recursively feasible, so that chance constraint satisfaction for the closed-loop system can readily be shown. Under an i.i.d. zero mean assumption on the additive disturbance, we furthermore provide an average asymptotic performance bound. Two examples illustrate the approach, highlighting feedback properties of the novel initialization scheme, as well as the inclusion of time-varying, correlated disturbances in a building control setting.

16.1 · SY · Jun 24, 2023

Physics-Informed Machine Learning for Modeling and Control of Dynamical Systems

Truong X. Nghiem, Ján Drgoňa, Colin Jones et al.

Physics-informed machine learning (PIML) is a set of methods and tools that systematically integrate machine learning (ML) algorithms with physical constraints and abstract mathematical models developed in scientific and engineering domains. As opposed to purely data-driven methods, PIML models can be trained from additional information obtained by enforcing physical laws such as energy and mass conservation. More broadly, PIML models can include abstract properties and conditions such as stability, convexity, or invariance. The basic premise of PIML is that the integration of ML and physics can yield more effective, physically consistent, and data-efficient models. This paper aims to provide a tutorial-like overview of the recent advances in PIML for dynamical system modeling and control. Specifically, the paper covers an overview of the theory, fundamental concepts and methods, tools, and applications on topics of: 1) physics-informed learning for system identification; 2) physics-informed learning for control; 3) analysis and verification of PIML models; and 4) physics-informed digital twins. The paper is concluded with a perspective on open challenges and future research opportunities.

18.8 · LG · Jul 25, 2023 · Code

Submodular Reinforcement Learning

Manish Prajapat, Mojmír Mutný, Melanie N. Zeilinger et al.

In reinforcement learning (RL), rewards of states are typically considered additive, and following the Markov assumption, they are independent of states visited previously. In many important applications, such as coverage control, experiment design and informative path planning, rewards naturally have diminishing returns, i.e., their value decreases in light of similar states visited previously. To tackle this, we propose submodular RL (SubRL), a paradigm which seeks to optimize more general, non-additive (and history-dependent) rewards modelled via submodular set functions which capture diminishing returns. Unfortunately, in general, even in tabular settings, we show that the resulting optimization problem is hard to approximate. On the other hand, motivated by the success of greedy algorithms in classical submodular optimization, we propose SubPO, a simple policy gradient-based algorithm for SubRL that handles non-additive rewards by greedily maximizing marginal gains. Indeed, under some assumptions on the underlying Markov Decision Process (MDP), SubPO recovers optimal constant factor approximations of submodular bandits. Moreover, we derive a natural policy gradient approach for locally optimizing SubRL instances even in large state and action spaces. We showcase the versatility of our approach by applying SubPO to several applications, such as biodiversity monitoring, Bayesian experiment design, informative path planning, and coverage maximization. Our results demonstrate sample efficiency, as well as scalability to high-dimensional state-action spaces.
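
The notion of greedily maximizing marginal gains of a submodular set function can be illustrated with a minimal sketch. This is the classical greedy set-selection rule on a toy coverage function, not SubPO itself (which is a policy gradient method over trajectories); the function names and the `covers` data are hypothetical and chosen only for illustration.

```python
def coverage(selected, covers):
    """Submodular coverage value: number of distinct cells covered."""
    covered = set()
    for s in selected:
        covered |= covers[s]
    return len(covered)


def greedy_select(covers, budget):
    """Pick `budget` states, each time taking the largest marginal gain."""
    selected = []
    for _ in range(budget):
        base = coverage(selected, covers)
        best, best_gain = None, -1
        for s in sorted(covers):  # sorted for deterministic tie-breaking
            if s in selected:
                continue
            gain = coverage(selected + [s], covers) - base
            if gain > best_gain:
                best, best_gain = s, gain
        selected.append(best)
    return selected


# Hypothetical toy data: each state covers a set of grid cells.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}, "d": {5}}
picked = greedy_select(covers, budget=2)
```

In this toy instance the marginal gain of state `c` drops from 2 (when nothing has been selected) to 0 once `a` is selected, which is the diminishing-returns property the abstract refers to.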

14.6 · LG · Oct 12, 2022 · Code

Near-Optimal Multi-Agent Learning for Safe Coverage Control

Manish Prajapat, Matteo Turchetta, Melanie N. Zeilinger et al.

In multi-agent coverage control problems, agents navigate their environment to reach locations that maximize the coverage of some density. In practice, the density is rarely known a priori, further complicating the original NP-hard problem. Moreover, in many applications, agents cannot visit arbitrary locations due to a priori unknown safety constraints. In this paper, we aim to efficiently learn the density to approximately solve the coverage problem while preserving the agents' safety. We first propose a conditionally linear submodular coverage function that facilitates theoretical analysis. Utilizing this structure, we develop MacOpt, a novel algorithm that efficiently trades off the exploration-exploitation dilemma due to partial observability, and show that it achieves sublinear regret. Next, we extend results on single-agent safe exploration to our multi-agent setting and propose SafeMac for safe coverage and exploration. We analyze SafeMac and give first-of-its-kind results: near-optimal coverage in finite time while provably guaranteeing safety. We extensively evaluate our algorithms on synthetic and real problems, including a bio-diversity monitoring task under safety constraints, where SafeMac outperforms competing methods.

13.6 · OC · Nov 28, 2022

Zero-Order Optimization for Gaussian Process-based Model Predictive Control

Amon Lahr, Andrea Zanelli, Andrea Carron et al.

By enabling constraint-aware online model adaptation, model predictive control using Gaussian process (GP) regression has exhibited impressive performance in real-world applications and received considerable attention in the learning-based control community. Yet, solving the resulting optimal control problem in real-time generally remains a major challenge, due to i) the increased number of augmented states in the optimization problem, as well as ii) computationally expensive evaluations of the posterior mean and covariance and their respective derivatives. To tackle these challenges, we employ i) a tailored Jacobian approximation in a sequential quadratic programming (SQP) approach, and combine it with ii) a parallelizable GP inference and automatic differentiation framework. Reducing the numerical complexity with respect to the state dimension $n_x$ for each SQP iteration from $\mathcal{O}(n_x^6)$ to $\mathcal{O}(n_x^3)$, and accelerating GP evaluations on a graphical processing unit, the proposed algorithm computes suboptimal, yet feasible solutions at drastically reduced computation times and exhibits favorable local convergence properties. Numerical experiments verify the scaling properties and investigate the runtime distribution across different parts of the algorithm.

6.6 · SY · Apr 19, 2023 · Code

Approximate non-linear model predictive control with safety-augmented neural networks

Henrik Hose, Johannes Köhler, Melanie N. Zeilinger et al.

Model predictive control (MPC) achieves stability and constraint satisfaction for general nonlinear systems, but requires computationally expensive online optimization. This paper studies approximations of such MPC controllers via neural networks (NNs) to achieve fast online evaluation. We propose safety augmentation that yields deterministic guarantees for convergence and constraint satisfaction despite approximation inaccuracies. We approximate the entire input sequence of the MPC with NNs, which allows us to verify online if it is a feasible solution to the MPC problem. We replace the NN solution by a safe candidate based on standard MPC techniques whenever it is infeasible or has worse cost. Our method requires a single evaluation of the NN and forward integration of the input sequence online, which is fast to compute on resource-constrained systems. The proposed control framework is illustrated using two numerical non-linear MPC benchmarks of different complexity, demonstrating computational speedups that are orders of magnitude higher than online optimization. In the examples, we achieve deterministic safety through the safety-augmented NNs, where a naive NN implementation fails.
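
The selection rule described above (keep the NN input sequence only if it is feasible and not worse in cost than a safe candidate) can be sketched as follows. This is a simplified illustration under stated assumptions: a scalar linear model, box constraints, and a quadratic cost are all hypothetical stand-ins, terminal constraints are omitted, and the candidate sequence is simply passed in rather than constructed with standard MPC techniques as in the paper.

```python
def rollout(x0, inputs, a=1.0, b=0.5):
    """Forward-simulate the assumed scalar model x+ = a*x + b*u."""
    xs = [x0]
    for u in inputs:
        xs.append(a * xs[-1] + b * u)
    return xs


def is_feasible(xs, inputs, x_max=1.0, u_max=1.0):
    """Check the assumed box constraints on states and inputs."""
    return all(abs(x) <= x_max for x in xs) and all(abs(u) <= u_max for u in inputs)


def cost(xs, inputs):
    """Assumed quadratic cost over the predicted trajectory."""
    return sum(x * x for x in xs) + sum(u * u for u in inputs)


def safe_select(x0, nn_inputs, candidate_inputs):
    """Use the NN sequence only if it is feasible and no worse than the candidate."""
    xs_nn = rollout(x0, nn_inputs)
    xs_c = rollout(x0, candidate_inputs)
    nn_ok = is_feasible(xs_nn, nn_inputs)
    if nn_ok and cost(xs_nn, nn_inputs) <= cost(xs_c, candidate_inputs):
        return nn_inputs
    return candidate_inputs


candidate = [0.0, 0.0, 0.0]    # assumed feasible fallback sequence
good_nn = [-0.8, -0.4, -0.2]   # feasible and cheaper than the candidate
bad_nn = [1.0, 0.0, 0.0]       # drives the state outside |x| <= 1
chosen_good = safe_select(0.8, good_nn, candidate)
chosen_bad = safe_select(0.8, bad_nn, candidate)
```

The online work is one forward simulation per sequence plus two comparisons, which mirrors the abstract's point that the check is cheap compared with solving the optimization problem.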

6.6 · SY · Nov 27, 2025 · Code

L4acados: Learning-based models for acados, applied to Gaussian process-based predictive control

Amon Lahr, Joshua Näf, Kim P. Wabersich et al.

Incorporating learning-based models, such as artificial neural networks or Gaussian processes, into model predictive control (MPC) strategies can significantly improve control performance and online adaptation capabilities for real-world applications. Still, enabling state-of-the-art implementations of learning-based models for MPC is complicated by the challenge of interfacing machine learning frameworks with real-time optimal control software. This work aims at filling this gap by incorporating external sensitivities in sequential quadratic programming solvers for nonlinear optimal control. To this end, we provide L4acados, a general framework for incorporating Python-based dynamics models in the real-time optimal control software acados. By computing external sensitivities via a user-defined Python module, L4acados enables the implementation of MPC controllers with learning-based residual models in acados, while supporting parallelization of sensitivity computations when preparing the quadratic subproblems. We demonstrate significant speed-ups and superior scaling properties of L4acados compared to available software using a neural-network-based control example. Last, we provide an efficient and modular real-time implementation of Gaussian process-based MPC using L4acados, which is applied to two hardware examples: autonomous miniature racing, as well as motion control of a full-scale autonomous vehicle for an ISO lane change maneuver.

13.3 · OC · Sep 13, 2024 · Code

Towards safe and tractable Gaussian process-based MPC: Efficient sampling within a sequential quadratic programming framework

Manish Prajapat, Amon Lahr, Johannes Köhler et al.

Learning uncertain dynamics models using Gaussian process (GP) regression has been demonstrated to enable high-performance and safety-aware control strategies for challenging real-world applications. Yet, for computational tractability, most approaches for Gaussian process-based model predictive control (GP-MPC) are based on approximations of the reachable set that are either overly conservative or impede the controller's safety guarantees. To address these challenges, we propose a robust GP-MPC formulation that guarantees constraint satisfaction with high probability. For its tractable implementation, we propose a sampling-based GP-MPC approach that iteratively generates consistent dynamics samples from the GP within a sequential quadratic programming framework. We highlight the improved reachable set approximation compared to existing methods, as well as real-time feasible computation times, using two numerical examples.

7.0 · RO · Mar 26

An MPC framework for efficient navigation of mobile robots in cluttered environments

Johannes Köhler, Daniel Zhang, Raffaele Soloperto et al.

We present a model predictive control (MPC) framework for efficient navigation of mobile robots in cluttered environments. The proposed approach integrates a finite-segment shortest path planner into the finite-horizon trajectory optimization of the MPC. This formulation ensures convergence to dynamically selected targets and guarantees collision avoidance, even under general nonlinear dynamics and cluttered environments. The approach is validated through hardware experiments on a small ground robot, where a human operator dynamically assigns target locations that a robot should reach while avoiding obstacles. The robot reached new targets within 2-3 seconds and responded to new commands within 50 ms to 100 ms, immediately adjusting its motion even while still moving at high speeds toward a previous target.

10.2 · SY · Mar 30

Optimistic Online LQR via Intrinsic Rewards

Marcell Bartos, Bruce D. Lee, Lenart Treven et al.

Optimism in the face of uncertainty is a popular approach to balance exploration and exploitation in reinforcement learning. Here, we consider the online linear quadratic regulator (LQR) problem, i.e., to learn the LQR corresponding to an unknown linear dynamical system by adapting the control policy online based on closed-loop data collected during operation. In this work, we propose Intrinsic Rewards LQR (IR-LQR), an optimistic online LQR algorithm that applies the idea of intrinsic rewards originating from reinforcement learning and the concept of variance regularization to promote uncertainty-driven exploration. IR-LQR retains the structure of a standard LQR synthesis problem by only modifying the cost function, resulting in an intuitively pleasing, simple, computationally cheap, and efficient algorithm. This is in contrast to existing optimistic online LQR formulations that rely on more complicated iterative search algorithms or solve computationally demanding optimization problems. We show that IR-LQR achieves the optimal worst-case regret rate of $\sqrt{T}$, and compare it to various state-of-the-art online LQR algorithms via numerical experiments carried out on an aircraft pitch angle control and an unmanned aerial vehicle example.

8.7 · SY · Apr 15

Stability of Certainty-Equivalent Adaptive LQR for Linear Systems with Unknown Time-Varying Parameters

Marcell Bartos, Johannes Köhler, Florian Dörfler et al.

Standard model-based control design deteriorates when the system dynamics change during operation. To overcome this challenge, online and adaptive methods have been proposed in the literature. In this work, we consider the class of discrete-time linear systems with unknown time-varying parameters. We propose a simple, modular, and computationally tractable approach by combining two classical and well-known building blocks from estimation and control: the least mean square filter and the certainty-equivalent linear quadratic regulator. Despite both building blocks being simple and off-the-shelf, our analysis shows that they can be seamlessly combined to a powerful pipeline with stability guarantees. Namely, finite-gain $\ell^2$-stability of the closed-loop interconnection of the unknown system, the parameter estimator, and the controller is proven, despite the presence of unknown disturbances and time-varying parametric uncertainties. Real-world applicability of the proposed algorithm is showcased by simulations carried out on a nonlinear planar quadrotor.
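
The two building blocks named above, a least mean square (LMS) filter and a certainty-equivalent LQR, can be combined as in the following minimal sketch. It assumes a scalar system x+ = a·x + b·u with constant true parameters and no disturbance, which is far simpler than the time-varying setting analyzed in the paper; the step size `mu`, the weights `q` and `r`, and all function names are illustrative assumptions.

```python
def lms_update(theta, phi, x_next, mu=0.5):
    """One LMS step on the estimate theta = [a_hat, b_hat] with regressor phi = [x, u]."""
    err = x_next - sum(t * p for t, p in zip(theta, phi))
    return [t + mu * p * err for t, p in zip(theta, phi)]


def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """Scalar discrete-time LQR gain via fixed-point iteration of the Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def closed_loop(a_true, b_true, theta0, x0, steps, mu=0.5):
    """Alternate certainty-equivalent control and LMS estimation."""
    theta, x, xs = list(theta0), x0, [x0]
    for _ in range(steps):
        # Treat the current estimate as if it were the true model.
        u = -lqr_gain(theta[0], theta[1]) * x
        x_next = a_true * x + b_true * u
        theta = lms_update(theta, [x, u], x_next, mu)
        x = x_next
        xs.append(x)
    return xs, theta


xs, theta = closed_loop(a_true=1.2, b_true=1.0, theta0=[1.0, 1.0], x0=1.0, steps=5)
```

The controller never sees `a_true` or `b_true`; it only uses the running estimate, which is what "certainty-equivalent" means here.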

6.5 · LG · Mar 17

Optimal uncertainty bounds for multivariate kernel regression under bounded noise: A Gaussian process-based dual function

Amon Lahr, Anna Scampicchio, Johannes Köhler et al.

Non-conservative uncertainty bounds are essential for making reliable predictions about latent functions from noisy data, and are thus a key enabler for safe learning-based control. In this domain, kernel methods such as Gaussian process regression are established techniques, thanks to their inherent uncertainty quantification mechanism. Still, existing bounds either pose strong assumptions on the underlying noise distribution, are conservative, do not scale well in the multi-output case, or are difficult to integrate into downstream tasks. This paper addresses these limitations by presenting a tight, distribution-free bound for multi-output kernel-based estimates. It is obtained through an unconstrained, duality-based formulation, which shares the same structure as classic Gaussian process confidence bounds and can thus be straightforwardly integrated into downstream optimization pipelines. We show that the proposed bound generalizes many existing results and illustrate its application using an example inspired by quadrotor dynamics learning.

8.5 · SY · Apr 9

Unifying Sequential Quadratic Programming and Linear-Parameter-Varying Algorithms for Real-Time Model Predictive Control

Kristóf Floch, Amon Lahr, Roland Tóth et al.

This paper presents a unified framework that connects sequential quadratic programming (SQP) and the iterative linear-parameter-varying model predictive control (LPV-MPC) technique. Using the differential formulation of the LPV-MPC, we demonstrate how SQP and LPV-MPC can be unified through a specific choice of scheduling variable and the 2nd Fundamental Theorem of Calculus (FTC) embedding technique and compare their convergence properties. This enables the unification of the zero-order approach of SQP with the LPV-MPC scheduling technique to enhance the computational efficiency of robust and stochastic MPC problems. To demonstrate our findings, we compare the two schemes in a simulation example. Finally, we present real-time feasibility and performance of the zero-order LPV-MPC approach by applying it to Gaussian process (GP)-based MPC for autonomous racing with real-world experiments.

6.9 · SY · Apr 1

Bridging RL and MPC for mixed-integer optimal control with application to Formula 1 race strategies

Joschua Wüthrich, Romir Damle, Giona Fieni et al.

We propose a hybrid reinforcement learning (RL) and model predictive control (MPC) framework for mixed-integer optimal control, where discrete variables enter the cost and dynamics but not the constraints. Existing hierarchical approaches use RL only for the discrete action space, leaving continuous optimization to MPC. Unlike these methods, we train the RL agent on the full hybrid action space, ensuring consistency with the cost of the underlying Markov decision process. During deployment, the RL actor is rolled out over the prediction horizon to parametrize an integer-free nonlinear MPC through the discrete action sequence and provide a continuous warm-start. The learned critic serves as a terminal cost to capture long-term performance. We prove recursive feasibility, and validate the framework on a Formula 1 race strategy problem. The hybrid method achieves near-optimal performance relative to an offline mixed-integer nonlinear program benchmark, outperforming a standalone RL agent. Moreover, the hybrid scheme enables adaptation to unseen disturbances through modular MPC extensions at zero retraining cost.

7.7 · SY · Mar 31

Distributed Predictive Control Barrier Functions: Towards Scalable Safety Certification in Modular Multi-Agent Systems

Jonas Ohnemus, Alexandre Didier, Ahmed Aboudonia et al.

We consider safety-critical multi-agent systems with distributed control architectures and potentially varying network topologies. While learning-based distributed control enables scalability and high performance, a lack of formal safety guarantees in the face of unforeseen disturbances and unsafe network topology changes may lead to system failure. To address this challenge, we introduce structured control barrier functions (s-CBFs) as a multi-agent safety framework. The s-CBFs are augmented to a distributed predictive control barrier function (D-PCBF), a predictive, optimization-based safety layer that uses model predictions to guarantee recoverable safety at all times. The proposed approach enables a permissive yet formal plug-and-play protocol, allowing agents to join or leave the network while ensuring safety recovery if a change in network topology requires temporarily unsafe behavior. We validate the formulation through simulations and real-time experiments of a miniature race-car platoon.

8.3 · SY · Apr 10

Efficient Uniform Feasible Set Sampling for Approximate Linear MPC

Elias Milios, Felix Berkel, Felix Gruber et al.

Model Predictive Control (MPC) offers safe and near-optimal control but suffers from high computational costs. Approximate MPC (AMPC) mitigates this by learning a cheaper surrogate policy, typically by training a neural network on state-MPC input pairs. Generating training data is a major bottleneck, requiring solving the MPC for numerous states sampled from its feasible set. Since this feasible set is implicitly defined and unknown, efficient sampling is nontrivial but crucial. We propose the linear MPC Hit-and-Run (LMPC-HR) sampler for linear MPC with polyhedral constraints. We identify the feasible set boundaries along search directions, a crucial step within HR, by formulating the problem as a convex linear program, replacing expensive iterative searches with a single optimization step. A numerical study demonstrates that LMPC-HR achieves an order of magnitude reduction in computation time for generating uniformly distributed samples from the feasible set compared to naive baselines.
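
For orientation, a generic Hit-and-Run step can be sketched on an explicitly given bounded polytope {x : Ax ≤ b}, where the feasible interval along a search direction has a closed form. This is not LMPC-HR: in the paper the MPC feasible set is only implicitly defined, and the boundary along each direction is found by solving a linear program instead. The function names and the unit-box example are assumptions for illustration.

```python
import math
import random


def chord(A, b, x, d):
    """Return (t_min, t_max) such that A(x + t*d) <= b for t in that interval."""
    t_min, t_max = -math.inf, math.inf
    for row, bi in zip(A, b):
        ad = sum(r * di for r, di in zip(row, d))
        slack = bi - sum(r * xi for r, xi in zip(row, x))
        if ad > 1e-12:
            t_max = min(t_max, slack / ad)
        elif ad < -1e-12:
            t_min = max(t_min, slack / ad)
    return t_min, t_max


def hit_and_run(A, b, x0, n_samples, seed=0):
    """Hit-and-Run chain; assumes a bounded polytope and x0 in its interior."""
    rng = random.Random(seed)
    x, samples = list(x0), []
    for _ in range(n_samples):
        d = [rng.gauss(0.0, 1.0) for _ in x]   # random search direction
        t_min, t_max = chord(A, b, x, d)        # boundaries along the direction
        t = rng.uniform(t_min, t_max)           # uniform point on the chord
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples


# Hypothetical example: the unit box |x1| <= 1, |x2| <= 1.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 1.0, 1.0, 1.0]
samples = hit_and_run(A, b, x0=[0.0, 0.0], n_samples=200)
```

The `chord` step is the part that becomes expensive when the set is only implicitly defined, which is the step the abstract describes replacing with a single optimization problem.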

8.0 · SY · Mar 18

Real-Time Online Learning for Model Predictive Control using a Spatio-Temporal Gaussian Process Approximation

Lars Bartels, Amon Lahr, Andrea Carron et al.

Learning-based model predictive control (MPC) can enhance control performance by correcting for model inaccuracies, enabling more precise state trajectory predictions than traditional MPC. A common approach is to model unknown residual dynamics as a Gaussian process (GP), which leverages data and also provides an estimate of the associated uncertainty. However, the high computational cost of online learning poses a major challenge for real-time GP-MPC applications. This work presents an efficient implementation of an approximate spatio-temporal GP model, offering online learning at constant computational complexity. It is optimized for GP-MPC, where it enables improved control performance by learning more accurate system dynamics online in real-time, even for time-varying systems. The performance of the proposed method is demonstrated by simulations and hardware experiments in the exemplary application of autonomous miniature racing.

23.5 · LG · May 24, 2024 · Code

Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks

Jerome Sieber, Carmen Amo Alonso, Alexandre Didier et al.

Softmax attention is the principal backbone of foundation models for various artificial intelligence applications, yet its quadratic complexity in sequence length can limit its inference throughput in long-context settings. To address this challenge, alternative architectures such as linear attention, State Space Models (SSMs), and Recurrent Neural Networks (RNNs) have been considered as more efficient alternatives. While connections between these approaches exist, such models are commonly developed in isolation and there is a lack of theoretical understanding of the shared principles underpinning these architectures and their subtle differences, greatly influencing performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. Our framework facilitates rigorous comparisons, providing new insights on the distinctive characteristics of each model class. For instance, we compare linear attention and selective SSMs, detailing their differences and conditions under which both are equivalent. We also provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated. Additionally, we substantiate these new insights with empirical validations and mathematical arguments. This shows the DSF's potential to guide the systematic development of future more efficient and scalable foundation models.
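
One well-known link between attention and recurrent models is that causal linear attention (without normalization) can be computed by a linear recurrence over a matrix-valued state. The sketch below illustrates only that standard identity; it is not the paper's DSF formulation, and the function names and toy inputs are assumptions.

```python
def linear_attention_direct(Q, K, V):
    """Causal, unnormalized linear attention: y_t = sum_{s<=t} (q_t . k_s) v_s."""
    out = []
    for t, q in enumerate(Q):
        y = [0.0] * len(V[0])
        for s in range(t + 1):
            w = sum(qi * ki for qi, ki in zip(q, K[s]))
            y = [yj + w * vj for yj, vj in zip(y, V[s])]
        out.append(y)
    return out


def linear_attention_recurrent(Q, K, V):
    """Same map as a recurrence: S_t = S_{t-1} + k_t v_t^T, y_t = S_t^T q_t."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # matrix-valued hidden state
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] += k[i] * v[j]
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


# Hypothetical toy sequence of length 3 with key and value dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
direct = linear_attention_direct(Q, K, V)
recurrent = linear_attention_recurrent(Q, K, V)
```

The direct form costs time quadratic in the sequence length, while the recurrent form processes each token with a fixed-size state, which is the efficiency argument the abstract alludes to.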

14.2 · SY · Mar 25, 2024 · Code

State Space Models as Foundation Models: A Control Theoretic Overview

Carmen Amo Alonso, Jerome Sieber, Melanie N. Zeilinger

In recent years, there has been a growing interest in integrating linear state-space models (SSMs) in deep neural network architectures of foundation models. This is exemplified by the recent success of Mamba, showing better performance than the state-of-the-art Transformer architectures in language tasks. Foundation models, such as GPT-4, aim to encode sequential data into a latent space in order to learn a compressed representation of the data. The same goal has been pursued by control theorists using SSMs to efficiently model dynamical systems. Therefore, SSMs can be naturally connected to deep sequence modeling, offering the opportunity to create synergies between the corresponding research areas. This paper is intended as a gentle introduction to SSM-based architectures for control theorists and summarizes the latest research developments. It provides a systematic review of the most successful SSM proposals and highlights their main features from a control theoretic perspective. Additionally, we present a comparative analysis of these models, evaluating their performance on a standardized benchmark designed for assessing a model's efficiency at learning long sequences.

9.2 · SY · Dec 15, 2023 · Code

Automatic nonlinear MPC approximation with closed-loop guarantees

Abdullah Tokmak, Christian Fiedler, Melanie N. Zeilinger et al.

Safety guarantees are vital in many control applications, such as robotics. Model predictive control (MPC) provides a constructive framework for controlling safety-critical systems, but is limited by its computational complexity. We address this problem by presenting a novel algorithm that automatically computes an explicit approximation to nonlinear MPC schemes while retaining closed-loop guarantees. Specifically, the problem can be reduced to a function approximation problem, which we then tackle by proposing ALKIA-X, the Adaptive and Localized Kernel Interpolation Algorithm with eXtrapolated reproducing kernel Hilbert space norm. ALKIA-X is a non-iterative algorithm that ensures numerically well-conditioned computations, a fast-to-evaluate approximating function, and the guaranteed satisfaction of any desired bound on the approximation error. Hence, ALKIA-X automatically computes an explicit function that approximates the MPC, yielding a controller suitable for safety-critical systems and high sampling rates. We apply ALKIA-X to approximate two nonlinear MPC schemes, demonstrating reduced computational demand and applicability to realistic problems.

10.8 · SY · Feb 9, 2024 · Code

Safe Guaranteed Exploration for Non-linear Systems

Manish Prajapat, Johannes Köhler, Matteo Turchetta et al.

Safely exploring environments with a-priori unknown constraints is a fundamental challenge that restricts the autonomy of robots. While safety is paramount, guarantees on sufficient exploration are also crucial for ensuring autonomous task completion. To address these challenges, we propose a novel safe guaranteed exploration framework using optimal control, which achieves first-of-its-kind results: guaranteed exploration for non-linear systems with finite time sample complexity bounds, while being provably safe with arbitrarily high probability. The framework is general and applicable to many real-world scenarios with complex non-linear dynamics and unknown domains. We improve the efficiency of this general framework by proposing an algorithm, SageMPC, SAfe Guaranteed Exploration using Model Predictive Control. SageMPC leverages three key techniques: i) exploiting a Lipschitz bound, ii) goal-directed exploration, and iii) receding horizon style re-planning, all while maintaining the desired sample complexity, safety and exploration guarantees of the framework. Lastly, we demonstrate safe efficient exploration in challenging unknown environments using SageMPC with a car model.

7.3 · SY · May 12, 2025

Finite-Sample-Based Reachability for Safe Control with Gaussian Process Dynamics

Manish Prajapat, Johannes Köhler, Amon Lahr et al.

Gaussian Process (GP) regression is shown to be effective for learning unknown dynamics, enabling efficient and safety-aware control strategies across diverse applications. However, existing GP-based model predictive control (GP-MPC) methods either rely on approximations, thus lacking guarantees, or are overly conservative, which limits their practical utility. To close this gap, we present a sampling-based framework that efficiently propagates the model's epistemic uncertainty while avoiding conservatism. We establish a novel sample complexity result that enables the construction of a reachable set using a finite number of dynamics functions sampled from the GP posterior. Building on this, we design a sampling-based GP-MPC scheme that is recursively feasible and guarantees closed-loop safety and stability with high probability. Finally, we showcase the effectiveness of our method on two numerical examples, highlighting accurate reachable set over-approximation and safe closed-loop performance.

14.4 · LG · Mar 10, 2025 · Code

Performance-driven Constrained Optimal Auto-Tuner for MPC

Albert Gassol Puigjaner, Manish Prajapat, Andrea Carron et al.

A key challenge in tuning Model Predictive Control (MPC) cost function parameters is to ensure that the system performance stays consistently above a certain threshold. To address this challenge, we propose a novel method, COAT-MPC, Constrained Optimal Auto-Tuner for MPC. With every tuning iteration, COAT-MPC gathers performance data and learns by updating its posterior belief. It explores the tuning parameters' domain towards optimistic parameters in a goal-directed fashion, which is key to its sample efficiency. We theoretically analyze COAT-MPC, showing that it satisfies performance constraints with arbitrarily high probability at all times and provably converges to the optimum performance within finite time. Through comprehensive simulations and comparative analyses with a hardware platform, we demonstrate the effectiveness of COAT-MPC in comparison to classical Bayesian Optimization (BO) and other state-of-the-art methods. When applied to autonomous racing, our approach outperforms baselines in terms of constraint violations and cumulative regret over time.

9.2LGApr 8, 2024

Stochastic Online Optimization for Cyber-Physical and Robotic Systems

Hao Ma, Melanie Zeilinger, Michael Muehlebach

We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system, which has, in general, a continuous state and action space, is nonlinear, and where the state is only partially observed. We also incorporate an approximate model of the dynamics as prior knowledge into the learning process and show that even rough estimates of the dynamics can significantly improve the convergence of our algorithms. Our online optimization framework encompasses both gradient descent and quasi-Newton methods, and we provide a unified convergence analysis of our algorithms in a non-convex setting. We also characterize the impact of modeling errors in the system dynamics on the convergence rate of the algorithms. Finally, we evaluate our algorithms in simulations of a flexible beam, a four-legged walking robot, and in real-world experiments with a ping-pong playing robot.

9.4LGSep 29, 2025

Physics-informed learning under mixing: How physical knowledge speeds up learning

Anna Scampicchio, Leonardo F. Toso, Rahel Rickenbach et al.

A major challenge in physics-informed machine learning is to understand how the incorporation of prior domain knowledge affects learning rates when data are dependent. Focusing on empirical risk minimization with physics-informed regularization, we derive complexity-dependent bounds on the excess risk in probability and in expectation. We prove that, when the physical prior information is aligned, the learning rate improves from the (slow) Sobolev minimax rate to the (fast) optimal i.i.d. one without any sample-size deflation due to data dependence.

7.9LGOct 14, 2024

Lambda-Skip Connections: the architectural component that prevents Rank Collapse

Federico Arangath Joseph, Jerome Sieber, Melanie N. Zeilinger et al.

Rank collapse, a phenomenon where embedding vectors in sequence models rapidly converge to a uniform token or equilibrium state, has recently gained attention in the deep learning literature. This phenomenon leads to reduced expressivity and potential training instabilities due to vanishing gradients. Empirical evidence suggests that architectural components like skip connections, LayerNorm, and MultiLayer Perceptrons (MLPs) play critical roles in mitigating rank collapse. While this issue is well-documented for transformers, alternative sequence models, such as State Space Models (SSMs), which have recently gained prominence, have not been thoroughly examined for similar vulnerabilities. This paper extends the theory of rank collapse from transformers to SSMs using a unifying framework that captures both architectures. We study how a parametrized version of the classic skip connection component, which we call lambda-skip connections, provides guarantees for rank collapse prevention. Through analytical results, we present a sufficient condition to guarantee prevention of rank collapse across all the aforementioned architectures. We also study the necessity of this condition via ablation studies and analytical examples. To our knowledge, this is the first study that provides a general guarantee to prevent rank collapse, and that investigates rank collapse in the context of SSMs, offering valuable understanding for both theoreticians and practitioners. Finally, we validate our findings with experiments demonstrating the crucial role of architectural components such as skip connections and gating mechanisms in preventing rank collapse.
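As a rough illustration of the mechanism (not the paper's construction), the sketch below assumes a lambda-skip layer has the form y = lam * x + f(x) and uses uniform averaging over tokens as a stand-in mixing operation f. The names `uniform_mixing` and `lambda_skip_layer` are hypothetical.

```python
import numpy as np


def uniform_mixing(x):
    """Stand-in token mixer: every token is replaced by the mean of all tokens."""
    return np.repeat(x.mean(axis=0, keepdims=True), x.shape[0], axis=0)


def lambda_skip_layer(x, lam, mixer=uniform_mixing):
    """Assumed form of a lambda-skip connection: lam * x + mixer(x)."""
    return lam * x + mixer(x)


rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 4))  # 6 tokens, embedding dimension 4

no_skip = lambda_skip_layer(tokens, lam=0.0)    # pure mixing, no skip path
with_skip = lambda_skip_layer(tokens, lam=1.0)  # classic skip connection

rank_no_skip = np.linalg.matrix_rank(no_skip)
rank_with_skip = np.linalg.matrix_rank(with_skip)
print(rank_no_skip, rank_with_skip)
```

With this particular mixer, the output without a skip path has identical rows, while the skip path keeps the token matrix at full column rank. Real attention or SSM layers collapse more gradually over depth, so this is only the limiting case.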

2.7LGFeb 1

SALAAD: Sparse And Low-Rank Adaptation via ADMM

Hao Ma, Melis Ilayda Bal, Liang Zhang et al.

Modern large language models are increasingly deployed under compute and memory constraints, making flexible control of model capacity a central challenge. While sparse and low-rank structures naturally trade off capacity and performance, existing approaches often rely on heuristic designs that ignore layer and matrix heterogeneity or require model-specific architectural modifications. We propose SALAAD, a plug-and-play framework applicable to different model architectures that induces sparse and low-rank structures during training. By formulating structured weight learning under an augmented Lagrangian framework and introducing an adaptive controller that dynamically balances the training loss and structural constraints, SALAAD preserves the stability of standard training dynamics while enabling explicit control over the evolution of effective model capacity during training. Experiments across model scales show that SALAAD substantially reduces memory consumption during deployment while achieving performance comparable to ad-hoc methods. Moreover, a single training run yields a continuous spectrum of model capacities, enabling smooth and elastic deployment across diverse memory budgets without the need for retraining.

4.1LGOct 10, 2025

Design Principles for Sequence Models via Coefficient Dynamics

Jerome Sieber, Antonio Orvieto, Melanie N. Zeilinger et al.

Deep sequence models, ranging from Transformers and State Space Models (SSMs) to more recent approaches such as gated linear RNNs, fundamentally compute outputs as linear combinations of past value vectors. To draw insights and systematically compare such architectures, we develop a unified framework that makes this output operation explicit, by casting the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs. This viewpoint, in spirit substantially different from approaches focusing on connecting linear RNNs with linear attention, reveals a common mathematical theme across diverse architectures and crucially captures softmax attention, on top of RNNs, SSMs, and related models. In contrast to new model proposals that are commonly evaluated on benchmarks, we derive design principles linking architectural choices to model properties, thereby identifying tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention. By connecting several insights and observations from recent literature, the framework both explains empirical successes of recent designs and provides guiding principles for systematically designing new sequence model architectures.

4.1LGOct 10, 2025

Task-Level Insights from Eigenvalues across Sequence Models

Rahel Rickenbach, Jelena Trisovic, Alexandre Didier et al.

Although softmax attention drives state-of-the-art performance for sequence models, its quadratic complexity limits scalability, motivating linear alternatives such as state space models (SSMs). While these alternatives improve efficiency, their fundamental differences in information processing remain poorly understood. In this work, we leverage the recently proposed dynamical systems framework to represent softmax, norm and linear attention as dynamical systems, enabling a structured comparison with SSMs by analyzing their respective eigenvalue spectra. Since eigenvalues capture essential aspects of dynamical system behavior, we conduct an extensive empirical analysis across diverse sequence models and benchmarks. We first show that eigenvalues influence essential aspects of memory and long-range dependency modeling, revealing spectral signatures that align with task requirements. Building on these insights, we then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance. This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.

4.3SYSep 20, 2025

Safe Guaranteed Dynamics Exploration with Probabilistic Models

Manish Prajapat, Johannes Köhler, Melanie N. Zeilinger et al.

Ensuring both optimality and safety is critical for the real-world deployment of agents, but becomes particularly challenging when the system dynamics are unknown. To address this problem, we introduce a notion of maximum safe dynamics learning via sufficient exploration in the space of safe policies. We propose a pessimistically safe framework that optimistically explores informative states and, despite not reaching them due to model uncertainty, ensures continuous online learning of dynamics. The framework achieves first-of-its-kind results: learning the dynamics model sufficiently, up to an arbitrarily small tolerance (subject to noise), in a finite time, while ensuring provably safe operation throughout with high probability and without requiring resets. Building on this, we propose an algorithm to maximize rewards while learning the dynamics only to the extent needed to achieve close-to-optimal performance. Unlike typical reinforcement learning (RL) methods, our approach operates online in a non-episodic setting and ensures safety throughout the learning process. We demonstrate the effectiveness of our approach in challenging domains such as autonomous car racing and drone navigation under aerodynamic effects, scenarios where safety is critical and accurate modeling is difficult.

7.5LGOct 15, 2021

On-Policy Model Errors in Reinforcement Learning

Lukas P. Fröhlich, Maksym Lefarov, Melanie N. Zeilinger et al.

Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model errors and bias can render learning unstable or suboptimal. In this paper, we present a novel method that combines real-world data and a learned model in order to get the best of both worlds. The core idea is to exploit the real-world data for on-policy predictions and use the learned model only to generalize to different actions. Specifically, we use the data as time-dependent on-policy correction terms on top of a learned model, to retain the ability to generate data without accumulating errors over long prediction horizons. We motivate this method theoretically and show that it counteracts an error term for model-based policy improvement. Experiments on MuJoCo- and PyBullet-benchmarks show that our method can drastically improve existing model-based approaches without introducing additional tuning parameters.
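One way to read the correction idea is sketched below, under the assumption that the correction at step t is the gap between the recorded next state and the learned model's prediction from the recorded state and action. The linear systems `true_step` and `learned_step` are hypothetical stand-ins chosen only to make the effect visible.

```python
import numpy as np


def true_step(x, a):
    """Hypothetical ground-truth dynamics, used only to generate data."""
    return 0.9 * x + 0.5 * a


def learned_step(x, a):
    """Hypothetical learned model with a deliberate error in both gains."""
    return 0.8 * x + 0.4 * a


def rollout_with_correction(x0, actions, data_states, data_actions):
    """Roll out the learned model plus a time-dependent correction from data."""
    x = x0
    trajectory = [x]
    for t, a in enumerate(actions):
        # Correction: recorded next state minus model prediction on recorded data.
        correction = data_states[t + 1] - learned_step(data_states[t], data_actions[t])
        x = learned_step(x, a) + correction
        trajectory.append(x)
    return np.array(trajectory)


# Record one on-policy trajectory from the true system.
data_actions = np.array([1.0, -0.5, 0.25, 0.0])
data_states = [2.0]
for a in data_actions:
    data_states.append(true_step(data_states[-1], a))
data_states = np.array(data_states)

# Replaying the recorded actions: corrected model vs. plain learned model.
corrected = rollout_with_correction(data_states[0], data_actions, data_states, data_actions)
plain = [data_states[0]]
for a in data_actions:
    plain.append(learned_step(plain[-1], a))
plain = np.array(plain)
```

In this toy setting the corrected rollout reproduces the recorded trajectory when the recorded actions are replayed, whereas the plain model drifts. For other action sequences, the learned model contributes only the difference relative to the data.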

8.9ROOct 6, 2021

Contextual Tuning of Model Predictive Control for Autonomous Racing

Lukas P. Fröhlich, Christian Küttel, Elena Arcari et al.

Learning-based model predictive control has been widely applied in autonomous racing to improve the closed-loop behaviour of vehicles in a data-driven manner. When environmental conditions change, e.g., due to rain, often only the predictive model is adapted, but the controller parameters are kept constant. However, this can lead to suboptimal behaviour. In this paper, we address the problem of data-efficient controller tuning, adapting both the model and objective simultaneously. The key novelty of the proposed approach is that we leverage a learned dynamics model to encode the environmental condition as a so-called context. This insight allows us to employ contextual Bayesian optimization to efficiently transfer knowledge across different environmental conditions. Consequently, we require fewer data to find the optimal controller configuration for each context. The proposed framework is extensively evaluated with more than 3'000 laps driven on an experimental platform with 1:28 scale RC race cars. The results show that our approach successfully optimizes the lap time across different contexts requiring fewer data compared to other approaches based on standard Bayesian optimization.

5.3ROMar 8, 2021

Design, Optimal Guidance and Control of a Low-cost Re-usable Electric Model Rocket

Lukas Spannagl, Elias Hampp, Andrea Carron et al.

In the last decade, autonomous vertical take-off and landing (VTOL) vehicles have become increasingly important as they lower mission costs thanks to their re-usability. However, their development is complex, rendering even the basic experimental validation of the required advanced guidance and control (G & C) algorithms prohibitively time-consuming and costly. In this paper, we present the design of an inexpensive small-scale VTOL platform that can be built from off-the-shelf components for less than 1000 USD. The vehicle design mimics the first stage of a reusable launcher, making it a perfect test-bed for G & C algorithms. To control the vehicle during ascent and descent, we propose a real-time optimization-based G & C algorithm. The key features are a real-time minimum fuel and free-final-time optimal guidance combined with an offset-free tracking model predictive position controller. The vehicle hardware design and the G & C algorithm are experimentally validated both indoors and outdoors, showing reliable operation in a fully autonomous fashion with all computations done on-board and in real-time.

8.4LGMar 2, 2021

Data-driven control of room temperature and bidirectional EV charging using deep reinforcement learning: simulations and experiments

B. Svetozarevic, C. Baumann, S. Muntwiler et al.

This work presents a fully data-driven, black-box pipeline to obtain an optimal control policy for a multi-loop building control problem based on historical building and weather data, thus without the need for complex physics-based modelling. We demonstrate the method for joint control of room temperature and bidirectional EV charging to maximize the occupant thermal comfort and energy savings while leaving enough energy in the EV battery for the next trip. We modelled the room temperature with a recurrent neural network and EV charging with a piece-wise linear function. Using these models as a simulation environment, we applied a deep reinforcement learning (DRL) algorithm to obtain an optimal control policy. The learnt policy achieves on average 17% energy savings over the heating season and 19% better comfort satisfaction than a standard rule-based (RB) room temperature controller. When a bidirectional EV is additionally connected and a two-tariff electricity pricing is applied, the MIMO DRL policy successfully leverages the battery and decreases the overall cost of electricity compared to two standard RB controllers, one controlling the room temperature and another controlling the bidirectional EV (dis-)charging. Finally, we demonstrate a successful transfer of the learnt DRL policy from simulation onto a real building, the DFAB HOUSE at Empa Duebendorf in Switzerland, achieving up to 30% energy savings while maintaining similar comfort levels compared to a conventional RB room temperature controller over three weeks during the heating season.

14.4RONov 18, 2020Code

Cautious Bayesian Optimization for Efficient and Scalable Policy Search

Lukas P. Fröhlich, Melanie N. Zeilinger, Edgar D. Klenske

Sample efficiency is one of the key factors when applying policy search to real-world problems. In recent years, Bayesian Optimization (BO) has become prominent in the field of robotics due to its sample efficiency and the little prior knowledge it requires. However, one drawback of BO is its poor performance on high-dimensional search spaces as it focuses on global search. In the policy search setting, local optimization is typically sufficient as initial policies are often available, e.g., via meta-learning, kinesthetic demonstrations or sim-to-real approaches. In this paper, we propose to constrain the policy search space to a sublevel-set of the Bayesian surrogate model's predictive uncertainty. This simple yet effective way of constraining the policy update enables BO to scale to high-dimensional spaces (>100) and reduces the risk of damaging the system. We demonstrate the effectiveness of our approach on a wide range of problems, including a motor skills task, adapting deep RL agents to new reward signals and a sim-to-real task for an inverted pendulum system.
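The sublevel-set constraint can be pictured with a small sketch. It assumes a one-dimensional policy parameter, a Gaussian process surrogate with a unit-variance RBF kernel, and a hypothetical threshold of 0.5 on the predictive standard deviation; none of these choices come from the paper.

```python
import numpy as np


def rbf(a, b, lengthscale=0.5):
    """Unit-variance RBF kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def predictive_std(x_query, x_data, noise=1e-4):
    """GP posterior standard deviation (independent of the observed values)."""
    k_dd = rbf(x_data, x_data) + noise * np.eye(len(x_data))
    k_qd = rbf(x_query, x_data)
    var = 1.0 - np.einsum("ij,ij->i", k_qd @ np.linalg.inv(k_dd), k_qd)
    return np.sqrt(np.clip(var, 0.0, None))


x_data = np.array([0.0, 0.2])            # parameters already evaluated
candidates = np.linspace(-2.0, 2.0, 41)  # candidate parameters for the next step
std = predictive_std(candidates, x_data)

threshold = 0.5                          # hypothetical uncertainty bound
admissible = candidates[std <= threshold]
```

Only candidates close to previously evaluated parameters pass the filter, so an acquisition function maximized over `admissible` takes cautious, local steps. The region grows as more evaluations are added.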

6.6SYAug 13, 2020

Meta Learning MPC using Finite-Dimensional Gaussian Process Approximations

Elena Arcari, Andrea Carron, Melanie N. Zeilinger

Data availability has dramatically increased in recent years, driving model-based control methods to exploit learning techniques for improving the system description, and thus control performance. Two key factors that hinder the practical applicability of learning methods in control are their high computational complexity and limited generalization capabilities to unseen conditions. Meta-learning is a powerful tool that enables efficient learning across a finite set of related tasks, easing adaptation to new unseen tasks. This paper makes use of a meta-learning approach for adaptive model predictive control, by learning a system model that leverages data from previous related tasks, while enabling fast fine-tuning to the current task during closed-loop operation. The dynamics is modeled via Gaussian process regression and, building on the Karhunen-Loève expansion, can be approximately reformulated as a finite linear combination of kernel eigenfunctions. Using data collected over a set of tasks, the eigenfunction hyperparameters are optimized in a meta-training phase by maximizing a variational bound for the log-marginal likelihood. During meta-testing, the eigenfunctions are fixed, so that only the linear parameters are adapted to the new unseen task in an online adaptive fashion via Bayesian linear regression, providing a simple and efficient inference scheme. Simulation results are provided for autonomous racing with miniature race cars adapting to unseen road conditions.
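The meta-testing step, adapting only the linear parameters over fixed basis functions, amounts to standard Bayesian linear regression. A minimal sketch follows, assuming three hand-picked features in place of meta-learned eigenfunctions and hypothetical precision values; `features` and `blr_posterior` are illustrative names.

```python
import numpy as np


def features(x):
    """Fixed basis functions standing in for meta-learned eigenfunctions."""
    return np.stack([np.sin(x), np.cos(x), x], axis=-1)


def blr_posterior(phi, y, prior_precision=1e-6, noise_precision=1e4):
    """Gaussian posterior over weights w in the model y = phi @ w + noise."""
    n_features = phi.shape[1]
    precision = prior_precision * np.eye(n_features) + noise_precision * phi.T @ phi
    cov = np.linalg.inv(precision)
    mean = noise_precision * cov @ phi.T @ y
    return mean, cov


x_train = np.linspace(-2.0, 2.0, 25)
w_true = np.array([1.5, -0.7, 0.3])
y_train = features(x_train) @ w_true  # noise-free data, for clarity only

w_mean, w_cov = blr_posterior(features(x_train), y_train)
```

Because the features are fixed, the update is a closed-form linear-algebra step, which is what makes online adaptation inexpensive. The posterior covariance `w_cov` additionally quantifies the remaining parameter uncertainty.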

9.7SYMay 6, 2020

Maximum Likelihood Methods for Inverse Learning of Optimal Controllers

Marcel Menner, Melanie N. Zeilinger

This paper presents a framework for inverse learning of objective functions for constrained optimal control problems, which is based on the Karush-Kuhn-Tucker (KKT) conditions. We discuss three variants corresponding to different model assumptions and computational complexities. The first method uses a convex relaxation of the KKT conditions and serves as the benchmark. The main contribution of this paper is the proposition of two learning methods that combine the KKT conditions with maximum likelihood estimation. The key benefit of this combination is the systematic treatment of constraints for learning from noisy data with a branch-and-bound algorithm using likelihood arguments. This paper discusses theoretic properties of the learning methods and presents simulation results that highlight the advantages of using the maximum likelihood formulation for learning objective functions.

17.3MLFeb 7, 2020Code

Noisy-Input Entropy Search for Efficient Robust Bayesian Optimization

Lukas P. Fröhlich, Edgar D. Klenske, Julia Vinogradska et al.

We consider the problem of robust optimization within the well-established Bayesian optimization (BO) framework. While BO is intrinsically robust to noisy evaluations of the objective function, standard approaches do not consider the case of uncertainty about the input parameters. In this paper, we propose Noisy-Input Entropy Search (NES), a novel information-theoretic acquisition function that is designed to find robust optima for problems with both input and measurement noise. NES is based on the key insight that the robust objective in many cases can be modeled as a Gaussian process, although it cannot be observed directly. We evaluate NES on several benchmark problems from the optimization literature and from engineering. The results show that NES reliably finds robust optima, outperforming existing methods from the literature on all benchmarks.

4.3SYJan 21, 2020

Bayesian Optimization for Policy Search in High-Dimensional Systems via Automatic Domain Selection

Lukas P. Fröhlich, Edgar D. Klenske, Christian G. Daniel et al.

Bayesian Optimization (BO) is an effective method for optimizing expensive-to-evaluate black-box functions with a wide range of applications, for example in robotics, system design and parameter optimization. However, scaling BO to problems with large input dimensions (>10) remains an open challenge. In this paper, we propose to leverage results from optimal control to scale BO to higher dimensional control tasks and to reduce the need for manually selecting the optimization domain. The contributions of this paper are twofold: 1) We show how we can make use of a learned dynamics model in combination with a model-based controller to simplify the BO problem by focusing on the most relevant regions of the optimization domain. 2) Based on (1) we present a method to find an embedding in parameter space that reduces the effective dimensionality of the optimization problem. To evaluate the effectiveness of the proposed approach, we present an experimental evaluation on real hardware, as well as simulated tasks including a 48-dimensional policy for a quadcopter.

12.8LGDec 23, 2019

On Simulation and Trajectory Prediction with Gaussian Process Dynamics

Lukas Hewing, Elena Arcari, Lukas P. Fröhlich et al.

Established techniques for simulation and prediction with Gaussian process (GP) dynamics often implicitly make use of an independence assumption on successive function evaluations of the dynamics model. This can result in significant error and underestimation of the prediction uncertainty, potentially leading to failures in safety-critical applications. This paper discusses methods that explicitly take the correlation of successive function evaluations into account. We first describe two sampling-based techniques; one approach provides samples of the true trajectory distribution, suitable for 'ground truth' simulations, while the other draws function samples from basis function approximations of the GP. Second, we propose a linearization-based technique that directly provides approximations of the trajectory distribution, taking correlations explicitly into account. We demonstrate the procedures in simple numerical examples, contrasting the results with established methods.

1.9ROJun 24, 2019

Using Human Ratings for Feedback Control: A Supervised Learning Approach with Application to Rehabilitation Robotics

Marcel Menner, Lukas Neuner, Lars Lünenburger et al.

This paper presents a method for tailoring a parametric controller based on human ratings. The method leverages supervised learning concepts in order to train a reward model from data. It is applied to a gait rehabilitation robot with the goal of teaching the robot how to walk patients physiologically. In this context, the reward model judges the physiology of the gait cycle (instead of therapists) using sensor measurements provided by the robot and the automatic feedback controller chooses the input settings of the robot to maximize the reward. The key advantage of the proposed method is that only a few input adaptations are necessary to achieve a physiological gait cycle. Experiments with nondisabled subjects show that the proposed method makes it possible to incorporate human expertise into a control law and to automatically walk patients physiologically.

2.3SYApr 8, 2019

Linear model predictive safety certification for learning-based control

Kim P. Wabersich, Melanie N. Zeilinger

While it has been repeatedly shown that learning-based controllers can provide superior performance, they often lack safety guarantees. This paper aims at addressing this problem by introducing a model predictive safety certification (MPSC) scheme for polytopic linear systems with additive disturbances. The scheme verifies safety of a proposed learning-based input and modifies it as little as necessary in order to keep the system within a given set of constraints. Safety is thereby related to the existence of a model predictive controller (MPC) providing a feasible trajectory towards a safe target set. A robust MPC formulation accounts for the fact that the model is generally uncertain in the context of learning, which allows proving constraint satisfaction at all times under the proposed MPSC strategy. The MPSC scheme can be used in order to expand any potentially conservative set of safe states for learning and we prove an iterative technique for enlarging the safe set. Finally, a practical data-based design procedure for MPSC is proposed using scenario optimization.

28.0SYDec 13, 2018

A predictive safety filter for learning-based control of constrained nonlinear dynamical systems

Kim P. Wabersich, Melanie N. Zeilinger

The transfer of reinforcement learning (RL) techniques into real-world applications is challenged by safety requirements in the presence of physical limitations. Most RL methods, in particular the most popular algorithms, do not support explicit consideration of state and input constraints. In this paper, we address this problem for nonlinear systems with continuous state and input spaces by introducing a predictive safety filter, which is able to turn a constrained dynamical system into an unconstrained safe system and to which any RL algorithm can be applied 'out-of-the-box'. The predictive safety filter receives the proposed control input and decides, based on the current system state, if it can be safely applied to the real system, or if it has to be modified otherwise. Safety is thereby established by a continuously updated safety policy, which is based on a model predictive control formulation using a data-driven system model and considering state and input dependent uncertainties.
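The filtering logic can be illustrated on a deliberately simple case. The sketch assumes a known scalar system x_next = x + u with box constraints and a one-step horizon, so it omits the data-driven model, the uncertainty handling, and the longer prediction horizon described in the abstract. `safety_filter` and `aggressive_policy` are hypothetical names.

```python
def safety_filter(x, u_proposed, x_max=1.0, u_max=0.5):
    """Return the admissible input closest to u_proposed.

    Toy system (an assumption): x_next = x + u with |x| <= x_max, |u| <= u_max.
    """
    lower = max(-u_max, -x_max - x)  # keeps x_next >= -x_max
    upper = min(u_max, x_max - x)    # keeps x_next <= x_max
    return min(max(u_proposed, lower), upper)


def aggressive_policy(x):
    """Hypothetical learning-based input that ignores the constraints."""
    return 0.5


x = 0.0
states = [x]
applied = []
for _ in range(10):
    u = safety_filter(x, aggressive_policy(x))  # modify only if necessary
    x = x + u
    states.append(x)
    applied.append(u)
```

The proposed input passes through unchanged while it is safe and is reduced only at the constraint boundary, so the learning algorithm effectively interacts with an unconstrained system.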

21.6SYMar 22, 2018

Linear model predictive safety certification for learning-based control

Kim P. Wabersich, Melanie N. Zeilinger

9.2SYNov 30, 2017

Scalable synthesis of safety certificates from data with application to learning-based control

Kim P. Wabersich, Melanie N. Zeilinger

The control of complex systems faces a trade-off between high performance and safety guarantees, which in particular restricts the application of learning-based methods to safety-critical systems. A recently proposed framework to address this issue is the use of a safety controller, which guarantees to keep the system within a safe region of the state space. This paper introduces efficient techniques for the synthesis of a safe set and control law, which offer improved scalability properties by relying on approximations based on convex optimization problems. The first proposed method requires only an approximate linear system model and Lipschitz continuity of the unknown nonlinear dynamics. The second method extends the results by showing how a Gaussian process prior on the unknown system dynamics can be used in order to reduce conservatism of the resulting safe set. We demonstrate the results with numerical examples, including an autonomous convoy of vehicles.

37.5ROMay 3, 2017

A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems

Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger et al.

The proven efficacy of learning-based control schemes strongly motivates their application to robotic systems operating in the physical world. However, guaranteeing correct operation during the learning process is currently an unresolved issue, which is of vital importance in safety-critical systems. We propose a general safety framework based on Hamilton-Jacobi reachability methods that can work in conjunction with an arbitrary learning algorithm. The method exploits approximate knowledge of the system dynamics to guarantee constraint satisfaction while minimally interfering with the learning process. We further introduce a Bayesian mechanism that refines the safety analysis as the system acquires new evidence, reducing initial conservativeness when appropriate while strengthening guarantees through real-time validation. The result is a least-restrictive, safety-preserving control law that intervenes only when (a) the computed safety guarantees require it, or (b) confidence in the computed guarantees decays in light of new observations. We prove theoretical safety guarantees combining probabilistic and worst-case analysis and demonstrate the proposed framework experimentally on a quadrotor vehicle. Even though safety analysis is based on a simple point-mass model, the quadrotor successfully arrives at a suitable controller by policy-gradient reinforcement learning without ever crashing, and safely retracts away from a strong external disturbance introduced during flight.

1.2SYApr 9, 2015

Quantization Design for Distributed Optimization

Ye Pu, Melanie N. Zeilinger, Colin N. Jones

We consider the problem of solving a distributed optimization problem using a distributed computing platform, where the communication in the network is limited: each node can only communicate with its neighbours and the channel has a limited data-rate. A common technique to address the latter limitation is to apply quantization to the exchanged information. We propose two distributed optimization algorithms with an iteratively refining quantization design based on the inexact proximal gradient method and its accelerated variant. We show that if the parameters of the quantizers, i.e. the number of bits and the initial quantization intervals, satisfy certain conditions, then the quantization error is bounded by a linearly decreasing function and the convergence of the distributed algorithms is guaranteed. Furthermore, we prove that after imposing the quantization scheme, the distributed algorithms still exhibit a linear convergence rate, and show complexity upper-bounds on the number of iterations to achieve a given accuracy. Finally, we demonstrate the performance of the proposed algorithms and the theoretical findings for solving a distributed optimal control problem.
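A shrinking-interval quantizer of the kind described can be sketched as follows. The sketch assumes a uniform quantizer recentred on the previously transmitted value, a geometric shrink factor of 0.8, and a fixed scalar in place of the algorithm's iterates; these parameter choices and the name `quantize` are illustrative, not taken from the paper.

```python
import numpy as np


def quantize(value, center, radius, bits):
    """Uniform quantizer with 2**bits levels on [center - radius, center + radius]."""
    levels = 2 ** bits
    step = 2.0 * radius / (levels - 1)
    low = center - radius
    clipped = np.clip(value, low, center + radius)
    index = np.round((clipped - low) / step)
    return low + index * step, step


value = 0.37  # quantity a node wants to communicate (fixed here for simplicity)
center, radius0, shrink, bits = 0.0, 1.0, 0.8, 4

errors, steps = [], []
for k in range(10):
    # The interval shrinks geometrically and is recentred on the last sent value.
    q, step = quantize(value, center, radius0 * shrink ** k, bits)
    errors.append(abs(q - value))
    steps.append(step)
    center = q
```

As long as the value stays inside the current interval, the error is at most half a quantization step, and that step decays at the chosen rate with a fixed number of bits per message.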