Florian Dörfler

LG
h-index: 45
Papers: 45
Citations: 207
Novelty: 56%
AI Score: 56

45 Papers

OC May 24
Gray-Box Nonlinear Feedback Optimization

Zhiyu He, Saverio Bolognani, Michael Muehlebach et al.

Feedback optimization enables autonomous optimality seeking of a dynamical system through its closed-loop interconnection with iterative optimization algorithms. Among various iteration structures, model-based approaches require the input-output sensitivity matrix of the system to construct gradients, whereas model-free approaches eliminate this need by estimating gradients from real-time objective evaluations. These approaches offer complementary benefits in sample efficiency and accuracy against model mismatch, i.e., sensitivity errors. To achieve balanced closed-loop performance, we propose a gray-box feedback optimization controller, featuring systematic incorporation of approximate sensitivities into model-free updates via a tunable convex combination. We provide unified performance characterizations covering different approaches. We elucidate how cumulative sensitivity errors (model-based) and variances due to stochastic exploration (model-free) shape the closed-loop behavior and induce a trade-off between iteration and dimensional dependence. The proposed controller retains sample efficiency and provable (local) optimality for nonconvex problems despite inaccurate sensitivities. We further develop and characterize a running gray-box controller that handles constrained time-varying problems with changing objectives and steady-state input-output maps.
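
The convex combination described in this abstract can be illustrated with a minimal sketch. It is not the authors' controller: the two-point random-direction estimator, the toy linear map `A`, the inaccurate sensitivity `H_approx`, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def gray_box_gradient(u, objective, grad_model, alpha, delta, rng):
    """Blend a model-based and a model-free gradient estimate.

    objective(u):  measured closed-loop cost at input u (treated as a black box).
    grad_model(u): gradient built from an approximate sensitivity (may be biased).
    alpha in [0, 1]: 1 is purely model-based, 0 is purely model-free.
    """
    d = u.size
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)  # random direction on the unit sphere
    # Two-point zeroth-order estimate (an assumed estimator, not from the paper).
    g_free = d * (objective(u + delta * v) - objective(u - delta * v)) / (2 * delta) * v
    return alpha * grad_model(u) + (1 - alpha) * g_free

def run_demo(alpha=0.5, steps=500, eta=0.05, delta=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    A = np.array([[1.0, 0.5], [0.0, 1.0]])  # hypothetical true steady-state map y = A u
    H_approx = np.eye(2)                     # hypothetical inaccurate sensitivity
    y_ref = np.array([1.0, -1.0])
    objective = lambda u: 0.5 * np.sum((A @ u - y_ref) ** 2)
    # Model-based gradient: measured output error pushed through the wrong sensitivity.
    grad_model = lambda u: H_approx.T @ (A @ u - y_ref)
    u = np.zeros(2)
    costs = [objective(u)]
    for _ in range(steps):
        u = u - eta * gray_box_gradient(u, objective, grad_model, alpha, delta, rng)
        costs.append(objective(u))
    return u, costs
```

Setting `alpha=1.0` recovers a purely model-based iteration and `alpha=0.0` a purely model-free one, so the same routine can be used to compare the regimes on this toy problem.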

OC Feb 21, 2013
Synchronization and Power Sharing for Droop-Controlled Inverters in Islanded Microgrids

John W. Simpson-Porco, Florian Dörfler, Francesco Bullo

Motivated by the recent and growing interest in smart grid technology, we study the operation of DC/AC inverters in an inductive microgrid. We show that a network of loads and DC/AC inverters equipped with power-frequency droop controllers can be cast as a Kuramoto model of phase-coupled oscillators. This novel description, together with results from the theory of coupled oscillators, allows us to characterize the behavior of the network of inverters and loads. Specifically, we provide a necessary and sufficient condition for the existence of a synchronized solution that is unique and locally exponentially stable. We present a selection of controller gains leading to a desirable sharing of power among the inverters, and specify the set of loads which can be serviced without violating given actuation constraints. Moreover, we propose a distributed integral controller based on averaging algorithms which dynamically regulates the system frequency in the presence of a time-varying load. Remarkably, this distributed-averaging integral controller has the additional property that it maintains the power sharing properties of the primary droop controller. Our results hold without assumptions on identical line characteristics or voltage magnitudes.
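
The Kuramoto-type description mentioned in this abstract can be sketched numerically. The model form `D_i * dtheta_i/dt = P_i - sum_j a_ij * sin(theta_i - theta_j)`, the three-node complete network, and all numerical values below are assumptions for illustration; the sketch only shows that such a network settles to a common frequency.

```python
import numpy as np

def simulate_droop(P, D, a, theta0, dt=0.01, steps=5000):
    """Forward-Euler simulation of D_i * dtheta_i/dt = P_i - sum_j a_ij sin(theta_i - theta_j).

    P: nominal power injections, D: droop (damping) coefficients,
    a: symmetric coupling matrix with zero diagonal.
    Returns the final phase angles and the last computed frequency deviations.
    """
    theta = np.array(theta0, dtype=float)
    omega = np.zeros_like(theta)
    for _ in range(steps):
        diff = theta[:, None] - theta[None, :]
        flow = (a * np.sin(diff)).sum(axis=1)  # power leaving each node over the lines
        omega = (P - flow) / D                 # frequency deviation of each node
        theta = theta + dt * omega
    return theta, omega

# Hypothetical example data: two sources and one load on a complete graph.
P = np.array([0.3, 0.2, -0.2])
D = np.array([1.0, 1.0, 2.0])
a = 2.0 * (np.ones((3, 3)) - np.eye(3))
theta, omega = simulate_droop(P, D, a, theta0=[0.0, 0.0, 0.0])
omega_sync = P.sum() / D.sum()  # common frequency implied by summing the model over all nodes
```

Because the coupling terms cancel when the equations are summed over all nodes, any synchronized solution of this toy model must rotate at `P.sum() / D.sum()`, which is what the simulated `omega` approaches.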

OC Mar 14, 2011
Cyber-Physical Attacks in Power Networks: Models, Fundamental Limitations and Monitor Design

Fabio Pasqualetti, Florian Dörfler, Francesco Bullo

Future power networks will be characterized by safe and reliable functionality against physical malfunctions and cyber attacks. This paper proposes a unified framework and advanced monitoring procedures to detect and identify malfunctions of network components or corruption of measurements caused by an omniscient adversary. We model a power system under cyber-physical attack as a linear time-invariant descriptor system with unknown inputs. Our attack model generalizes the prototypical stealth, (dynamic) false-data injection and replay attacks. We characterize the fundamental limitations of both static and dynamic procedures for attack detection and identification. Additionally, we design provably-correct (dynamic) detection and identification procedures based on tools from geometric control theory. Finally, we illustrate the effectiveness of our method through a comparison with existing (static) detection algorithms, and through a numerical study.

SY May 21
Quantifying Grid-Forming Behavior: Bridging Device-level Dynamics and System-Level Stability

Kehao Zhuang, Huanhai Xin, Verena Häberle et al.

Grid-forming (GFM) technology is widely regarded as a promising solution for future power systems dominated by power electronics. However, a universally accepted definition of GFM behavior and a precise method for its quantification remain elusive. Moreover, the impact of GFM converters on system stability is not precisely quantified, creating a significant disconnect between device and system levels. To address these gaps from a small-signal perspective, at the device level, the paper introduces a novel metric, the Forming Index (FI), to quantify a converter's response to grid voltage fluctuations. Rather than enumerating various control architectures, the FI provides a metric for the converter's GFM ability by quantifying its sensitivity to grid variations. At the system level, a new quantitative measure of system strength that captures the multi-bus voltage stiffness is proposed, which quantifies the voltage and phase angle responses of multiple buses to current or power disturbances. The paper further extends and defines this concept to grid strength and bus strength to identify weak areas within the system. Finally, the device and system levels are bridged by formally proving that GFM converters enhance system strength. The proposed framework provides a unified benchmark for GFM converter design, optimal placement, and system stability assessment.

OC Mar 10, 2012
Attack Detection and Identification in Cyber-Physical Systems -- Part I: Models and Fundamental Limitations

Fabio Pasqualetti, Florian Dörfler, Francesco Bullo

Cyber-physical systems integrate computation, communication, and physical capabilities to interact with the physical world and humans. Besides failures of components, cyber-physical systems are prone to malignant attacks, and specific analysis tools as well as monitoring mechanisms need to be developed to enforce system security and reliability. This paper proposes a unified framework to analyze the resilience of cyber-physical systems against attacks cast by an omniscient adversary. We model cyber-physical systems as linear descriptor systems, and attacks as exogenous unknown inputs. Despite its simplicity, our model captures various real-world cyber-physical systems, and it includes and generalizes many prototypical attacks, including stealth, (dynamic) false-data injection and replay attacks. First, we characterize fundamental limitations of static, dynamic, and active monitors for attack detection and identification. Second, we provide constructive algebraic conditions to cast undetectable and unidentifiable attacks. Third, by using the system interconnection structure, we describe graph-theoretic conditions for the existence of undetectable and unidentifiable attacks. Finally, we validate our findings through some illustrative examples with different cyber-physical systems, such as a municipal water supply network and two electrical power grids.

OC Feb 27, 2012
Attack Detection and Identification in Cyber-Physical Systems -- Part II: Centralized and Distributed Monitor Design

Fabio Pasqualetti, Florian Dörfler, Francesco Bullo

Cyber-physical systems integrate computation, communication, and physical capabilities to interact with the physical world and humans. Besides failures of components, cyber-physical systems are prone to malicious attacks so that specific analysis tools and monitoring mechanisms need to be developed to enforce system security and reliability. This paper builds upon the results presented in our companion paper [1] and proposes centralized and distributed monitors for attack detection and identification. First, we design optimal centralized attack detection and identification monitors. Optimality refers to the ability of detecting (respectively identifying) every detectable (respectively identifiable) attack. Second, we design an optimal distributed attack detection filter based upon a waveform relaxation technique. Third, we show that the attack identification problem is computationally hard, and we design a sub-optimal distributed attack identification procedure with performance guarantees. Finally, we illustrate the robustness of our monitors to system noise and unmodeled dynamics through a simulation study.

SY May 21
Quantifying Grid-Forming Behavior: Bridging Device-Level Dynamics and System-Level Strength

Kehao Zhuang, Huanhai Xin, Verena Häberle et al.

Grid-forming (GFM) technology is widely regarded as a promising solution for future power systems dominated by power electronics. However, a precise method for quantifying GFM converter behavior and a universally accepted GFM definition remain elusive. Moreover, the impact of GFM on system stability is not precisely quantified, creating a significant disconnect between device and system levels. To address these gaps from a small-signal perspective, at the device level, we introduce a novel metric, the Forming Index (FI), to quantify a converter's response to grid voltage fluctuations. Rather than enumerating various control architectures, the FI provides a metric for the converter's GFM ability by quantifying its sensitivity to grid variations. At the system level, we propose a new quantitative measure of system strength that captures the multi-bus voltage stiffness, which quantifies the voltage and phase angle responses of multiple buses to current or power disturbances. We further extend and define this concept to grid strength and bus strength to identify weak areas within the system. Finally, we bridge the device and system levels by formally proving that GFM converters enhance system strength. Our proposed framework provides a unified benchmark for GFM converter design, optimal placement, and system stability assessment.

SY Nov 25, 2019
Impacts of Grid Structure on PLL-Synchronization Stability of Converter-Integrated Power Systems

Linbin Huang, Huanhai Xin, Wei Dong et al.

Small-signal instability of grid-connected power converters may arise when the converters use a phase-locked loop (PLL) to synchronize with a weak grid. Commonly, this stability problem (referred to as PLL-synchronization stability in this paper) was studied by employing a single-converter system connected to an infinite bus, which, however, omits the impacts of power grid structure and the interactions among multiple converters. Motivated by this, we investigate how the grid structure affects PLL-synchronization stability of multi-converter systems. By using Kron reduction to eliminate the interior nodes, an equivalent reduced network is obtained which contains only the converter nodes. We explicitly show how the Kron-reduced multi-converter system can be decoupled into its modes. This modal representation allows us to demonstrate that the smallest eigenvalue of the grounded Laplacian matrix of the Kron-reduced network dominates the stability margin. We also carry out a sensitivity analysis of this smallest eigenvalue to explore how a perturbation in the original network affects the stability margin. On this basis, we provide guidelines on how to improve the PLL-synchronization stability of multi-converter systems by PLL-retuning, proper placement of converters, or enhancing weak connections in the network. Finally, we validate our findings with simulation results based on a 39-bus test system.
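
Kron reduction, as used in this abstract, amounts to a Schur complement of the network Laplacian with respect to the eliminated interior nodes. The sketch below is a generic illustration under assumed data (a three-node path whose interior node carries a unit shunt to ground); the function names are hypothetical and it does not reproduce the paper's converter model.

```python
import numpy as np

def kron_reduce(L, keep):
    """Eliminate all nodes not in `keep` via the Schur complement of the Laplacian."""
    keep = np.asarray(keep)
    elim = np.setdiff1d(np.arange(L.shape[0]), keep)
    L_kk = L[np.ix_(keep, keep)]
    L_ke = L[np.ix_(keep, elim)]
    L_ek = L[np.ix_(elim, keep)]
    L_ee = L[np.ix_(elim, elim)]
    # Reduced matrix: L_kk - L_ke * inv(L_ee) * L_ek (solve instead of explicit inverse).
    return L_kk - L_ke @ np.linalg.solve(L_ee, L_ek)

def smallest_eigenvalue(M):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh returns ascending order)."""
    return np.linalg.eigvalsh(M)[0]

# Assumed example: path 0 - 1 - 2 with unit line weights; node 1 has a unit shunt to ground.
L_grounded = np.array([[1.0, -1.0, 0.0],
                       [-1.0, 3.0, -1.0],
                       [0.0, -1.0, 1.0]])
L_red = kron_reduce(L_grounded, keep=[0, 2])
lam_min = smallest_eigenvalue(L_red)
```

For this toy network the reduced matrix is `[[2/3, -1/3], [-1/3, 2/3]]`, whose smallest eigenvalue is 1/3; strengthening the shunt or the lines raises that eigenvalue.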

OC Apr 24, 2023
Designing Optimal Personalized Incentive for Traffic Routing using BIG Hype algorithm

Panagiotis D. Grontas, Carlo Cenedese, Marta Fochesato et al.

We study the problem of optimally routing plug-in electric and conventional fuel vehicles on a city level. In our model, commuters selfishly aim to minimize a local cost that combines travel time, from a fixed origin to a desired destination, and the monetary cost of using city facilities, parking or service stations. The traffic authority can influence the commuters' preferred routing choice by means of personalized discounts on parking tickets and on the energy price at service stations. We formalize the problem of designing these monetary incentives optimally as a large-scale bilevel game, where constraints arise at both levels due to the finite capacities of city facilities and incentives budget. Then, we develop an efficient decentralized solution scheme with convergence guarantees based on BIG Hype, a recently-proposed hypergradient-based algorithm for hierarchical games. Finally, we validate our model via numerical simulations on the Anaheim network, and show that the proposed approach produces sensible results in terms of traffic decongestion and can solve problems with more than 48000 variables and 110000 constraints in minutes.

SY Nov 14, 2022
Follow the Clairvoyant: an Imitation Learning Approach to Optimal Control

Andrea Martin, Luca Furieri, Florian Dörfler et al.

We consider control of dynamical systems through the lens of competitive analysis. Most prior work in this area focuses on minimizing regret, that is, the loss relative to an ideal clairvoyant policy that has noncausal access to past, present, and future disturbances. Motivated by the observation that the optimal cost only provides coarse information about the ideal closed-loop behavior, we instead propose directly minimizing the tracking error relative to the optimal trajectories in hindsight, i.e., imitating the clairvoyant policy. By embracing a system level perspective, we present an efficient optimization-based approach for computing follow-the-clairvoyant (FTC) safe controllers. We prove that these attain minimal regret if no constraints are imposed on the noncausal benchmark. In addition, we present numerical experiments to show that our policy retains the hallmark of competitive algorithms of interpolating between classical $\mathcal{H}_2$ and $\mathcal{H}_\infty$ control laws - while consistently outperforming regret minimization methods in constrained scenarios thanks to the superior ability to chase the clairvoyant.

OC Mar 7, 2023
Nash Equilibria, Regularization and Computation in Optimal Transport-Based Distributionally Robust Optimization

Soroosh Shafiee, Liviu Aolaritei, Florian Dörfler et al.

We study optimal transport-based distributionally robust optimization problems where a fictitious adversary, often envisioned as nature, can choose the distribution of the uncertain problem parameters by reshaping a prescribed reference distribution at a finite transportation cost. In this framework, we show that robustification is intimately related to various forms of variation and Lipschitz regularization even if the transportation cost function fails to be (some power of) a metric. We also derive conditions for the existence and the computability of a Nash equilibrium between the decision-maker and nature, and we demonstrate numerically that nature's Nash strategy can be viewed as a distribution that is supported on remarkably deceptive adversarial samples. Finally, we identify practically relevant classes of optimal transport-based distributionally robust optimization problems that can be addressed with efficient gradient descent algorithms even if the loss function or the transportation cost function are nonconvex (but not both at the same time).

SY Apr 13
A Data-Driven Optimal Control Architecture for Grid-Connected Power Converters

Ruohan Leng, Linbin Huang, Huanhai Xin et al.

Grid-connected power converters are ubiquitous in modern power systems, acting as grid interfaces of renewable energy sources, energy storage systems, electric vehicles, high-voltage DC systems, etc. Conventionally, power converters use multiple PID regulators to achieve different control objectives such as grid synchronization and voltage/power regulation, where the PID parameters are usually tuned based on a presumed (and often overly-simplified) power grid model. However, this may lead to inferior performance or even instabilities in practice, as the real power grid is highly complex, variable, and generally unknown. To tackle this problem, we employ a data-enabled predictive control (DeePC) to perform data-driven, optimal, robust, and adaptive control for power converters. We call the converters that are operated in this way DeePConverters. A DeePConverter can implicitly perceive the characteristics of the power grid from measured data and adjust its control strategy to achieve optimal, robust, and adaptive performance. We present the modular configurations, generalized structure, control behavior specification, inherent robustness, detailed implementation, computational aspects, and online adaptation of DeePConverters. High-fidelity simulations and hardware-in-the-loop (HIL) tests are provided to validate the effectiveness of DeePConverters.

SY Apr 15
Stability of Certainty-Equivalent Adaptive LQR for Linear Systems with Unknown Time-Varying Parameters

Marcell Bartos, Johannes Köhler, Florian Dörfler et al.

Standard model-based control design deteriorates when the system dynamics change during operation. To overcome this challenge, online and adaptive methods have been proposed in the literature. In this work, we consider the class of discrete-time linear systems with unknown time-varying parameters. We propose a simple, modular, and computationally tractable approach by combining two classical and well-known building blocks from estimation and control: the least mean square filter and the certainty-equivalent linear quadratic regulator. Despite both building blocks being simple and off-the-shelf, our analysis shows that they can be seamlessly combined to a powerful pipeline with stability guarantees. Namely, finite-gain $\ell^2$-stability of the closed-loop interconnection of the unknown system, the parameter estimator, and the controller is proven, despite the presence of unknown disturbances and time-varying parametric uncertainties. Real-world applicability of the proposed algorithm is showcased by simulations carried out on a nonlinear planar quadrotor.
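
The two building blocks named in this abstract can be combined in a few lines for a scalar system. The sketch below is an illustration under strong assumptions, not the paper's algorithm: the system is scalar with known input gain `b`, the filter is a normalized LMS variant, the Riccati equation is solved by fixed-point iteration, and the parameter drift and noise level are made up.

```python
import numpy as np

def ce_lqr_gain(a_hat, b=1.0, q=1.0, r=1.0, iters=200):
    """Certainty-equivalent LQR gain for x+ = a_hat*x + b*u via scalar Riccati iteration."""
    p = q
    for _ in range(iters):
        p = q + a_hat**2 * p - (a_hat * b * p) ** 2 / (r + b**2 * p)
    return a_hat * b * p / (r + b**2 * p)

def simulate(steps=500, mu=0.5, seed=0):
    """Closed loop of an unknown time-varying scalar system, an LMS estimator, and CE-LQR."""
    rng = np.random.default_rng(seed)
    b, x, a_hat = 1.0, 1.0, 0.5            # known input gain, initial state, initial estimate
    xs = [x]
    for t in range(steps):
        a_true = 1.2 + 0.2 * np.sin(0.01 * t)   # hypothetical slowly varying parameter
        u = -ce_lqr_gain(a_hat, b) * x          # controller treats the estimate as the truth
        x_next = a_true * x + b * u + 0.01 * rng.standard_normal()
        err = x_next - (a_hat * x + b * u)      # one-step prediction error
        a_hat += mu * x * err / (1.0 + x**2)    # normalized LMS update (assumed variant)
        x = x_next
        xs.append(x)
    return np.array(xs), a_hat
```

Calling `simulate()` returns the state trajectory and the final parameter estimate; in this toy setting the state stays bounded even though the true parameter corresponds to an open-loop unstable system throughout.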

SY Apr 14
System-Theoretic Analysis of Dynamic Generalized Nash Equilibria -- Turnpikes and Dissipativity

Sophie Hall, Florian Dörfler, Timm Faulwasser

Generalized Nash equilibria are used in multi-agent control applications to model strategic interactions between agents that are coupled in the cost, dynamics, and constraints, and provide the foundations for game-theoretic MPC (Receding Horizon Games). We study properties of finite-horizon dynamic GNE trajectories from a system-theoretic perspective. We show how strict dissipativity generates the turnpike phenomenon in GNE solutions. Moreover, we establish a converse turnpike result, i.e., the implication from turnpike to strict dissipativity. We derive conditions under which the steady-state GNE is the optimal operating point and, using a game value function, we give a local characterization of the geometry of storage functions. Finally, we design linear terminal penalties that ensure dynamic GNE trajectories applied in open-loop converge to and remain at the steady-state GNE. These connections provide the foundation for future system-theoretic analysis of GNEs similar to those existing in optimal control as well as for recursive feasibility and closed-loop stability results of game-theoretic MPC.

SY May 26
Load Management of Distribution Systems via Online Dynamic Pricing

Jiarui Yu, Zhiyu He, Wenbin Wang et al.

The growing adoption of electric vehicles (EVs) is increasing peak demand in distribution systems, which can threaten grid stability and reduce operational efficiency. Dynamic electricity pricing is a promising means of mitigating these peaks by shifting flexible demand. However, most existing approaches rely on detailed user-level consumption data and behavioral models, which are often difficult to obtain in practice and may raise privacy concerns. This paper proposes an Online Feedback Optimization (OFO) algorithm for day-ahead price design with limited data, where only aggregate loads are observed. OFO updates prices iteratively using aggregate load measurements, enabling effective peak reduction without access to individual user data. The formulation also includes a term that penalizes deviations in total electricity cost relative to a reference tariff. Although relying only on aggregate load measurements, the OFO price updates efficiently converge to the optimal price. In finite-horizon simulations, OFO achieves peak reduction close to that of the Stackelberg benchmark with full model information. Meanwhile, its computational effort is substantially lower. Additional tests under multiple initial conditions and delayed charging-window mismatch further confirm the robustness of the proposed method. Overall, these results show that OFO is a scalable and computationally efficient approach for peak-demand management in distribution systems with limited observability.

SY May 26
Incentive-Based Load Curtailment with Limited Information: A Bilevel Zeroth-Order Learning Approach

Zhisen Jiang, Florian Dörfler, Saverio Bolognani

Incentive-based load curtailment unlocks critical demand-side flexibility but is hindered by the limited knowledge of private user parameters and the inherent nonsmoothness of responses due to physical device constraints. We address this via a constrained bilevel optimization framework and propose the Bi-ZOL (Bilevel Zeroth-Order Learning) algorithm. Unlike conventional black-box methods, Bi-ZOL exploits the bilevel structure to decompose the hypergradient, integrating the exact analytical information of the SO's objective with a zeroth-order estimate of the unknown response sensitivity. This structural decomposition-based learning method mathematically smooths the nonsmooth response landscape and reduces hypergradient estimation error. We provide theoretical convergence guarantees to an approximate stationary point and demonstrate through simulations that Bi-ZOL achieves near-optimal performance.

LG Oct 20, 2022
Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions

Antonio Terpin, Nicolas Lanzetti, Batuhan Yardim et al.

Policy Optimization (PO) algorithms have been proven particularly suited to handle the high-dimensionality of real-world continuous control tasks. In this context, Trust Region Policy Optimization methods represent a popular approach to stabilize the policy updates. These usually rely on the Kullback-Leibler (KL) divergence to limit the change in the policy. The Wasserstein distance represents a natural alternative, in place of the KL divergence, to define trust regions or to regularize the objective function. However, state-of-the-art works either resort to its approximations or do not provide an algorithm for continuous state-action spaces, reducing the applicability of the method. In this paper, we explore optimal transport discrepancies (which include the Wasserstein distance) to define trust regions, and we propose a novel algorithm - Optimal Transport Trust Region Policy Optimization (OT-TRPO) - for continuous state-action spaces. We circumvent the infinite-dimensional optimization problem for PO by providing a one-dimensional dual reformulation for which strong duality holds. We then analytically derive the optimal policy update given the solution of the dual problem. This way, we bypass the computation of optimal transport costs and of optimal transport maps, which we implicitly characterize by solving the dual formulation. Finally, we provide an experimental evaluation of our approach across various control tasks. Our results show that optimal transport discrepancies can offer an advantage over state-of-the-art approaches.

LG Oct 30, 2023
Efficient Exploration in Continuous-time Model-based Reinforcement Learning

Lenart Treven, Jonas Hübotter, Bhavya Sukhija et al.

Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics using nonlinear ordinary differential equations (ODEs). We capture epistemic uncertainty using well-calibrated probabilistic models, and use the optimistic principle for exploration. Our regret bounds surface the importance of the measurement selection strategy (MSS), since in continuous time we not only must decide how to explore, but also when to observe the underlying system. Our analysis demonstrates that the regret is sublinear when modeling ODEs with Gaussian Processes (GP) for common choices of MSS, such as equidistant sampling. Additionally, we propose an adaptive, data-dependent, practical MSS that, when combined with GP dynamics, also achieves sublinear regret with significantly fewer samples. We showcase the benefits of continuous-time modeling over its discrete-time counterpart, as well as our proposed adaptive MSS over standard baselines, on several applications.

SY Apr 20
On the Effect of Quadratic Regularization in Direct Data-Driven LQR

Manuel Klädtke, Feiran Zhao, Florian Dörfler et al.

This paper proposes an explainability concept for direct data-driven linear quadratic regulation (LQR) with quadratic regularization. Our perspective follows the parametric effect of regularization, an analysis approach that translates regularization costs from auxiliary variables to system quantities, enabling intuitive interpretations. The framework further enables the elimination of auxiliary variables, thereby reducing computational complexity. We demonstrate the effectiveness of our approach and the identified effect of regularization via simulations.

OC Nov 11, 2020
Parametric local stability condition of a multi-converter system

Taouba Jouini, Florian Dörfler

We study local (also referred to as small-signal) stability of a network of identical DC/AC converters having a rotating degree of freedom. We develop a stability theory for a class of partitioned linear systems with symmetries that has natural links to classical stability theories of interconnected systems. We find stability conditions descending from a particular Lyapunov function involving an oblique projection onto the complement of the synchronous steady state set and enjoying insightful structural properties. Our sufficient and explicit stability conditions can be evaluated in a fully decentralized fashion, reflect a parametric dependence on the converter's steady-state variables, and can be one-to-one generalized to other types of systems exhibiting the same behavior, such as synchronous machines. Our conditions demand sufficient reactive power support and resistive damping. These requirements are well aligned with practitioners' insights.

ML Jun 27, 2022
Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning

Liviu Aolaritei, Soroosh Shafiee, Florian Dörfler

Distributionally robust optimization (DRO) has become a powerful framework for estimation under uncertainty, offering strong out-of-sample performance and principled regularization. In this paper, we propose a DRO-based method for linear regression and address a central question: how to optimally choose the robustness radius, which controls the trade-off between robustness and accuracy. Focusing on high-dimensional settings where the dimension and the number of samples are both large and comparable in size, we employ tools from high-dimensional asymptotic statistics to precisely characterize the estimation error of the resulting estimator. Remarkably, this error can be recovered by solving a simple convex-concave optimization problem involving only four scalar variables. This characterization enables efficient selection of the radius that minimizes the estimation error. In doing so, it achieves the same effect as cross-validation, but at a fraction of the computational cost. Numerical experiments confirm that our theoretical predictions closely match empirical performance and that the optimal radius selected through our method aligns with that chosen by cross-validation, highlighting both the accuracy and the practical benefits of our approach.

ML Sep 5, 2024
Maximum likelihood inference for high-dimensional problems with multiaffine variable relations

Jean-Sébastien Brouillon, Florian Dörfler, Giancarlo Ferrari-Trecate

Maximum Likelihood Estimation of continuous variable models can be very challenging in high dimensions, due to potentially complex probability distributions. The existence of multiple interdependencies among variables can make it very difficult to establish convergence guarantees. This leads to a wide use of brute-force methods, such as grid searching and Monte-Carlo sampling and, when applicable, complex and problem-specific algorithms. In this paper, we consider inference problems where the variables are related by multiaffine expressions. We propose a novel Alternating and Iteratively-Reweighted Least Squares (AIRLS) algorithm, and prove its convergence for problems with Generalized Normal Distributions. We also provide an efficient method to compute the variance of the estimates obtained using AIRLS. Finally, we show how the method can be applied to graphical statistical models. We perform numerical experiments on several inference problems, showing significantly better performance than state-of-the-art approaches in terms of scalability, robustness to noise, and convergence speed due to an empirically observed super-linear convergence rate.

SY May 12
Towards Closed-loop Stability of Nonlinear Receding Horizon Games

Sophie Hall, Florian Dörfler, Timm Faulwasser

We analyze Receding Horizon Games without any MPC-like terminal ingredients. We show that recursive feasibility can be inferred from the turnpike phenomenon under mild assumptions. Moreover, we prove sufficient conditions for practical asymptotic convergence of the closed-loop trajectories, and we discuss how the gap towards practical asymptotic stability may be closed. We use numerical examples to show that the closed-loop region of attraction around the steady-state GNE shrinks exponentially with the horizon length, a behavior previously known only for model predictive control. Further, we apply a linear end penalty and demonstrate in numerical simulations that it suppresses the leaving arc and ensures asymptotic convergence to the steady-state GNE.

LG Nov 4, 2025
A Spatially Informed Gaussian Process UCB Method for Decentralized Coverage Control

Gennaro Guidone, Luca Monegaglia, Elia Raimondi et al.

We present a novel decentralized algorithm for coverage control in unknown spatial environments modeled by Gaussian Processes (GPs). To trade off between exploration and exploitation, each agent autonomously determines its trajectory by minimizing a local cost function. Inspired by the GP-UCB (Upper Confidence Bound for GPs) acquisition function, the proposed cost combines the expected locational cost with a variance-based exploration term, guiding agents toward regions that are both high in predicted density and model uncertainty. Compared to previous work, our algorithm operates in a fully decentralized fashion, relying only on local observations and communication with neighboring agents. In particular, agents periodically update their inducing points using a greedy selection strategy, enabling scalable online GP updates. We demonstrate the effectiveness of our algorithm in simulation.

IR Jan 2
Socially-Aware Recommender Systems Mitigate Opinion Clusterization

Lukas Schüepp, Carmen Amo Alonso, Florian Dörfler et al.

Recommender systems shape online interactions by matching users with creators' content to maximize engagement. Creators, in turn, adapt their content to align with users' preferences and enhance their popularity. At the same time, users' preferences evolve under the influence of both suggested content from the recommender system and content shared within their social circles. This feedback loop generates a complex interplay between users, creators, and recommender algorithms, which is the key cause of filter bubbles and opinion polarization. We develop a social network-aware recommender system that explicitly accounts for this user-creator feedback interaction and strategically exploits the topology of the user's own social network to promote diversification. Our approach highlights how accounting for and exploiting the user's social network in the recommender system design is crucial to mitigate filter bubble effects while balancing content diversity with personalization. Provably, opinion clusterization is positively correlated with the influence of recommended content on user opinions. Ultimately, the proposed approach shows the power of socially-aware recommender systems in combating opinion polarization and clusterization phenomena.

SYApr 30
Optimal Functional Incentives for Control: The Linear-Quadratic Case with Bilinear Incentives

Jonas G. Matt, Saverio Bolognani, Florian Dörfler

We study the design of functional incentive mechanisms for dynamical systems, in which a leader designs a fixed incentive function to motivate a self-interested follower to actuate the system beneficially over an extended horizon, without real-time revision of the incentive. This stands in contrast to the adaptive paradigm, in which the incentive is itself a continuously updated control variable. We formalize the problem as a discrete-time bi-level optimal control problem and derive analytical results for the linear-quadratic case with bilinear incentives and a myopic follower. Specifically, we establish a necessary and sufficient stability condition for the induced closed-loop system, derive a closed-form expression for the gradient of the expected leader cost with respect to the incentive parameter matrix, and obtain a fully closed-form cost expression in the scalar setting. Based on the latter, explicit characterizations of the optimal incentive parameter are provided in two asymptotic regimes: the infinite-horizon limit and the limit of high follower cost. For long horizons, the optimal incentive is shown to become independent of the follower's private cost parameter, with direct implications for robust mechanism design under private information.

GTApr 26
Strategically Robust Aggregative Games

Andreas Feik, Nicolas Lanzetti, Saverio Bolognani et al.

In many multiagent settings, such as electric vehicle charging and traffic routing, agents must make decisions in the face of uncertain behavior exhibited by others. Often, this uncertainty arises from multiple sources, such as incomplete information, limited computation, or bounded rationality, ultimately impacting the aggregate behavior. To tackle this challenge, we follow recent work on strategically robust game theory and postulate that agents seek protection directly against deviations around the emergent behavior, as opposed to explicitly modeling all sources of uncertainty. Specifically, we propose that each agent protects itself against the worst-case aggregate behavior within an optimal-transport-based ambiguity set centered at the emergent aggregate population behavior. This leads to a novel equilibrium concept, called strategically robust Wardrop equilibrium, that enables one to interpolate between standard Wardrop equilibria (no robustness) and security strategies (maximum robustness). In the setting of convex aggregative games, we establish the existence of a pure strategically robust Wardrop equilibrium and provide tractable computational tools for computing it. Through an application in electric vehicle charging, we demonstrate that strategically robust Wardrop equilibria lead to better decisions, protecting agents against the uncertain aggregate behavior of the population. Remarkably, we also observe that strategic robustness can lead to lower equilibrium costs for all agents, uncovering a "coordination-via-robustification" effect.

ROMar 25, 2024
Bridging the Sim-to-Real Gap with Bayesian Inference

Jonas Rothfuss, Bhavya Sukhija, Lenart Treven et al.

We present SIM-FSVGD for learning robot dynamics from data. As opposed to traditional methods, SIM-FSVGD leverages low-fidelity physical priors, e.g., in the form of simulators, to regularize the training of neural network models. SIM-FSVGD learns accurate dynamics already in the low-data regime, and it also scales and excels when more data is available. We empirically show that learning with implicit physical priors results in accurate mean model estimation as well as precise uncertainty quantification. We demonstrate the effectiveness of SIM-FSVGD in bridging the sim-to-real gap on a high-performance RC racecar system. Using model-based RL, we demonstrate a highly dynamic parking maneuver with drifting, using less than half the data compared to the state of the art.

OCMar 10, 2025
Decision-Dependent Stochastic Optimization: The Role of Distribution Dynamics

Zhiyu He, Saverio Bolognani, Florian Dörfler et al.

Distribution shifts have long been regarded as troublesome external forces that a decision-maker should either counteract or conform to. An intriguing feedback phenomenon termed decision dependence arises when the deployed decision affects the environment and alters the data-generating distribution. In the realm of performative prediction, this is encoded by distribution maps parameterized by decisions due to strategic behaviors. In contrast, we formalize an endogenous distribution shift as a feedback process featuring nonlinear dynamics that couple the evolving distribution with the decision. Stochastic optimization in this dynamic regime provides a fertile ground to examine the various roles played by dynamics in the composite problem structure. To this end, we develop an online algorithm that achieves optimal decision-making by both adapting to and shaping the dynamic distribution. Throughout the paper, we adopt a distributional perspective and demonstrate how this view facilitates characterizations of distribution dynamics and the optimality and generalization performance of the proposed algorithm. We showcase the theoretical results in an opinion dynamics context, where an opportunistic party maximizes the affinity of a dynamic polarized population, and in a recommender system scenario, featuring performance optimization with discrete distributions in the probability simplex.

LGOct 28, 2025
Sample-efficient and Scalable Exploration in Continuous-Time RL

Klemens Iten, Lenart Treven, Bhavya Sukhija et al.

Reinforcement learning algorithms are typically designed for discrete-time dynamics, even though the underlying real-world control systems are often continuous in time. In this paper, we study the problem of continuous-time reinforcement learning, where the unknown system dynamics are represented using nonlinear ordinary differential equations (ODEs). We leverage probabilistic models, such as Gaussian processes and Bayesian neural networks, to learn an uncertainty-aware model of the underlying ODE. Our algorithm, COMBRL, greedily maximizes a weighted sum of the extrinsic reward and model epistemic uncertainty. This yields a scalable and sample-efficient approach to continuous-time model-based RL. We show that COMBRL achieves sublinear regret in the reward-driven setting, and in the unsupervised RL setting (i.e., without extrinsic rewards), we provide a sample complexity bound. In our experiments, we evaluate COMBRL in both standard and unsupervised RL settings and demonstrate that it scales better, is more sample-efficient than prior methods, and outperforms baselines across several deep RL tasks.

LGMar 20, 2025
Learn to Bid as a Price-Maker Wind Power Producer

Shobhit Singhal, Marta Fochesato, Liviu Aolaritei et al.

Wind power producers (WPPs) participating in short-term power markets face significant imbalance costs due to their non-dispatchable and variable production. While some WPPs have a large enough market share to influence prices with their bidding decisions, existing optimal bidding methods rarely account for this aspect. Price-maker approaches typically model bidding as a bilevel optimization problem, but these methods require complex market models, estimating other participants' actions, and are computationally demanding. To address these challenges, we propose an online learning algorithm that leverages contextual information to optimize WPP bids in the price-maker setting. We formulate the strategic bidding problem as a contextual multi-armed bandit, ensuring provable regret minimization. The algorithm's performance is evaluated against various benchmark strategies using a numerical simulation of the German day-ahead and real-time markets.

LGNov 25, 2025
SOMBRL: Scalable and Optimistic Model-Based RL

Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza et al.

We address the challenge of efficient exploration in model-based reinforcement learning (MBRL), where the system dynamics are unknown and the RL agent must learn directly from online interactions. We propose Scalable and Optimistic MBRL (SOMBRL), an approach based on the principle of optimism in the face of uncertainty. SOMBRL learns an uncertainty-aware dynamics model and greedily maximizes a weighted sum of the extrinsic reward and the agent's epistemic uncertainty. SOMBRL is compatible with any policy optimizers or planners, and under common regularity assumptions on the system, we show that SOMBRL has sublinear regret for nonlinear dynamics in the (i) finite-horizon, (ii) discounted infinite-horizon, and (iii) non-episodic settings. Additionally, SOMBRL offers a flexible and scalable solution for principled exploration. We evaluate SOMBRL on state-based and visual-control environments, where it displays strong performance across all tasks and baselines. We also evaluate SOMBRL on a dynamic RC car hardware and show SOMBRL outperforms the state-of-the-art, illustrating the benefits of principled exploration for MBRL.

ROOct 27, 2025
TARC: Time-Adaptive Robotic Control

Arnav Sukhija, Lenart Treven, Jin Cheng et al.

Fixed-frequency control in robotics imposes a trade-off between the efficiency of low-frequency control and the robustness of high-frequency control, a limitation not seen in adaptable biological systems. We address this with a reinforcement learning approach in which policies jointly select control actions and their application durations, enabling robots to autonomously modulate their control frequency in response to situational demands. We validate our method with zero-shot sim-to-real experiments on two distinct hardware platforms: a high-speed RC car and a quadrupedal robot. Our method matches or outperforms fixed-frequency baselines in terms of rewards while significantly reducing the control frequency and exhibiting adaptive frequency control under real-world conditions.

LGSep 6, 2025
Simulation Priors for Data-Efficient Deep Learning

Lenart Treven, Bhavya Sukhija, Jonas Rothfuss et al.

How do we enable AI systems to efficiently learn in the real world? First-principles models are widely used to simulate natural systems, but often fail to capture real-world complexity due to simplifying assumptions. In contrast, deep learning approaches can estimate complex dynamics with minimal assumptions but require large, representative datasets. We propose SimPEL, a method that efficiently combines first-principles models with data-driven learning by using low-fidelity simulators as priors in Bayesian deep learning. This enables SimPEL to benefit from simulator knowledge in low-data regimes and leverage deep learning's flexibility when more data is available, all the while carefully quantifying epistemic uncertainty. We evaluate SimPEL on diverse systems, including biological, agricultural, and robotic domains, showing superior performance in learning complex dynamics. For decision-making, we demonstrate that SimPEL bridges the sim-to-real gap in model-based reinforcement learning. On a high-speed RC car task, SimPEL learns a highly dynamic parking maneuver involving drifting with substantially less data than state-of-the-art baselines. These results highlight the potential of SimPEL for data-efficient learning and control in complex real-world environments.

LGAug 7, 2025
A Markov Decision Process Framework for Early Maneuver Decisions in Satellite Collision Avoidance

Francesca Ferrara, Lander W. Schillinger Arana, Florian Dörfler et al.

This work presents a Markov decision process (MDP) framework to model decision-making for collision avoidance maneuver (CAM) and a reinforcement learning policy gradient (RL-PG) algorithm to train an autonomous guidance policy using historic CAM data. In addition to maintaining acceptable collision risks, this approach seeks to minimize the average fuel consumption of CAMs by making early maneuver decisions. We model CAM as a continuous-state, discrete-action, finite-horizon MDP, where the critical decision is determining when to initiate the maneuver. The MDP model also incorporates analytical models for conjunction risk, propellant consumption, and transit orbit geometry. The Markov policy effectively trades off maneuver delay, which improves the reliability of conjunction risk indicators, against propellant consumption, which increases with decreasing maneuver time. Using historical data of tracked conjunction events, we verify this framework and conduct an extensive ablation study on the hyper-parameters used within the MDP. On synthetic conjunction events, the trained policy significantly reduces both the overall and average propellant consumption per CAM when compared to a conventional cut-off policy that initiates maneuvers 24 hours before the time of closest approach (TCA). On historical conjunction events, the trained policy consumes more propellant overall but reduces the average propellant consumption per CAM. For both historical and synthetic conjunction events, the trained policy achieves equal if not higher overall collision risk guarantees.

OCOct 18, 2024
Contractivity and linear convergence in bilinear saddle-point problems: An operator-theoretic approach

Colin Dirren, Mattia Bianchi, Panagiotis D. Grontas et al.

We study the convex-concave bilinear saddle-point problem $\min_x \max_y f(x) + y^\top Ax - g(y)$, where both, only one, or none of the functions $f$ and $g$ are strongly convex, and suitable rank conditions on the matrix $A$ hold. The solution of this problem is at the core of many machine learning tasks. By employing tools from monotone operator theory, we systematically prove the contractivity (in turn, the linear convergence) of several first-order primal-dual algorithms, including the Chambolle-Pock method. Our approach results in concise proofs, and it yields new convergence guarantees and tighter bounds compared to known results.
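
As a minimal illustration of the problem class in the abstract above, the sketch below runs a standard Chambolle-Pock (primal-dual hybrid gradient) iteration on a toy instance. The quadratic choices $f(x) = \tfrac12\|x-a\|^2$ and $g(y) = \tfrac12\|y-b\|^2$, the step sizes, and all function names are assumptions made here for illustration; they are not taken from the paper, and the code does not reproduce its operator-theoretic analysis.

```python
import numpy as np

def chambolle_pock(A, a, b, iters=2000):
    """Toy Chambolle-Pock loop for
    min_x max_y 0.5*||x - a||^2 + y^T A x - 0.5*||y - b||^2.
    The quadratic f and g are illustrative assumptions."""
    m, n = A.shape
    # Equal primal and dual steps chosen so that tau * sigma * ||A||^2 < 1.
    tau = sigma = 0.9 / np.linalg.norm(A, 2)
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # Primal step: prox of tau*f evaluated at x - tau * A^T y.
        x_new = (x - tau * (A.T @ y) + tau * a) / (1.0 + tau)
        # Extrapolation of the primal iterate.
        x_bar = 2.0 * x_new - x
        # Dual step: prox of sigma*g evaluated at y + sigma * A x_bar.
        y = (y + sigma * (A @ x_bar) + sigma * b) / (1.0 + sigma)
        x = x_new
    return x, y

def closed_form(A, a, b):
    """Saddle point from the optimality conditions
    x - a + A^T y = 0 and y = A x + b."""
    n = A.shape[1]
    x = np.linalg.solve(np.eye(n) + A.T @ A, a - A.T @ b)
    return x, A @ x + b

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
a = rng.standard_normal(4)
b = rng.standard_normal(3)
x_cp, y_cp = chambolle_pock(A, a, b)
x_star, y_star = closed_form(A, a, b)
print(np.linalg.norm(x_cp - x_star), np.linalg.norm(y_cp - y_star))
```

In this toy instance both functions are strongly convex, which is the easiest of the regimes the abstract mentions; the iterates can be compared against the closed-form saddle point to check convergence numerically.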

LGJun 18, 2024
Learning diffusion at lightspeed

Antonio Terpin, Nicolas Lanzetti, Martin Gadea et al.

Diffusion regulates numerous natural processes and the dynamics of many successful generative models. Existing models to learn the diffusion terms from observational data rely on complex bilevel optimization problems and model only the drift of the system. We propose a new simple model, JKOnet*, which bypasses the complexity of existing architectures while presenting significantly enhanced representational capabilities: JKOnet* recovers the potential, interaction, and internal energy components of the underlying diffusion process. JKOnet* minimizes a simple quadratic loss and outperforms other baselines in terms of sample efficiency, computational complexity, and accuracy. Additionally, JKOnet* provides a closed-form optimal solution for linearly parametrized functionals, and, when applied to predict the evolution of cellular processes from real-world data, it achieves state-of-the-art accuracy at a fraction of the computational cost of all existing methods. Our methodology is based on the interpretation of diffusion processes as energy-minimizing trajectories in the probability space via the so-called JKO scheme, which we study via its first-order optimality conditions.

LGJun 3, 2024
NeoRL: Efficient Exploration for Nonepisodic RL

Bhavya Sukhija, Lenart Treven, Florian Dörfler et al.

We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems, where the system dynamics are unknown and the RL agent has to learn from a single trajectory, i.e., without resets. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty. NeoRL uses well-calibrated probabilistic models and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics. Under continuity and bounded energy assumptions on the system, we provide a first-of-its-kind regret bound of $O(Γ_T \sqrt{T})$ for general nonlinear systems with Gaussian process dynamics. We compare NeoRL to other baselines on several deep RL environments and empirically demonstrate that NeoRL achieves the optimal average cost while incurring the least regret.

LGJun 3, 2024
When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL

Lenart Treven, Bhavya Sukhija, Yarden As et al.

Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDP). However, various systems are inherently continuous in time, making discrete-time MDPs an inexact modeling choice. In many applications, such as greenhouse control or medical treatments, each interaction (measurement or switching of action) involves manual intervention and thus is inherently costly. Therefore, we generally prefer a time-adaptive approach with fewer interactions with the system. In this work, we formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge by optimizing over policies that, besides the control, also predict the duration of its application. Our formulation results in an extended MDP that any standard RL algorithm can solve. We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the number of interactions compared with their discrete-time counterparts while retaining the same or improved performance and exhibiting robustness to the discretization frequency. Finally, we propose OTaCoS, an efficient model-based algorithm for our setting. We show that OTaCoS enjoys sublinear regret for systems with sufficiently smooth dynamics and empirically results in further sample-efficiency gains.

OCJan 25, 2024
Towards a Systems Theory of Algorithms

Florian Dörfler, Zhiyu He, Giuseppe Belgioioso et al.

Traditionally, numerical algorithms are seen as isolated pieces of code confined to an in silico existence. However, this perspective is not appropriate for many modern computational approaches in control, learning, or optimization, wherein in vivo algorithms interact with their environment. Examples of such open algorithms include various real-time optimization-based control strategies, reinforcement learning, decision-making architectures, online optimization, and many more. Further, even closed algorithms in learning or optimization are increasingly abstracted in block diagrams with interacting dynamic modules and pipelines. In this opinion paper, we state our vision on a to-be-cultivated systems theory of algorithms and argue in favor of viewing algorithms as open dynamical systems interacting with other algorithms, physical systems, humans, or databases. Remarkably, the manifold tools developed under the umbrella of systems theory are well suited for addressing a range of challenges in the algorithmic domain. We survey various instances where the principles of algorithmic systems theory are being developed and outline pertinent modeling, analysis, and design challenges.

MANov 13, 2021
Posetal Games: Efficiency, Existence, and Refinement of Equilibria in Games with Prioritized Metrics

Alessandro Zanardi, Gioele Zardini, Sirish Srinivasan et al.

Modern applications require robots to comply with multiple, often conflicting rules and to interact with other agents. We present Posetal Games as a class of games in which each player expresses a preference over the outcomes via a partially ordered set of metrics. This allows one to combine hierarchical priorities of each player with the interactive nature of the environment. By contextualizing standard game theoretical notions, we provide two sufficient conditions on the preference of the players to prove existence of pure Nash Equilibria in finite action sets. Moreover, we define formal operations on the preference structures and link them to a refinement of the game solutions, showing how the set of equilibria can be systematically shrunk. The presented results are showcased in a driving game where autonomous vehicles select from a finite set of trajectories. The results demonstrate the interpretability of the outcomes in terms of minimum-rank-violation for each player.

LGOct 27, 2021
Learning Stable Deep Dynamics Models for Partially Observed or Delayed Dynamical Systems

Andreas Schlaginhaufen, Philippe Wenk, Andreas Krause et al.

Learning how complex dynamical systems evolve over time is a key challenge in system identification. For safety critical systems, it is often crucial that the learned model is guaranteed to converge to some equilibrium point. To this end, neural ODEs regularized with neural Lyapunov functions are a promising approach when states are fully observed. For practical applications however, partial observations are the norm. As we will demonstrate, initialization of unobserved augmented states can become a key problem for neural ODEs. To alleviate this issue, we propose to augment the system's state with its history. Inspired by state augmentation in discrete-time systems, we thus obtain neural delay differential equations. Based on classical time delay stability analysis, we then show how to ensure stability of the learned models, and theoretically analyze our approach. Our experiments demonstrate its applicability to stable system identification of partially observed systems and learning a stabilizing feedback policy in delayed feedback control.

SYJul 9, 2021
Bayesian Error-in-Variables Models for the Identification of Power Networks

Jean-Sébastien Brouillon, Emanuele Fabbiani, Pulkit Nahata et al.

The increasing integration of intermittent renewable generation, especially at the distribution level, necessitates advanced planning and optimisation methodologies contingent on the knowledge of the grid, specifically the admittance matrix capturing the topology and line parameters of an electric network. However, a reliable estimate of the admittance matrix may either be missing or quickly become obsolete for temporally varying grids. In this work, we propose a data-driven identification method utilising voltage and current measurements collected from micro-PMUs. More precisely, we first present a maximum likelihood approach and then move towards a Bayesian framework, leveraging the principles of maximum a posteriori estimation. In contrast with most existing contributions, our approach not only factors in measurement noise on both voltage and current data, but is also capable of exploiting available a priori information such as sparsity patterns and known line parameters. Simulations conducted on benchmark cases demonstrate that, compared to other algorithms, our method can achieve significantly greater accuracy.

LGJun 22, 2021
Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models

Lenart Treven, Philippe Wenk, Florian Dörfler et al.

Differential equations in general and neural ODEs in particular are an essential technique in continuous-time system identification. While many deterministic learning algorithms have been designed based on numerical integration via the adjoint method, many downstream tasks such as active learning, exploration in reinforcement learning, robust control, or filtering require accurate estimates of predictive uncertainties. In this work, we propose a novel approach towards estimating epistemically uncertain neural ODEs, avoiding the numerical integration bottleneck. Instead of modeling uncertainty in the ODE parameters, we directly model uncertainties in the state space. Our algorithm, distributional gradient matching (DGM), jointly trains a smoother and a dynamics model and matches their gradients by minimizing a Wasserstein loss. Our experiments show that, compared to traditional approximate inference methods based on numerical integration, our approach is faster to train, faster at predicting previously unseen trajectories, and in the context of neural ODEs, significantly more accurate.

OCNov 18, 2014
Topology Design for Optimal Network Coherence

Tyler Summers, Iman Shames, John Lygeros et al.

We consider a network topology design problem in which an initial undirected graph underlying the network is given and the objective is to select a set of edges to add to the graph to optimize the coherence of the resulting network. We show that network coherence is a submodular function of the network topology. As a consequence, a simple greedy algorithm is guaranteed to produce near optimal edge set selections. We also show that fast rank one updates of the Laplacian pseudoinverse using generalizations of the Sherman-Morrison formula and an accelerated variant of the greedy algorithm can speed up the algorithm by several orders of magnitude in practice. These allow our algorithms to scale to network sizes far beyond those that can be handled by convex relaxation heuristics.
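
The greedy edge selection with rank-one pseudoinverse updates described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it takes coherence to be proportional to the trace of the Laplacian pseudoinverse, assumes a connected, unit-weight starting graph, uses the plain Sherman-Morrison update rather than the generalizations the abstract mentions, and omits the accelerated greedy variant. All function names are hypothetical.

```python
import numpy as np

def laplacian(n, edges):
    """Unit-weight graph Laplacian for an undirected edge list."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def add_edge_pinv(Lp, i, j):
    """Sherman-Morrison style update of the pseudoinverse after adding
    a unit-weight edge (i, j); valid when the graph is connected."""
    v = Lp[:, i] - Lp[:, j]      # L^+ (e_i - e_j)
    r = v[i] - v[j]              # effective resistance between i and j
    return Lp - np.outer(v, v) / (1.0 + r)

def greedy_edges(n, edges, k):
    """Greedily add k edges, each time picking the edge that most
    reduces trace(L^+), the coherence proxy assumed in this sketch."""
    Lp = np.linalg.pinv(laplacian(n, edges))
    present = {tuple(sorted(e)) for e in edges}
    chosen = []
    for _ in range(k):
        best, best_gain = None, -1.0
        for i in range(n):
            for j in range(i + 1, n):
                if (i, j) in present:
                    continue
                v = Lp[:, i] - Lp[:, j]
                # Decrease in trace(L^+) if edge (i, j) were added.
                gain = (v @ v) / (1.0 + v[i] - v[j])
                if gain > best_gain:
                    best, best_gain = (i, j), gain
        if best is None:
            break
        Lp = add_edge_pinv(Lp, *best)
        present.add(best)
        chosen.append(best)
    return chosen, Lp

# Example: add two edges to a five-node path graph.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
chosen, Lp = greedy_edges(5, path, 2)
print(chosen, np.trace(Lp))
```

Each candidate edge is scored from the current pseudoinverse in O(n) time, so no pseudoinverse is recomputed from scratch inside the loop; that is the kind of saving the rank-one update is meant to provide.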