SYNov 14, 2022
Implications of Regret on Stability of Linear Dynamical SystemsAren Karapetyan, Anastasios Tsiamis, Efe C. Balta et al.
The setting of an agent making decisions under uncertainty and under dynamic constraints is common for the fields of optimal control, reinforcement learning, and recently also for online learning. In the online learning setting, the quality of an agent's decision is often quantified by the concept of regret, comparing the performance of the chosen decisions to the best possible ones in hindsight. While regret is a useful performance measure, when dynamical systems are concerned, it is important to also assess the stability of the closed-loop system for a chosen policy. In this work, we show that for linear state feedback policies and linear systems subject to adversarial disturbances, linear regret implies asymptotic stability in both time-varying and time-invariant settings. Conversely, we also show that bounded input bounded state stability and summability of the state transition matrices imply linear regret.
SYApr 10, 2022
Regret Analysis of Online Gradient Descent-based Iterative Learning Control with Model MismatchEfe C. Balta, Andrea Iannelli, Roy S. Smith et al.
In Iterative Learning Control (ILC), a sequence of feedforward control actions is generated at each iteration on the basis of partial model knowledge and past measurements with the goal of steering the system toward a desired reference trajectory. This is framed here as an online learning task, where the decision-maker takes sequential decisions by solving a sequence of optimization problems having only partial knowledge of the cost functions. Having established this connection, the performance of an online gradient-descent based scheme using inexact gradient information is analyzed in the setting of dynamic and static regret, standard measures in online learning. Fundamental limitations of the scheme and its integration with adaptation mechanisms are further investigated, followed by numerical simulations on a benchmark ILC problem.
SYApr 29
A Unified Bayesian Framework for Data-Driven Smoothing, Prediction, and ControlMingzhou Yin, Andrea Iannelli, Seyed Ali Nazari et al.
Extending data-driven algorithms based on Willems' fundamental lemma to stochastic data often requires empirical and customized workarounds. This work presents a unified Bayesian framework for linear systems that provides a systematic and general method for handling stochastic data-driven tasks, including smoothing, prediction, and control, via maximum a posteriori estimation. This framework formulates a unified trajectory estimation problem for the three tasks by specifying different types of trajectory knowledge. Then, a Bayesian problem is solved that optimally combines trajectory knowledge with a data-driven characterization of the trajectory from offline data for correlated input-output uncertainties with elliptical distributions. Under specific conditions, this problem is shown to generalize existing data-driven prediction and control algorithms. Numerical examples demonstrate the performance of the unified approach for all three tasks against other data-driven and system identification approaches.
SYMay 8
Sample-Efficient Model-Free Policy Gradient Methods for Stochastic LQR via Robust Linear RegressionBowen Song, Sebastien Gros, Andrea Iannelli
Policy gradient algorithms are widely used in reinforcement learning and belong to the class of approximate dynamic programming methods. This paper studies two key policy gradient algorithms, the Natural Policy Gradient and the Gauss-Newton Method, for solving the Linear Quadratic Regulator (LQR) problem in unknown stochastic linear systems. The main challenge lies in obtaining an unbiased gradient estimate from noisy data due to errors-in-variables in linear regression. This issue is addressed by employing a primal-dual estimation procedure. Using this novel gradient estimation scheme, the paper establishes convergence guarantees with a sample complexity of order O(1/epsilon). Theoretical results are further supported by numerical experiments, which demonstrate the effectiveness of the proposed algorithms.
SYMar 20
End-to-end guarantees for indirect data-driven control of bilinear systems with finite stochastic dataNicolas Chatzikiriakos, Robin Strässer, Frank Allgöwer et al.
In this paper we propose an end-to-end algorithm for indirect data-driven control for bilinear systems with stability guarantees. We consider the case where the collected i.i.d. data is affected by probabilistic noise with possibly unbounded support and leverage tools from statistical learning theory to derive finite sample identification error bounds. To this end, we solve the bilinear identification problem by solving a set of linear and affine identification problems, by a particular choice of a control input during the data collection phase. We provide a priori as well as data-dependent finite sample identification error bounds on the individual matrices as well as ellipsoidal bounds, both of which are structurally suitable for control. Further, we integrate the structure of the derived identification error bounds in a robust controller design to obtain an exponentially stable closed-loop. By means of an extensive numerical study we showcase the interplay between the controller design and the derived identification error bounds. Moreover, we note appealing connections of our results to indirect data-driven control of general nonlinear systems through Koopman operator theory and discuss how our results may be applied in this setup.
SYSep 17, 2024
Sample Complexity Bounds for Linear System Identification from a Finite SetNicolas Chatzikiriakos, Andrea Iannelli
This paper considers a finite sample perspective on the problem of identifying an LTI system from a finite set of possible systems using trajectory data. To this end, we use the maximum likelihood estimator to identify the true system and provide an upper bound for its sample complexity. Crucially, the derived bound does not rely on a potentially restrictive stability assumption. Additionally, we leverage tools from information theory to provide a lower bound to the sample complexity that holds independently of the used estimator. The derived sample complexity bounds are analyzed analytically and numerically.
SYApr 1
Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic DataBowen Song, Andrea Iannelli
Policy gradient (PG) methods are the backbone of many reinforcement learning algorithms due to their good performance in policy optimization problems. As a gradient-based approach, PG methods typically rely on knowledge of the system dynamics. If this is not available, trajectory data can be utilized to approximate first-order information. When the data are noisy, gradient estimates become inaccurate and a study that investigates uncertainty estimation and the analysis of its propagation through the algorithm is currently missing. To address this, our work focuses on the Linear Quadratic Regulator (LQR) problem for systems subject to additive stochastic noise. After briefly summarizing the state of the art for cases with a known model, we focus on scenarios where the system dynamics are unknown, and approximate gradient information is obtained using zeroth-order optimization techniques. We analyze the theoretical properties by computing the error in the estimated gradient and examining how this error affects the convergence of PG algorithms. Additionally, we provide global convergence guarantees for various versions of PG methods, including those employing adaptive step sizes and variance reduction techniques, which help increase the convergence rate and reduce sample complexity. This study contributed to characterizing the robustness of model-free PG methods, aiming to identify their limitations in the presence of stochastic noise and proposing improvements to enhance their applicability.
OCMar 25
Structure, Analysis, and Synthesis of First-Order AlgorithmsJared Miller, Carsten Scherer, Fabian Jakob et al.
Optimization algorithms can be interpreted through the lens of dynamical systems as the interconnection of linear systems and a set of subgradient nonlinearities. This dynamical systems formulation allows for the analysis and synthesis of optimization algorithms by solving robust control problems. In this work, we use the celebrated internal model principle in control theory to structurally factorize convergent composite optimization algorithms into suitable network-dependent internal models and core subcontrollers. As the key benefit, we reveal that this permits us to synthesize optimization algorithms even if information is transmitted over networks featuring dynamical phenomena such as time delays, channel memory, or crosstalk. Design of these algorithms is achieved under bisection in the exponential convergence rate either through a nonconvex local search or by alternation of convex semidefinite programs. We demonstrate factorization of existing optimization algorithms and the automated synthesis of new optimization algorithms in the networked setting.
SYApr 1
Beyond Bounded Noise: Stochastic Set-Membership Estimation for Nonlinear SystemsFelix Brändle, Nicolas Chatzikiriakos, Andrea Iannelli et al.
In this paper, we derive a novel procedure for set-membership estimation of dynamical systems affected by stochastic noise with unbounded support. By employing a bound on the sample covariance matrix, we are able to provide a finite-sample uncertainty set containing the true system parameters with high probability. Our approach can be natively applied to a wide class of nonlinear systems affected by sub- Gaussian noise. Through our analysis, we provide conditions under which the proposed uncertainty set converges to the true system parameters and establish an upper bound on the convergence rate. The proposed uncertainty set can be used directly for the synthesis of robust controllers with probabilistic stability and performance guarantees. Concluding numerical examples demonstrate the advantages of the proposed formulation over established approaches.
OCMar 30, 2025
Online Convex Optimization and Integral Quadratic Constraints: An automated approach to regret analysisFabian Jakob, Andrea Iannelli
We propose a novel approach for analyzing dynamic regret of first-order constrained online convex optimization algorithms for strongly convex and Lipschitz-smooth objectives. Crucially, we provide a general analysis that is applicable to a wide range of first-order algorithms that can be expressed as an interconnection of a linear dynamical system in feedback with a first-order oracle. By leveraging Integral Quadratic Constraints (IQCs), we derive a semi-definite program which, when feasible, provides a regret guarantee for the online algorithm. For this, the concept of variational IQCs is introduced as the generalization of IQCs to time-varying monotone operators. Our bounds capture the temporal rate of change of the problem in the form of the path length of the time-varying minimizer and the objective function variation. In contrast to standard results in OCO, our results do not require nerither the assumption of gradient boundedness, nor that of a bounded feasible set. Numerical analyses showcase the ability of the approach to capture the dependence of the regret on the function class condition number.
LGNov 21, 2025
Convergence and stability of Q-learning in Hierarchical Reinforcement LearningMassimiliano Manenti, Andrea Iannelli
Hierarchical Reinforcement Learning promises, among other benefits, to efficiently capture and utilize the temporal structure of a decision-making problem and to enhance continual learning capabilities, but theoretical guarantees lag behind practice. In this paper, we propose a Feudal Q-learning scheme and investigate under which conditions its coupled updates converge and are stable. By leveraging the theory of Stochastic Approximation and the ODE method, we present a theorem stating the convergence and stability properties of Feudal Q-learning. This provides a principled convergence and stability analysis tailored to Feudal RL. Moreover, we show that the updates converge to a point that can be interpreted as an equilibrium of a suitably defined game, opening the door to game-theoretic approaches to Hierarchical RL. Lastly, experiments based on the Feudal Q-learning algorithm support the outcomes anticipated by theory.
SYSep 15, 2025
High Effort, Low Gain: Fundamental Limits of Active Learning for Linear Dynamical SystemsNicolas Chatzikiriakos, Kevin Jamieson, Andrea Iannelli
In this work, we consider the problem of identifying an unknown linear dynamical system given a finite hypothesis class. In particular, we analyze the effect of the excitation input on the sample complexity of identifying the true system with high probability. To this end, we present sample complexity lower bounds that capture the choice of the selected excitation input. The sample complexity lower bound gives rise to a system theoretic condition to determine the potential benefit of experiment design. Informed by the analysis of the sample complexity lower bound, we propose a persistent excitation (PE) condition tailored to the considered setting, which we then use to establish sample complexity upper bounds. Notably, the \acs{PE} condition is weaker than in the case of an infinite hypothesis class and allows analyzing different excitation inputs modularly. Crucially, the lower and upper bounds share the same dependency on key problem parameters. Finally, we leverage these insights to propose an active learning algorithm that sequentially excites the system optimally with respect to the current estimate, and provide sample complexity guarantees for the presented algorithm. Concluding simulations showcase the effectiveness of the proposed algorithm.
MLAug 23, 2020
Learning Dynamical Systems using Local Stability PriorsArash Mehrjou, Andrea Iannelli, Bernhard Schölkopf
A coupled computational approach to simultaneously learn a vector field and the region of attraction of an equilibrium point from generated trajectories of the system is proposed. The nonlinear identification leverages the local stability information as a prior on the system, effectively endowing the estimate with this important structural property. In addition, the knowledge of the region of attraction plays an experiment design role by informing the selection of initial conditions from which trajectories are generated and by enabling the use of a Lyapunov function of the system as a regularization term. Numerical results show that the proposed method allows efficient sampling and provides an accurate estimate of the dynamics in an inner approximation of its region of attraction.