Gianluigi Pillonetto

SY
h-index60
34papers
727citations
Novelty42%
AI Score41

34 Papers

LGJan 30, 2023
Deep networks for system identification: a Survey

Gianluigi Pillonetto, Aleksandr Aravkin, Daniel Gedon et al.

Deep learning is a topic of considerable current interest. The availability of massive data collections and powerful software resources has led to an impressive amount of results in many application areas that reveal essential but hidden properties of the observations. System identification learns mathematical descriptions of dynamic systems from input-output data and can thus benefit from the advances of deep neural networks to enrich the possible range of models to choose from. For this reason, we provide a survey of deep learning from a system identification perspective. We cover a wide spectrum of topics to enable researchers to understand the methods, providing rigorous practical and theoretical insights into the benefits and challenges of using them. The main aim of the identified model is to predict new data from previous observations. This can be achieved with different deep learning based modelling techniques and we discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks. Their parameters have to be estimated from past data trying to optimize the prediction performance. For this purpose, we discuss a specific set of first-order optimization tools that is emerged as efficient. The survey then draws connections to the well-studied area of kernel-based methods. They control the data fit by regularization terms that penalize models not in line with prior assumptions. We illustrate how to cast them in deep architectures to obtain deep kernel-based methods. The success of deep learning also resulted in surprising empirical observations, like the counter-intuitive behaviour of models with many parameters. We discuss the role of overparameterized models, including their connection to kernels, as well as implicit regularization mechanisms which affect generalization, specifically the interesting phenomena of benign overfitting ...

NAJul 24, 2013
Kalman smoothing and block tridiagonal systems: new connections and numerical stability results

Aleksandr Y. Aravkin, Bradley B. Bell, James V. Burke et al.

The Rauch-Tung-Striebel (RTS) and the Mayne-Fraser (MF) algorithms are two of the most popular smoothing schemes to reconstruct the state of a dynamic linear system from measurements collected on a fixed interval. Another (less popular) approach is the Mayne (M) algorithm introduced in his original paper under the name of Algorithm A. In this paper, we analyze these three smoothers from an optimization and algebraic perspective, revealing new insights on their numerical stability properties. In doing this, we re-interpret classic recursions as matrix decomposition methods for block tridiagonal matrices. First, we show that the classic RTS smoother is an implementation of the forward block tridiagonal (FBT) algorithm (also known as Thomas algorithm) for particular block tridiagonal systems. We study the numerical stability properties of this scheme, connecting the condition number of the full system to properties of the individual blocks encountered during standard recursion. Second, we study the M smoother, and prove it is equivalent to a backward block tridiagonal (BBT) algorithm with a stronger stability guarantee than RTS. Third, we illustrate how the MF smoother solves a block tridiagonal system, and prove that it has the same numerical stability properties of RTS (but not those of M). Finally, we present a new hybrid RTS/M (FBT/BBT) smoothing scheme, which is faster than MF, and has the same numerical stability guarantees of RTS and MF.

LGNov 14, 2025
On-line learning of dynamic systems: sparse regression meets Kalman filtering

Gianluigi Pillonetto, Akram Yazdani, Aleksandr Aravkin

Learning governing equations from data is central to understanding the behavior of physical systems across diverse scientific disciplines, including physics, biology, and engineering. The Sindy algorithm has proven effective in leveraging sparsity to identify concise models of nonlinear dynamical systems. In this paper, we extend sparsity-driven approaches to real-time learning by integrating a cornerstone algorithm from control theory -- the Kalman filter (KF). The resulting Sindy Kalman Filter (SKF) unifies both frameworks by treating unknown system parameters as state variables, enabling real-time inference of complex, time-varying nonlinear models unattainable by either method alone. Furthermore, SKF enhances KF parameter identification strategies, particularly via look-ahead error, significantly simplifying the estimation of sparsity levels, variance parameters, and switching instants. We validate SKF on a chaotic Lorenz system with drifting or switching parameters and demonstrate its effectiveness in the real-time identification of a sparse nonlinear aircraft model built from real flight data.

LGOct 4, 2023
Kernel-based function learning in dynamic and non stationary environments

Alberto Giaretta, Mauro Bisiacco, Gianluigi Pillonetto

One central theme in machine learning is function estimation from sparse and noisy data. An example is supervised learning where the elements of the training set are couples, each containing an input location and an output response. In the last decades, a substantial amount of work has been devoted to design estimators for the unknown function and to study their convergence to the optimal predictor, also characterizing the learning rate. These results typically rely on stationary assumptions where input locations are drawn from a probability distribution that does not change in time. In this work, we consider kernel-based ridge regression and derive convergence conditions under non stationary distributions, addressing also cases where stochastic adaption may happen infinitely often. This includes the important exploration-exploitation problems where e.g. a set of agents/robots has to monitor an environment to reconstruct a sensorial field and their movements rules are continuously updated on the basis of the acquired knowledge on the field and/or the surrounding environment.

MLFeb 21, 2023
Dealing with Collinearity in Large-Scale Linear System Identification Using Gaussian Regression

Wenqi Cao, Gianluigi Pillonetto

Many problems arising in control require the determination of a mathematical model of the application. This has often to be performed starting from input-output data, leading to a task known as system identification in the engineering literature. One emerging topic in this field is estimation of networks consisting of several interconnected dynamic systems. We consider the linear setting assuming that system outputs are the result of many correlated inputs, hence making system identification severely ill-conditioned. This is a scenario often encountered when modeling complex cybernetics systems composed by many sub-units with feedback and algebraic loops. We develop a strategy cast in a Bayesian regularization framework where any impulse response is seen as realization of a zero-mean Gaussian process. Any covariance is defined by the so called stable spline kernel which includes information on smooth exponential decay. We design a novel Markov chain Monte Carlo scheme able to reconstruct the impulse responses posterior by efficiently dealing with collinearity. Our scheme relies on a variation of the Gibbs sampling technique: beyond considering blocks forming a partition of the parameter space, some other (overlapping) blocks are also updated on the basis of the level of collinearity of the system inputs. Theoretical properties of the algorithm are studied obtaining its convergence rate. Numerical experiments are included using systems containing hundreds of impulse responses and highly correlated inputs.

MLSep 30, 2013Code
Generalized system identification with stable spline kernels

Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Regularized least-squares approaches have been successfully applied to linear system identification. Recent approaches use quadratic penalty terms on the unknown impulse response defined by stable spline kernels, which control model space complexity by leveraging regularity and bounded-input bounded-output stability. This paper extends linear system identification to a wide class of nonsmooth stable spline estimators, where regularization functionals and data misfits can be selected from a rich set of piecewise linear-quadratic (PLQ) penalties. This class includes the 1-norm, Huber, and Vapnik, in addition to the least-squares penalty. By representing penalties through their conjugates, the modeler can specify any piecewise linear-quadratic penalty for misfit and regularizer, as well as inequality constraints on the response. The interior-point solver we implement (IPsolve) is locally quadratically convergent, with $O(\min(m,n)^2(m+n))$ arithmetic operations per iteration, where $n$ the number of unknown impulse response coefficients and $m$ the number of observed output measurements. IPsolve is competitive with available alternatives for system identification. This is shown by a comparison with TFOCS, libSVM, and the FISTA algorithm. The code is open source (https://github.com/saravkin/IPsolve). The impact of the approach for system identification is illustrated with numerical experiments featuring robust formulations for contaminated data, relaxation systems, nonnegativity and unimodality constraints on the impulse response, and sparsity promoting regularization. Incorporating constraints yields particularly significant improvements.

OCMar 22, 2013Code
Robust and Trend Following Student's t Kalman Smoothers

Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

We present a Kalman smoothing framework based on modeling errors using the heavy tailed Student's t distribution, along with algorithms, convergence theory, open-source general implementation, and several important applications. The computational effort per iteration grows linearly with the length of the time series, and all smoothers allow nonlinear process and measurement models. Robust smoothers form an important subclass of smoothers within this framework. These smoothers work in situations where measurements are highly contaminated by noise or include data unexplained by the forward model. Highly robust smoothers are developed by modeling measurement errors using the Student's t distribution, and outperform the recently proposed L1-Laplace smoother in extreme situations with data containing 20% or more outliers. A second special application we consider in detail allows tracking sudden changes in the state. It is developed by modeling process noise using the Student's t distribution, and the resulting smoother can track sudden changes in the state. These features can be used separately or in tandem, and we present a general smoother algorithm and open source implementation, together with convergence analysis that covers a wide range of smoothers. A key ingredient of our approach is a technique to deal with the non-convexity of the Student's t loss function. Numerical results for linear and nonlinear models illustrate the performance of the new smoothers for robust and tracking applications, as well as for mixed problems that have both types of features.

OCMar 8, 2013Code
Optimization viewpoint on Kalman smoothing, with applications to robust and sparse estimation

Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

In this paper, we present the optimization formulation of the Kalman filtering and smoothing problems, and use this perspective to develop a variety of extensions and applications. We first formulate classic Kalman smoothing as a least squares problem, highlight special structure, and show that the classic filtering and smoothing algorithms are equivalent to a particular algorithm for solving this problem. Once this equivalence is established, we present extensions of Kalman smoothing to systems with nonlinear process and measurement models, systems with linear and nonlinear inequality constraints, systems with outliers in the measurements or sudden changes in the state, and systems where the sparsity of the state sequence must be accounted for. All extensions preserve the computational efficiency of the classic algorithms, and most of the extensions are illustrated with numerical examples, which are part of an open source Kalman smoothing Matlab/Octave package.

LGNov 17, 2025
Learning stochasticity: a nonparametric framework for intrinsic noise estimation

Gianluigi Pillonetto, Alberto Giaretta, Mauro Bisiacco

Understanding the principles that govern dynamical systems is a central challenge across many scientific domains, including biology and ecology. Incomplete knowledge of nonlinear interactions and stochastic effects often renders bottom-up modeling approaches ineffective, motivating the development of methods that can discover governing equations directly from data. In such contexts, parametric models often struggle without strong prior knowledge, especially when estimating intrinsic noise. Nonetheless, incorporating stochastic effects is often essential for understanding the dynamic behavior of complex systems such as gene regulatory networks and signaling pathways. To address these challenges, we introduce Trine (Three-phase Regression for INtrinsic noisE), a nonparametric, kernel-based framework that infers state-dependent intrinsic noise from time-series data. Trine features a three-stage algorithm that com- bines analytically solvable subproblems with a structured kernel architecture that captures both abrupt noise-driven fluctuations and smooth, state-dependent changes in variance. We validate Trine on biological and ecological systems, demonstrating its ability to uncover hidden dynamics without relying on predefined parametric assumptions. Across several benchmark problems, Trine achieves performance comparable to that of an oracle. Biologically, this oracle can be viewed as an idealized observer capable of directly tracking the random fluctuations in molecular concentrations or reaction events within a cell. The Trine framework thus opens new avenues for understanding how intrinsic noise affects the behavior of complex systems.

SYMay 2, 2023
Absolute integrability of Mercer kernels is only sufficient for RKHS stability

Mauro Bisiacco, Gianluigi Pillonetto

Reproducing kernel Hilbert spaces (RKHSs) are special Hilbert spaces in one-to-one correspondence with positive definite maps called kernels. They are widely employed in machine learning to reconstruct unknown functions from sparse and noisy data. In the last two decades, a subclass known as stable RKHSs has been also introduced in the setting of linear system identification. Stable RKHSs contain only absolutely integrable impulse responses over the positive real line. Hence, they can be adopted as hypothesis spaces to estimate linear, time-invariant and BIBO stable dynamic systems from input-output data. Necessary and sufficient conditions for RKHS stability are available in the literature and it is known that kernel absolute integrability implies stability. Working in discrete-time, in a recent work we have proved that this latter condition is only sufficient. Working in continuous-time, it is the purpose of this note to prove that the same result holds also for Mercer kernels.

SYMay 1, 2023
On the stability test for reproducing kernel Hilbert spaces

Mauro Bisiacco, Gianluigi Pillonetto

Reproducing kernel Hilbert spaces (RKHSs) are special Hilbert spaces where all the evaluation functionals are linear and bounded. They are in one-to-one correspondence with positive definite maps called kernels. Stable RKHSs enjoy the additional property of containing only functions and absolutely integrable. Necessary and sufficient conditions for RKHS stability are known in the literature: the integral operator induced by the kernel must be bounded as map between $\mathcal{L}_{\infty}$, the space of essentially bounded (test) functions, and $\mathcal{L}_1$, the space of absolutely integrable functions. Considering Mercer (continuous) kernels in continuous-time and the entire discrete-time class, we show that the stability test can be reduced to the study of the kernel operator over test functions which assume (almost everywhere) only the values $\pm 1$. They represent the same functions needed to investigate stability of any single element in the RKHS. In this way, the RKHS stability test becomes an elegant generalization of a straightforward result concerning Bounded-Input Bounded-Output (BIBO) stability of a single linear time-invariant system.

LGMay 6, 2020
Mathematical foundations of stable RKHSs

Mauro Bisiacco, Gianluigi Pillonetto

Reproducing kernel Hilbert spaces (RKHSs) are key spaces for machine learning that are becoming popular also for linear system identification. In particular, the so-called stable RKHSs can be used to model absolutely summable impulse responses. In combination e.g. with regularized least squares they can then be used to reconstruct dynamic systems from input-output data. In this paper we provide new structural properties of stable RKHSs. The relation between stable kernels and other fundamental classes, like those containing absolutely summable or finite-trace kernels, is elucidated. These insights are then brought into the feature space context. First, it is proved that any stable kernel admits feature maps induced by a basis of orthogonal eigenvectors in l2. The exact connection with classical system identification approaches that exploit such kind of functions to model impulse responses is also provided. Then, the necessary and sufficient stability condition for RKHSs designed by formulating kernel eigenvectors and eigenvalues is obtained. Overall, our new results provide novel mathematical foundations of stable RKHSs with impact on stability tests, impulse responses modeling and computational efficiency of regularized schemes for linear system identification.

LGSep 5, 2019
Kernel absolute summability is only sufficient for RKHS stability

Mauro Bisiacco, Gianluigi Pillonetto

Regularized approaches have been successfully applied to linear system identification in recent years. Many of them model unknown impulse responses exploiting the so called Reproducing Kernel Hilbert spaces (RKHSs) that enjoy the notable property of being in one-to-one correspondence with the class of positive semidefinite kernels. The necessary and sufficient condition for a RKHS to be stable, i.e. to contain only BIBO stable linear dynamic systems, has been known in the literature at least since 2006. However, an open question still persists and concerns the equivalence of such condition with the absolute summability of the kernel. This paper provides a definite answer to this matter by proving that such correspondence does not hold. A counterexample is introduced that illustrates the existence of stable RKHSs that are induced by non-absolutely summable kernels.

LGMay 20, 2019
A novel Multiplicative Polynomial Kernel for Volterra series identification

Alberto Dalla Libera, Ruggero Carli, Gianluigi Pillonetto

Volterra series are especially useful for nonlinear system identification, also thanks to their capability to approximate a broad range of input-output maps. However, their identification from a finite set of data is hard, due to the curse of dimensionality. Recent approaches have shown how regularized kernel-based methods can be useful for this task. In this paper, we propose a new regularization network for Volterra models identification. It relies on a new kernel given by the product of basic building blocks. Each block contains some unknown parameters that can be estimated from data using marginal likelihood optimization. In comparison with other algorithms proposed in the literature, numerical experiments show that our approach allows to better select the monomials that really influence the system output, much increasing the prediction capability of the model.

OCMar 7, 2018
Fast Robust Methods for Singular State-Space Models

Jonathan Jonker, Aleksandr Y. Aravkin, James V. Burke et al.

State-space models are used in a wide range of time series analysis formulations. Kalman filtering and smoothing are work-horse algorithms in these settings. While classic algorithms assume Gaussian errors to simplify estimation, recent advances use a broader range of optimization formulations to allow outlier-robust estimation, as well as constraints to capture prior information. Here we develop methods on state-space models where either innovations or error covariances may be singular. These models frequently arise in navigation (e.g. for `colored noise' models or deterministic integrals) and are ubiquitous in auto-correlated time series models such as ARMA. We reformulate all state-space models (singular as well as nonsinguar) as constrained convex optimization problems, and develop an efficient algorithm for this reformulation. The convergence rate is {\it locally linear}, with constants that do not depend on the conditioning of the problem. Numerical comparisons show that the new approach outperforms competing approaches for {\it nonsingular} models, including state of the art interior point (IP) methods. IP methods converge at superlinear rates; we expect them to dominate. However, the steep rate of the proposed approach (independent of problem conditioning) combined with cheap iterations wins against IP in a run-time comparison. We therefore suggest that the proposed approach be the {\it default choice} for estimating state space models outside of the Gaussian context, regardless of whether the error covariances are singular or not.

MLJun 8, 2017
The Generalized Cross Validation Filter

Giulio Bottegal, Gianluigi Pillonetto

Generalized cross validation (GCV) is one of the most important approaches used to estimate parameters in the context of inverse problems and regularization techniques. A notable example is the determination of the smoothness parameter in splines. When the data are generated by a state space model, like in the spline case, efficient algorithms are available to evaluate the GCV score with complexity that scales linearly in the data set size. However, these methods are not amenable to on-line applications since they rely on forward and backward recursions. Hence, if the objective has been evaluated at time $t-1$ and new data arrive at time t, then O(t) operations are needed to update the GCV score. In this paper we instead show that the update cost is $O(1)$, thus paving the way to the on-line use of GCV. This result is obtained by deriving the novel GCV filter which extends the classical Kalman filter equations to efficiently propagate the GCV score over time. We also illustrate applications of the new filter in the context of state estimation and on-line regularized linear system identification.

LGMay 3, 2017
Efficient Spatio-Temporal Gaussian Regression via Kalman Filtering

Marco Todescato, Andrea Carron, Ruggero Carli et al.

In this work we study the non-parametric reconstruction of spatio-temporal dynamical Gaussian processes (GPs) via GP regression from sparse and noisy data. GPs have been mainly applied to spatial regression where they represent one of the most powerful estimation approaches also thanks to their universal representing properties. Their extension to dynamical processes has been instead elusive so far since classical implementations lead to unscalable algorithms. We then propose a novel procedure to address this problem by coupling GP regression and Kalman filtering. In particular, assuming space/time separability of the covariance (kernel) of the process and rational time spectrum, we build a finite-dimensional discrete-time state-space process representation amenable of Kalman filtering. With sampling over a finite set of fixed spatial locations, our major finding is that the Kalman filter state at instant $t_k$ represents a sufficient statistic to compute the minimum variance estimate of the process at any $t \geq t_k$ over the entire spatial domain. This result can be interpreted as a novel Kalman representer theorem for dynamical GPs. We then extend the study to situations where the set of spatial input locations can vary over time. The proposed algorithms are finally tested on both synthetic and real field data, also providing comparisons with standard GP and truncated GP regression techniques.

SYDec 29, 2016
The interplay between system identification and machine learning

Gianluigi Pillonetto

Learning from examples is one of the key problems in science and engineering. It deals with function reconstruction from a finite set of direct and noisy samples. Regularization in reproducing kernel Hilbert spaces (RKHSs) is widely used to solve this task and includes powerful estimators such as regularization networks. Recent achievements include the proof of the statistical consistency of these kernel- based approaches. Parallel to this, many different system identification techniques have been developed but the interaction with machine learning does not appear so strong yet. One reason is that the RKHSs usually employed in machine learning do not embed the information available on dynamic systems, e.g. BIBO stability. In addition, in system identification the independent data assumptions routinely adopted in machine learning are never satisfied in practice. This paper provides new results which strengthen the connection between system identification and machine learning. Our starting point is the introduction of RKHSs of dynamic systems. They contain functionals over spaces defined by system inputs and allow to interpret system identification as learning from examples. In both linear and nonlinear settings, it is shown that this perspective permits to derive in a relatively simple way conditions on RKHS stability (i.e. the property of containing only BIBO stable systems or predictors), also facilitating the design of new kernels for system identification. Furthermore, we prove the convergence of the regularized estimator to the optimal predictor under conditions typical of dynamic systems.

SYOct 3, 2016
A new kernel-based approach to system identification with quantized output data

Giulio Bottegal, Håkan Hjalmarsson, Gianluigi Pillonetto

In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo methods to provide an estimate of the system. In particular, we design two methods based on the so-called Gibbs sampler that allow also to estimate the kernel hyperparameters by marginal likelihood maximization via the expectation-maximization method. Numerical simulations show the effectiveness of the proposed scheme, as compared to the state-of-the-art kernel-based methods when these are employed in system identification with quantized data.

MLAug 8, 2016
Boosting as a kernel-based method

Aleksandr Y. Aravkin, Giulio Bottegal, Gianluigi Pillonetto

Boosting combines weak (biased) learners to obtain effective learning algorithms for classification and prediction. In this paper, we show a connection between boosting and kernel-based methods, highlighting both theoretical and practical applications. In the context of $\ell_2$ boosting, we start with a weak linear learner defined by a kernel $K$. We show that boosting with this learner is equivalent to estimation with a special {\it boosting kernel} that depends on $K$, as well as on the regression matrix, noise variance, and hyperparameters. The number of boosting iterations is modeled as a continuous hyperparameter, and fit along with other parameters using standard techniques. We then generalize the boosting kernel to a broad new class of boosting approaches for more general weak learners, including those based on the $\ell_1$, hinge and Vapnik losses. The approach allows fast hyperparameter tuning for this general class, and has a wide range of applications, including robust regression and classification. We illustrate some of these applications with numerical examples on synthetic and real data.

SYJan 17, 2016
On-line Bayesian System Identification

Diego Romeres, Giulia Prando, Gianluigi Pillonetto et al.

We consider an on-line system identification setting, in which new data become available at given time steps. In order to meet real-time estimation requirements, we propose a tailored Bayesian system identification procedure, in which the hyper-parameters are still updated through Marginal Likelihood maximization, but after only one iteration of a suitable iterative optimization algorithm. Both gradient methods and the EM algorithm are considered for the Marginal Likelihood optimization. We compare this "1-step" procedure with the standard one, in which the optimization method is run until convergence to a local minimum. The experiments we perform confirm the effectiveness of the approach we propose.

SYAug 12, 2015
Maximum Entropy Vector Kernels for MIMO system identification

Giulia Prando, Gianluigi Pillonetto, Alessandro Chiuso

Recent contributions have framed linear system identification as a nonparametric regularized inverse problem. Relying on $\ell_2$-type regularization which accounts for the stability and smoothness of the impulse response to be estimated, these approaches have been shown to be competitive w.r.t classical parametric methods. In this paper, adopting Maximum Entropy arguments, we derive a new $\ell_2$ penalty deriving from a vector-valued kernel; to do so we exploit the structure of the Hankel matrix, thus controlling at the same time complexity, measured by the McMillan degree, stability and smoothness of the identified models. As a special case we recover the nuclear norm penalty on the squared block Hankel matrix. In contrast with previous literature on reweighted nuclear norm penalties, our kernel is described by a small number of hyper-parameters, which are iteratively updated through marginal likelihood maximization; constraining the structure of the kernel acts as a (hyper)regularizer which helps controlling the effective degrees of freedom of our estimator. To optimize the marginal likelihood we adapt a Scaled Gradient Projection (SGP) algorithm which is proved to be significantly computationally cheaper than other first and second order off-the-shelf optimization methods. The paper also contains an extensive comparison with many state-of-the-art methods on several Monte-Carlo studies, which confirms the effectiveness of our procedure.

SYJul 2, 2015
Regularized linear system identification using atomic, nuclear and kernel-based norms: the role of the stability constraint

Gianluigi Pillonetto, Tianshi Chen, Alessandro Chiuso et al.

Inspired by ideas taken from the machine learning literature, new regularization techniques have been recently introduced in linear system identification. In particular, all the adopted estimators solve a regularized least squares problem, differing in the nature of the penalty term assigned to the impulse response. Popular choices include atomic and nuclear norms (applied to Hankel matrices) as well as norms induced by the so called stable spline kernels. In this paper, a comparative study of estimators based on these different types of regularizers is reported. Our findings reveal that stable spline kernels outperform approaches based on atomic and nuclear norms since they suitably embed information on impulse response stability and smoothness. This point is illustrated using the Bayesian interpretation of regularization. We also design a new class of regularizers defined by "integral" versions of stable spline/TC kernels. Under quite realistic experimental conditions, the new estimators outperform classical prediction error methods also when the latter are equipped with an oracle for model order selection.

MLJul 2, 2015
Identification of stable models via nonparametric prediction error methods

Diego Romeres, Gianluigi Pillonetto, Alessandro Chiuso

A new Bayesian approach to linear system identification has been proposed in a series of recent papers. The main idea is to frame linear system identification as predictor estimation in an infinite dimensional space, with the aid of regularization/Bayesian techniques. This approach guarantees the identification of stable predictors based on the prediction error minimization. Unluckily, the stability of the predictors does not guarantee the stability of the impulse response of the system. In this paper we propose and compare various techniques to address this issue. Simulations results comparing these techniques will be provided.

SYApr 26, 2015
Bayesian kernel-based system identification with quantized output data

Giulio Bottegal, Gianluigi Pillonetto, Håkan Hjalmarsson

In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo (MCMC) methods to provide an estimate of the system. In particular, we show how to design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods when employed in identification of systems with quantized data.

SYApr 13, 2015
Maximum entropy properties of discrete-time first-order stable spline kernel

Tianshi Chen, Tohid Ardeshiri, Francesca P. Carli et al.

The first order stable spline (SS-1) kernel is used extensively in regularized system identification. In particular, the stable spline estimator models the impulse response as a zero-mean Gaussian process whose covariance is given by the SS-1 kernel. In this paper, we discuss the maximum entropy properties of this prior. In particular, we formulate the exact maximum entropy problem solved by the SS-1 kernel without Gaussian and uniform sampling assumptions. Under general sampling schemes, we also explicitly derive the special structure underlying the SS-1 kernel (e.g. characterizing the tridiagonal nature of its inverse), also giving to it a maximum entropy covariance completion interpretation. Along the way similar maximum entropy properties of the Wiener kernel are also given.

SYNov 21, 2014
Robust EM kernel-based methods for linear system identification

Giulio Bottegal, Aleksandr Y. Aravkin, Håkan Hjalmarsson et al.

Recent developments in system identification have brought attention to regularized kernel-based methods. This type of approach has been proven to compare favorably with classic parametric methods. However, current formulations are not robust with respect to outliers. In this paper, we introduce a novel method to robustify kernel-based system identification methods. To this end, we model the output measurement noise using random variables with heavy-tailed probability density functions (pdfs), focusing on the Laplacian and the Student's t distributions. Exploiting the representation of these pdfs as scale mixtures of Gaussians, we cast our system identification problem into a Gaussian process regression framework, which requires estimating a number of hyperparameters of the data size order. To overcome this difficulty, we design a new maximum a posteriori (MAP) estimator of the hyperparameters, and solve the related optimization problem with a novel iterative scheme based on the Expectation-Maximization (EM) method. In presence of outliers, tests on simulated data and on a real system show a substantial performance improvement compared to currently used kernel-based methods for linear system identification.

SYSep 29, 2014
Bayesian and regularization approaches to multivariable linear system identification: the role of rank penalties

Giulia Prando, Alessandro Chiuso, Gianluigi Pillonetto

Recent developments in linear system identification have proposed the use of non-parameteric methods, relying on regularization strategies, to handle the so-called bias/variance trade-off. This paper introduces an impulse response estimator which relies on an $\ell_2$-type regularization including a rank-penalty derived using the log-det heuristic as a smooth approximation to the rank function. This allows to account for different properties of the estimated impulse response (e.g. smoothness and stability) while also penalizing high-complexity models. This also allows to account and enforce coupling between different input-output channels in MIMO systems. According to the Bayesian paradigm, the parameters defining the relative weight of the two regularization terms as well as the structure of the rank penalty are estimated optimizing the marginal likelihood. Once these hyperameters have been estimated, the impulse response estimate is available in closed form. Experiments show that the proposed method is superior to the estimator relying on the "classic" $\ell_2$-regularization alone as well as those based in atomic and nuclear norm.

MAJul 22, 2014
Multi-agents adaptive estimation and coverage control using Gaussian regression

Andrea Carron, Marco Todescato, Ruggero Carli et al.

We consider a scenario where the aim of a group of agents is to perform the optimal coverage of a region according to a sensory function. In particular, centroidal Voronoi partitions have to be computed. The difficulty of the task is that the sensory function is unknown and has to be reconstructed on line from noisy measurements. Hence, estimation and coverage needs to be performed at the same time. We cast the problem in a Bayesian regression framework, where the sensory function is seen as a Gaussian random field. Then, we design a set of control inputs which try to well balance coverage and estimation, also discussing convergence properties of the algorithm. Numerical experiments show the effectivness of the new approach.

MLDec 21, 2013
Outlier robust system identification: a Bayesian kernel-based approach

Giulio Bottegal, Aleksandr Y. Aravkin, Hakan Hjalmarsson et al.

In this paper, we propose an outlier-robust regularized kernel-based method for linear system identification. The unknown impulse response is modeled as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. To build robustness to outliers, we model the measurement noise as realizations of independent Laplacian random variables. The identification problem is cast in a Bayesian framework, and solved by a new Markov Chain Monte Carlo (MCMC) scheme. In particular, exploiting the representation of the Laplacian random variables as scale mixtures of Gaussians, we design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods.

MLMar 12, 2013
Linear system identification using stable spline kernels and PLQ penalties

Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

The classical approach to linear system identification is given by parametric Prediction Error Methods (PEM). In this context, model complexity is often unknown so that a model order selection step is needed to suitably trade-off bias and variance. Recently, a different approach to linear system identification has been introduced, where model order determination is avoided by using a regularized least squares framework. In particular, the penalty term on the impulse response is defined by so called stable spline kernels. They embed information on regularity and BIBO stability, and depend on a small number of parameters which can be estimated from data. In this paper, we provide new nonsmooth formulations of the stable spline estimator. In particular, we consider linear system identification problems in a very broad context, where regularization functionals and data misfits can come from a rich set of piecewise linear quadratic functions. Moreover, our anal- ysis includes polyhedral inequality constraints on the unknown impulse response. For any formulation in this class, we show that interior point methods can be used to solve the system identification problem, with complexity O(n3)+O(mn2) in each iteration, where n and m are the number of impulse response coefficients and measurements, respectively. The usefulness of the framework is illustrated via a numerical experiment where output measurements are contaminated by outliers.

MLFeb 26, 2013
Convex vs nonconvex approaches for sparse estimation: GLasso, Multiple Kernel Learning and Hyperparameter GLasso

Aleksandr Y. Aravkin, James V. Burke, Alessandro Chiuso et al.

The popular Lasso approach for sparse estimation can be derived via marginalization of a joint density associated with a particular stochastic model. A different marginalization of the same probabilistic model leads to a different non-convex estimator where hyperparameters are optimized. Extending these arguments to problems where groups of variables have to be estimated, we study a computational scheme for sparse estimation that differs from the Group Lasso. Although the underlying optimization problem defining this estimator is non-convex, an initialization strategy based on a univariate Bayesian forward selection scheme is presented. This also allows us to define an effective non-convex estimator where only one scalar variable is involved in the optimization process. Theoretical arguments, independent of the correctness of the priors entering the sparse model, are included to clarify the advantages of this non-convex technique in comparison with other convex estimators. Numerical experiments are also used to compare the performance of these approaches.

MLJan 22, 2013
The connection between Bayesian estimation of a Gaussian random field and RKHS

Aleksandr Y. Aravkin, Bradley M. Bell, James V. Burke et al.

Reconstruction of a function from noisy data is often formulated as a regularized optimization problem over an infinite-dimensional reproducing kernel Hilbert space (RKHS). The solution describes the observed data and has a small RKHS norm. When the data fit is measured using a quadratic loss, this estimator has a known statistical interpretation. Given the noisy measurements, the RKHS estimate represents the posterior mean (minimum variance estimate) of a Gaussian random field with covariance proportional to the kernel associated with the RKHS. In this paper, we provide a statistical interpretation when more general losses are used, such as absolute value, Vapnik or Huber. Specifically, for any finite set of sampling locations (including where the data were collected), the MAP estimate for the signal samples is given by the RKHS estimate evaluated at these locations.

MLJan 19, 2013
Sparse/Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory

Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

We introduce a class of quadratic support (QS) functions, many of which play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and Kalman smoothing. Well known examples include the l2, Huber, l1 and Vapnik losses. We build on a dual representation for QS functions using convex analysis, revealing the structure necessary for a QS function to be interpreted as the negative log of a probability density, and providing the foundation for statistical interpretation and analysis of QS loss functions. For a subclass of QS functions called piecewise linear quadratic (PLQ) penalties, we also develop efficient numerical estimation schemes. These components form a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for inference. In particular, for PLQ densities, interior point (IP) methods can be used. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaussian errors are assumed in the process and measurement models. The extended framework allows arbitrary PLQ densities to be used, and the proposed IP approach solves the generalized Kalman smoothing problem while maintaining the linear complexity in the size of the time series, just as in the Gaussian case. This extends the computational efficiency of classic algorithms to a much broader nonsmooth setting, and includes many recently proposed robust and sparse smoothers as special cases.