SYFeb 13, 2018
The role of noise modeling in the estimation of resting-state brain effective connectivityGiulia Prando, Mattia Zorzi, Alessandra Bertoldo et al.
Causal relations among neuronal populations of the brain are studied through the so-called effective connectivity (EC) network. The latter is estimated from EEG or fMRI measurements, by inverting a generative model of the corresponding data. It is clear that the goodness of the estimated network heavily depends on the underlying modeling assumptions. In this present paper we consider the EC estimation problem using fMRI data in resting-state condition. Specifically, we investigate on how to model endogenous fluctuations driving the neuronal activity.
LGJun 6, 2021
A novel Deep Neural Network architecture for non-linear system identificationLuca Zancato, Alessandro Chiuso
We present a novel Deep Neural Network (DNN) architecture for non-linear system identification. We foster generalization by constraining DNN representational power. To do so, inspired by fading memory systems, we introduce inductive bias (on the architecture) and regularization (on the loss function). This architecture allows for automatic complexity selection based solely on available data, in this way the number of hyper-parameters that must be chosen by the user is reduced. Exploiting the highly parallelizable DNN framework (based on Stochastic optimization methods) we successfully apply our method to large scale datasets.
LGMar 25, 2021
Estimating Koopman operators for nonlinear dynamical systems: a nonparametric approachFrancesco Zanini, Alessandro Chiuso
The Koopman operator is a mathematical tool that allows for a linear description of non-linear systems, but working in infinite dimensional spaces. Dynamic Mode Decomposition and Extended Dynamic Mode Decomposition are amongst the most popular finite dimensional approximation. In this paper we capture their core essence as a dual version of the same framework, incorporating them into the Kernel framework. To do so, we leverage the RKHS as a suitable space for learning the Koopman dynamics, thanks to its intrinsic finite-dimensional nature, shaped by data. We finally establish a strong link between kernel methods and Koopman operators, leading to the estimation of the latter through Kernel functions. We provide also simulations for comparison with standard procedures.
LGSep 13, 2018
Derivative-free online learning of inverse dynamics modelsDiego Romeres, Mattia Zorzi, Raffaello Camoriano et al.
This paper discusses online algorithms for inverse dynamics modelling in robotics. Several model classes including rigid body dynamics (RBD) models, data-driven models and semiparametric models (which are a combination of the previous two classes) are placed in a common framework. While model classes used in the literature typically exploit joint velocities and accelerations, which need to be approximated resorting to numerical differentiation schemes, in this paper a new `derivative-free' framework is proposed that does not require this preprocessing step. An extensive experimental study with real data from the right arm of the iCub robot is presented, comparing different model classes and estimation procedures, showing that the proposed `derivative-free' methods outperform existing methodologies.
SYSep 23, 2016
Online Identification of Time-Varying Systems: a Bayesian approachGiulia Prando, Diego Romeres, Alessandro Chiuso
We extend the recently introduced regularization/Bayesian System Identification procedures to the estimation of time-varying systems. Specifically, we consider an online setting, in which new data become available at given time steps. The real-time estimation requirements imposed by this setting are met by estimating the hyper-parameters through just one gradient step in the marginal likelihood maximization and by exploiting the closed-form availability of the impulse response estimate (when Gaussian prior and Gaussian measurement noise are postulated). By relying on the use of a forgetting factor, we propose two methods to tackle the tracking of time-varying systems. In one of them, the forgetting factor is estimated by treating it as a hyper-parameter of the Bayesian inference procedure.
OCMar 17, 2016
Online semi-parametric learning for inverse dynamics modelingDiego Romeres, Mattia Zorzi, Raffaello Camoriano et al.
This paper presents a semi-parametric algorithm for online learning of a robot inverse dynamics model. It combines the strength of the parametric and non-parametric modeling. The former exploits the rigid body dynamics equa- tion, while the latter exploits a suitable kernel function. We provide an extensive comparison with other methods from the literature using real data from the iCub humanoid robot. In doing so we also compare two different techniques, namely cross validation and marginal likelihood optimization, for estimating the hyperparameters of the kernel function.
SYJan 17, 2016
On-line Bayesian System IdentificationDiego Romeres, Giulia Prando, Gianluigi Pillonetto et al.
We consider an on-line system identification setting, in which new data become available at given time steps. In order to meet real-time estimation requirements, we propose a tailored Bayesian system identification procedure, in which the hyper-parameters are still updated through Marginal Likelihood maximization, but after only one iteration of a suitable iterative optimization algorithm. Both gradient methods and the EM algorithm are considered for the Marginal Likelihood optimization. We compare this "1-step" procedure with the standard one, in which the optimization method is run until convergence to a local minimum. The experiments we perform confirm the effectiveness of the approach we propose.
SYAug 12, 2015
Maximum Entropy Vector Kernels for MIMO system identificationGiulia Prando, Gianluigi Pillonetto, Alessandro Chiuso
Recent contributions have framed linear system identification as a nonparametric regularized inverse problem. Relying on $\ell_2$-type regularization which accounts for the stability and smoothness of the impulse response to be estimated, these approaches have been shown to be competitive w.r.t classical parametric methods. In this paper, adopting Maximum Entropy arguments, we derive a new $\ell_2$ penalty deriving from a vector-valued kernel; to do so we exploit the structure of the Hankel matrix, thus controlling at the same time complexity, measured by the McMillan degree, stability and smoothness of the identified models. As a special case we recover the nuclear norm penalty on the squared block Hankel matrix. In contrast with previous literature on reweighted nuclear norm penalties, our kernel is described by a small number of hyper-parameters, which are iteratively updated through marginal likelihood maximization; constraining the structure of the kernel acts as a (hyper)regularizer which helps controlling the effective degrees of freedom of our estimator. To optimize the marginal likelihood we adapt a Scaled Gradient Projection (SGP) algorithm which is proved to be significantly computationally cheaper than other first and second order off-the-shelf optimization methods. The paper also contains an extensive comparison with many state-of-the-art methods on several Monte-Carlo studies, which confirms the effectiveness of our procedure.
SYJul 2, 2015
Regularized linear system identification using atomic, nuclear and kernel-based norms: the role of the stability constraintGianluigi Pillonetto, Tianshi Chen, Alessandro Chiuso et al.
Inspired by ideas taken from the machine learning literature, new regularization techniques have been recently introduced in linear system identification. In particular, all the adopted estimators solve a regularized least squares problem, differing in the nature of the penalty term assigned to the impulse response. Popular choices include atomic and nuclear norms (applied to Hankel matrices) as well as norms induced by the so called stable spline kernels. In this paper, a comparative study of estimators based on these different types of regularizers is reported. Our findings reveal that stable spline kernels outperform approaches based on atomic and nuclear norms since they suitably embed information on impulse response stability and smoothness. This point is illustrated using the Bayesian interpretation of regularization. We also design a new class of regularizers defined by "integral" versions of stable spline/TC kernels. Under quite realistic experimental conditions, the new estimators outperform classical prediction error methods also when the latter are equipped with an oracle for model order selection.
MLJul 2, 2015
Identification of stable models via nonparametric prediction error methodsDiego Romeres, Gianluigi Pillonetto, Alessandro Chiuso
A new Bayesian approach to linear system identification has been proposed in a series of recent papers. The main idea is to frame linear system identification as predictor estimation in an infinite dimensional space, with the aid of regularization/Bayesian techniques. This approach guarantees the identification of stable predictors based on the prediction error minimization. Unluckily, the stability of the predictors does not guarantee the stability of the impulse response of the system. In this paper we propose and compare various techniques to address this issue. Simulations results comparing these techniques will be provided.
OCMar 25, 2015
A Bayesian Approach to Sparse plus Low rank Network IdentificationMattia Zorzi, Alessandro Chiuso
We consider the problem of modeling multivariate time series with parsimonious dynamical models which can be represented as sparse dynamic Bayesian networks with few latent nodes. This structure translates into a sparse plus low rank model. In this paper, we propose a Gaussian regression approach to identify such a model.
SYApr 13, 2015
Maximum entropy properties of discrete-time first-order stable spline kernelTianshi Chen, Tohid Ardeshiri, Francesca P. Carli et al.
The first order stable spline (SS-1) kernel is used extensively in regularized system identification. In particular, the stable spline estimator models the impulse response as a zero-mean Gaussian process whose covariance is given by the SS-1 kernel. In this paper, we discuss the maximum entropy properties of this prior. In particular, we formulate the exact maximum entropy problem solved by the SS-1 kernel without Gaussian and uniform sampling assumptions. Under general sampling schemes, we also explicitly derive the special structure underlying the SS-1 kernel (e.g. characterizing the tridiagonal nature of its inverse), also giving to it a maximum entropy covariance completion interpretation. Along the way similar maximum entropy properties of the Wiener kernel are also given.
RODec 16, 2014
Robust Inference for Visual-Inertial Sensor FusionKonstantine Tsotsos, Alessandro Chiuso, Stefano Soatto
Inference of three-dimensional motion from the fusion of inertial and visual sensory data has to contend with the preponderance of outliers in the latter. Robust filtering deals with the joint inference and classification task of selecting which data fits the model, and estimating its state. We derive the optimal discriminant and propose several approximations, some used in the literature, others new. We compare them analytically, by pointing to the assumptions underlying their approximations, and empirically. We show that the best performing method improves the performance of state-of-the-art visual-inertial sensor fusion systems, while retaining the same computational complexity.
CVNov 27, 2014
Visual Representations: Defining Properties and Deep ApproximationsStefano Soatto, Alessandro Chiuso
Visual representations are defined in terms of minimal sufficient statistics of visual data, for a class of tasks, that are also invariant to nuisance variability. Minimal sufficiency guarantees that we can store a representation in lieu of raw data with smallest complexity and no performance loss on the task at hand. Invariance guarantees that the statistic is constant with respect to uninformative transformations of the data. We derive analytical expressions for such representations and show they are related to feature descriptors commonly used in computer vision, as well as to convolutional neural networks. This link highlights the assumptions and approximations tacitly assumed by these methods and explains empirical practices such as clamping, pooling and joint normalization.
SYSep 29, 2014
Bayesian and regularization approaches to multivariable linear system identification: the role of rank penaltiesGiulia Prando, Alessandro Chiuso, Gianluigi Pillonetto
Recent developments in linear system identification have proposed the use of non-parameteric methods, relying on regularization strategies, to handle the so-called bias/variance trade-off. This paper introduces an impulse response estimator which relies on an $\ell_2$-type regularization including a rank-penalty derived using the log-det heuristic as a smooth approximation to the rank function. This allows to account for different properties of the estimated impulse response (e.g. smoothness and stability) while also penalizing high-complexity models. This also allows to account and enforce coupling between different input-output channels in MIMO systems. According to the Bayesian paradigm, the parameters defining the relative weight of the two regularization terms as well as the structure of the rank penalty are estimated optimizing the marginal likelihood. Once these hyperameters have been estimated, the impulse response estimate is available in closed form. Experiments show that the proposed method is superior to the estimator relying on the "classic" $\ell_2$-regularization alone as well as those based in atomic and nuclear norm.
NAJun 25, 2014
A scaled gradient projection method for Bayesian learning in dynamical systemsSilvia Bonettini, Alessandro Chiuso, Marco Prato
A crucial task in system identification problems is the selection of the most appropriate model class, and is classically addressed resorting to cross-validation or using asymptotic arguments. As recently suggested in the literature, this can be addressed in a Bayesian framework, where model complexity is regulated by few hyperparameters, which can be estimated via marginal likelihood maximization. It is thus of primary importance to design effective optimization methods to solve the corresponding optimization problem. If the unknown impulse response is modeled as a Gaussian process with a suitable kernel, the maximization of the marginal likelihood leads to a challenging nonconvex optimization problem, which requires a stable and effective solution strategy. In this paper we address this problem by means of a scaled gradient projection algorithm, in which the scaling matrix and the steplength parameter play a crucial role to provide a meaning solution in a computational time comparable with second order methods. In particular, we propose both a generalization of the split gradient approach to design the scaling matrix in the presence of box constraints, and an effective implementation of the gradient and objective function. The extensive numerical experiments carried out on several test problems show that our method is very effective in providing in few tenths of a second solutions of the problems with accuracy comparable with state-of-the-art approaches. Moreover, the flexibility of the proposed strategy makes it easily adaptable to a wider range of problems arising in different areas of machine learning, signal processing and system identification.
MLFeb 26, 2013
Convex vs nonconvex approaches for sparse estimation: GLasso, Multiple Kernel Learning and Hyperparameter GLassoAleksandr Y. Aravkin, James V. Burke, Alessandro Chiuso et al.
The popular Lasso approach for sparse estimation can be derived via marginalization of a joint density associated with a particular stochastic model. A different marginalization of the same probabilistic model leads to a different non-convex estimator where hyperparameters are optimized. Extending these arguments to problems where groups of variables have to be estimated, we study a computational scheme for sparse estimation that differs from the Group Lasso. Although the underlying optimization problem defining this estimator is non-convex, an initialization strategy based on a univariate Bayesian forward selection scheme is presented. This also allows us to define an effective non-convex estimator where only one scalar variable is involved in the optimization process. Theoretical arguments, independent of the correctness of the priors entering the sparse model, are included to clarify the advantages of this non-convex technique in comparison with other convex estimators. Numerical experiments are also used to compare the performance of these approaches.