SY Mar 26, 2018
Parametric Identification Using Weighted Null-Space Fitting
Miguel Galrinho, Cristian R. Rojas, Hakan Hjalmarsson
In identification of dynamical systems, the prediction error method using a quadratic cost function provides asymptotically efficient estimates under Gaussian noise and additional mild assumptions, but in general it requires solving a non-convex optimization problem. An alternative class of methods uses a non-parametric model as intermediate step to obtain the model of interest. Weighted null-space fitting (WNSF) belongs to this class. It is a weighted least-squares method consisting of three steps. In the first step, a high-order ARX model is estimated. In a second least-squares step, this high-order estimate is reduced to a parametric estimate. In the third step, weighted least squares is used to reduce the variance of the estimates. The method is flexible in parametrization and suitable for both open- and closed-loop data. In this paper, we show that WNSF provides estimates with the same asymptotic properties as PEM with a quadratic cost function when the model orders are chosen according to the true system. Also, simulation studies indicate that WNSF may be competitive with state-of-the-art methods.
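To illustrate the first step only, the sketch below fits a high-order ARX model by ordinary least squares. It is a minimal example under stated assumptions, not the authors' implementation: the helper name `fit_arx`, the simulated first-order system, the sample size, and the ARX order are all hypothetical choices made for illustration, and the second and third (weighted) steps are not shown.

```python
import numpy as np

def fit_arx(u, y, order):
    """Least-squares fit of y[t] = -sum_k a_k y[t-k] + sum_k b_k u[t-k] + e[t].

    Returns (a, b), each of length `order`. Hypothetical helper for illustration.
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    rows = []
    for t in range(order, len(y)):
        past_y = -y[t - order:t][::-1]   # -y[t-1], ..., -y[t-order]
        past_u = u[t - order:t][::-1]    #  u[t-1], ...,  u[t-order]
        rows.append(np.concatenate([past_y, past_u]))
    Phi = np.array(rows)                 # regressor matrix, one row per sample
    theta, *_ = np.linalg.lstsq(Phi, y[order:], rcond=None)
    return theta[:order], theta[order:]

# Simulated data from an assumed system: y[t] = 0.5 y[t-1] + u[t-1] + e[t]
rng = np.random.default_rng(0)
N = 5000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.5 * y[t - 1] + u[t - 1] + e[t]

# Deliberately over-parametrized ARX model (order 5 for a first-order system)
a_hat, b_hat = fit_arx(u, y, order=5)
```

With this sign convention the assumed system corresponds to a_1 = -0.5 and b_1 = 1, so `a_hat[0]` and `b_hat[0]` should land near those values while the remaining coefficients stay near zero.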
SY Mar 21, 2013
Application Set Approximation in Optimal Input Design for Model Predictive Control
Afrooz Ebadat, Mariette Annergren, Christian A. Larsson et al.
This contribution considers one central aspect of experiment design in system identification. When a control design is based on an estimated model, the achievable performance is related to the quality of the estimate. The degradation in control performance due to errors in the estimated model is measured by an application cost function. In order to use an optimization-based input design method, a convex approximation of the set of models that satisfies the control specification is required. The standard approach is to use a quadratic approximation of the application cost function, where the main computational effort is to find the corresponding Hessian matrix. Our main contribution is an alternative approach for this problem, which uses the structure of the underlying optimal control problem to considerably reduce the computations needed to find the application set. This technique allows the use of applications-oriented input design for MPC on much more complex plants. The approach is numerically evaluated on a distillation control problem.
SY Feb 1, 2017
Asymptotically Efficient Identification of Known-Sensor Hidden Markov Models
Robert Mattila, Cristian R. Rojas, Vikram Krishnamurthy et al.
We consider estimating the transition probability matrix of a finite-state finite-observation alphabet hidden Markov model with known observation probabilities. The main contribution is a two-step algorithm: a method of moments estimator (formulated as a convex optimization problem) followed by a single iteration of a Newton-Raphson maximum likelihood estimator. The two-fold contribution of this letter is, firstly, to theoretically show that the proposed estimator is consistent and asymptotically efficient, and secondly, to numerically show that the method is computationally less demanding than conventional methods, in particular for large data sets.
LG Jun 12, 2023
DRCFS: Doubly Robust Causal Feature Selection
Francesco Quinzan, Ashkan Soleymani, Patrick Jaillet et al.
Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly non-linear and high-dimensional problems.
SY Apr 3, 2017
Computing monotone policies for Markov decision processes: a nearly-isotonic penalty approach
Robert Mattila, Cristian R. Rojas, Vikram Krishnamurthy et al.
This paper discusses algorithms for solving Markov decision processes (MDPs) that have monotone optimal policies. We propose a two-stage alternating convex optimization scheme that can accelerate the search for an optimal policy by exploiting the monotone property. The first stage is a linear program formulated in terms of the joint state-action probabilities. The second stage is a regularized problem formulated in terms of the conditional probabilities of actions given states. The regularization uses techniques from nearly-isotonic regression. While a variety of iterative methods can be used in the first formulation of the problem, we show in numerical simulations that, in particular, the alternating direction method of multipliers (ADMM) can be significantly accelerated using the regularization step.
SY Jan 13, 2015
Variance Analysis of Linear SIMO Models with Spatially Correlated Noise
Niklas Everitt, Giulio Bottegal, Cristian R. Rojas et al.
Substantial improvement in accuracy of identified linear time-invariant single-input multi-output (SIMO) dynamical models is possible when the disturbances affecting the output measurements are spatially correlated. Using an orthogonal representation for the modules composing the SIMO structure, in this paper we show that the variance of a parameter estimate of a module depends on the model structure of the other modules and on the correlation structure of the disturbances. In addition, we quantify the variance error of the parameter estimates for finite model orders, where the effects of noise correlation structure, model structure and signal spectra are visible. From these results, we derive the noise correlation structure under which the mentioned model parameterization gives the lowest variance when one module is identified using fewer parameters than the other modules.
SY Mar 22, 2018
An asymptotically optimal indirect approach to continuous-time system identification
Rodrigo A. González, Cristian R. Rojas, James S. Welsh
The indirect approach to continuous-time system identification consists in estimating continuous-time models by first determining an appropriate discrete-time model. For a zero-order hold sampling mechanism, this approach usually leads to a transfer function estimate with relative degree 1, independent of the relative degree of the strictly proper real system. In this paper, a refinement of these methods is developed. Inspired by indirect PEM, we propose a method that enforces a fixed relative degree in the continuous-time transfer function estimate, and show that the resulting estimator is consistent and asymptotically efficient. Extensive numerical simulations are put forward to show the performance of this estimator when contrasted with other indirect and direct methods for continuous-time system identification.
LG Apr 4, 2023
Optimal Transport for Correctional Learning
Rebecka Winqvist, Inês Lourenco, Francesco Quinzan et al.
The contribution of this paper is a generalized formulation of correctional learning using optimal transport, which is about how to optimally transport one mass distribution to another. Correctional learning is a framework developed to enhance the accuracy of parameter estimation processes by means of a teacher-student approach. In this framework, an expert agent, referred to as the teacher, modifies the data used by a learning agent, known as the student, to improve its estimation process. The objective of the teacher is to alter the data such that the student's estimation error is minimized, subject to a fixed intervention budget. Compared to existing formulations of correctional learning, our novel optimal transport approach provides several benefits. It allows for the estimation of more complex characteristics as well as the consideration of multiple intervention policies for the teacher. We evaluate our approach on two theoretical examples, and on a human-robot interaction application in which the teacher's role is to improve the robot's performance in an inverse reinforcement learning setting.
SY Nov 20, 2023
Unraveling the Control Engineer's Craft with Neural Networks
Braghadeesh Lakshminarayanan, Federico Dettù, Cristian R. Rojas et al.
Many industrial processes require suitable controllers to meet their performance requirements. Often, a sophisticated digital twin is available, that is, a highly complex model serving as a virtual representation of a given physical process, whose parameters may not be properly tuned to capture the variations in the physical process. In this paper, we present a sim2real, direct data-driven controller tuning approach, where the digital twin is used to generate input-output data and suitable controllers for several perturbations in its parameters. State-of-the-art neural-network architectures are then used to learn the controller tuning rule that maps input-output data onto the controller parameters, based on artificially generated data from perturbed versions of the digital twin. In this way, as far as we are aware, we tackle for the first time the problem of re-calibrating the controller by meta-learning the tuning rule directly from data, thus practically replacing the control engineer with a machine learning model. The benefits of this methodology are illustrated via numerical simulations for several choices of neural-network architectures.
LG Mar 23
Computationally lightweight classifiers with frequentist bounds on predictions
Shreeram Murali, Cristian R. Rojas, Dominik Baumann
While both classical and neural network classifiers can achieve high accuracy, they fall short of offering uncertainty bounds on their predictions, making them unfit for safety-critical applications. Existing kernel-based classifiers that provide such bounds scale with $\mathcal{O}(n^{\sim 3})$ in time, making them computationally intractable for large datasets. To address this, we propose a novel, computationally efficient classification algorithm based on the Nadaraya-Watson estimator, for whose estimates we derive frequentist uncertainty intervals. We evaluate our classifier on synthetically generated data and on electrocardiographic heartbeat signals from the MIT-BIH Arrhythmia database. We show that the method achieves competitive accuracy $>96\%$ at $\mathcal{O}(n)$ and $\mathcal{O}(\log n)$ operations, while providing actionable uncertainty bounds. These bounds can, e.g., aid in flagging low-confidence predictions, making them suitable for real-time settings with resource constraints, such as diagnostic monitoring or implantable devices.
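For readers unfamiliar with the Nadaraya-Watson estimator, the sketch below shows its basic form for binary labels: a kernel-weighted average of the training labels around a query point. It is a naive illustration, not the proposed algorithm: the Gaussian kernel, the bandwidth, the toy data, and the function name `nadaraya_watson` are assumptions, and neither the frequentist uncertainty intervals nor the reduced-complexity evaluation from the abstract is reproduced here.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Kernel-weighted average of labels y_train (in {0, 1}) at x_query.

    Uses a Gaussian kernel (an assumption); the returned value can be read
    as an estimate of P(y = 1 | x = x_query).
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    diff = x_train - np.asarray(x_query, dtype=float)
    sq_dist = np.sum(diff ** 2, axis=1)             # squared distance to each sample
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return float(weights @ y_train / weights.sum())

# Toy one-dimensional data: class 0 near x = 0.2, class 1 near x = 1.8
X = np.array([[0.0], [0.2], [0.4], [1.6], [1.8], [2.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

p_low = nadaraya_watson(X, labels, [0.1], bandwidth=0.3)   # deep inside class 0
p_high = nadaraya_watson(X, labels, [1.9], bandwidth=0.3)  # deep inside class 1
p_mid = nadaraya_watson(X, labels, [1.0], bandwidth=0.3)   # halfway between the classes
```

Thresholding the returned value at 0.5 gives a classifier; the query at 1.0 is equidistant from the two groups in this toy data, so its estimate sits at the decision threshold.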
SY May 12, 2025
Safety and optimality in learning-based control at low computational cost
Dominik Baumann, Krzysztof Kowalczyk, Cristian R. Rojas et al.
Applying machine learning methods to physical systems that are supposed to act in the real world requires providing safety guarantees. However, methods that include such guarantees often come at a high computational cost, making them inapplicable to large datasets and embedded devices with low computational power. In this paper, we propose CoLSafe, a computationally lightweight safe learning algorithm whose computational complexity grows sublinearly with the number of data points. We derive both safety and optimality guarantees and showcase the effectiveness of our algorithm on a seven-degrees-of-freedom robot arm.
LG Sep 30, 2025
ACE: Adapting sampling for Counterfactual Explanations
Margarita A. Guerrero, Cristian R. Rojas
Counterfactual Explanations (CFEs) interpret machine learning models by identifying the smallest change to input features needed to change the model's prediction to a desired output. For classification tasks, CFEs determine how close a given sample is to the decision boundary of a trained classifier. Existing methods are often sample-inefficient, requiring numerous evaluations of a black-box model -- an approach that is both costly and impractical when access to the model is limited. We propose Adaptive sampling for Counterfactual Explanations (ACE), a sample-efficient algorithm combining Bayesian estimation and stochastic optimization to approximate the decision boundary with fewer queries. By prioritizing informative points, ACE minimizes evaluations while generating accurate and feasible CFEs. Extensive empirical results show that ACE achieves superior evaluation efficiency compared to state-of-the-art methods, while maintaining effectiveness in identifying minimal and actionable changes.
MA Apr 15, 2024
Kernel-based learning with guarantees for multi-agent applications
Krzysztof Kowalczyk, Paweł Wachel, Cristian R. Rojas
This paper addresses a kernel-based learning problem for a network of agents locally observing a latent multidimensional, nonlinear phenomenon in a noisy environment. We propose a learning algorithm that requires only mild a priori knowledge about the phenomenon under investigation and delivers a model with corresponding non-asymptotic high probability error bounds. Both non-asymptotic analysis of the method and numerical simulation results are presented and discussed in the paper.
ML May 5, 2023
Decentralized diffusion-based learning under non-parametric limited prior knowledge
Paweł Wachel, Krzysztof Kowalczyk, Cristian R. Rojas
We study the problem of diffusion-based network learning of a nonlinear phenomenon, $m$, from local agents' measurements collected in a noisy environment. For a decentralized network and information spreading merely between directly neighboring nodes, we propose a non-parametric learning algorithm that avoids raw data exchange and requires only mild a priori knowledge about $m$. Non-asymptotic estimation error bounds are derived for the proposed method. Its potential applications are illustrated through simulation experiments.
LG Nov 15, 2021
A Teacher-Student Markov Decision Process-based Framework for Online Correctional Learning
Inês Lourenço, Rebecka Winqvist, Cristian R. Rojas et al.
A classical learning setting typically concerns an agent/student who collects data, or observations, from a system in order to estimate a certain property of interest. Correctional learning is a type of cooperative teacher-student framework where a teacher, who has partial knowledge about the system, has the ability to observe and alter (correct) the observations received by the student in order to improve the accuracy of its estimate. In this paper, we show how the variance of the estimate of the student can be reduced with the help of the teacher. We formulate the corresponding online problem - where the teacher has to decide, at each time instant, whether or not to change the observations due to a limited budget - as a Markov decision process, from which the optimal policy is derived using dynamic programming. We validate the framework in numerical experiments, and compare the optimal online policy with the one from the batch setting.
LG May 28, 2021
Asymptotically Optimal Bandits under Weighted Information
Matias I. Müller, Cristian R. Rojas
We study the problem of regret minimization in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round by spreading the resources usually allocated to only one arm. At each iteration the agent selects a normalized power profile and receives a Gaussian vector as outcome, where the unknown variance of each sample is inversely proportional to the power allocated to that arm. The reward corresponds to a linear combination of the power profile and the outcomes, resembling a linear bandit. By spreading the power, the agent can choose to collect information much faster than in a traditional multi-armed bandit at the price of reducing the accuracy of the samples. This setup is fundamentally different from that of a linear bandit -- the regret is known to scale as $\Theta(\sqrt{T})$ for linear bandits, while in this setup the agent receives a much more detailed feedback, for which we derive a tight $\log(T)$ problem-dependent lower bound. We propose a Thompson-Sampling-based strategy, called Weighted Thompson Sampling (WTS), that designs the power profile as its posterior belief of each arm being the best arm, and show that its upper bound matches the derived logarithmic lower bound. Finally, we apply this strategy to a problem of control and system identification, where the goal is to estimate the maximum gain (also called $\mathcal{H}_\infty$-norm) of a linear dynamical system based on batches of input-output samples.
ML Dec 17, 2019
A Finite-Sample Deviation Bound for Stable Autoregressive Processes
Rodrigo A. González, Cristian R. Rojas
In this paper, we study non-asymptotic deviation bounds of the least squares estimator in Gaussian AR($n$) processes. By relying on martingale concentration inequalities and a tail bound for $\chi^2$-distributed variables, we provide a concentration bound for the sample covariance matrix of the process output. With this, we present a problem-dependent finite-time bound on the deviation probability of any fixed linear combination of the estimated parameters of the AR($n$) process. We discuss extensions and limitations of our approach.
ML Dec 3, 2019
Bayesian Model Selection for Change Point Detection and Clustering
Othmane Mazhar, Cristian R. Rojas, Carlo Fischione et al.
We address the new problem of estimating a piece-wise constant signal with the purpose of detecting its change points and the levels of clusters. Our approach is to model it as a nonparametric penalized least-squares model selection on a family of models indexed over the collection of partitions of the design points, and to propose a computationally efficient algorithm to approximately solve it. Statistically, minimizing such a penalized criterion yields an approximation to the maximum a posteriori probability (MAP) estimator. The criterion is then analyzed and an oracle inequality is derived using a Gaussian concentration inequality. The oracle inequality is used to derive, on the one hand, conditions for consistency and, on the other hand, an adaptive upper bound on the expected square risk of the estimator, which statistically motivates our approximation. Finally, we apply our algorithm to simulated data to experimentally validate the statistical guarantees and illustrate its behavior.
SY Sep 6, 2018
Estimating Models with High-Order Noise Dynamics Using Semi-Parametric Weighted Null-Space Fitting
Miguel Galrinho, Cristian R. Rojas, Hakan Hjalmarsson
Standard system identification methods often provide inconsistent estimates with closed-loop data. With the prediction error method (PEM), this issue is solved by using a noise model that is flexible enough to capture the noise spectrum. However, a too flexible noise model (i.e., too many parameters) increases the model complexity, which can cause additional numerical problems for PEM. In this paper, we consider the weighted null-space fitting (WNSF) method. With this method, the system is first modeled using a non-parametric ARX model, which is then reduced to a parametric model of interest using weighted least squares. In the reduction step, a parametric noise model does not need to be estimated if it is not of interest. Because the flexibility of the noise model is increased with the sample size, this will still provide consistent estimates in closed loop and asymptotically efficient estimates in open loop. In this paper, we prove these results, and we derive the asymptotic covariance for the estimation error obtained in closed loop, which is optimal for an infinite-order noise model. For this purpose, we also derive a new technical result for geometric variance analysis, instrumental to our end. Finally, we perform a simulation study to illustrate the benefits of the method when the noise model cannot be parametrized by a low-order model.
SY Jun 6, 2017
Sparse Iterative Learning Control with Application to a Wafer Stage: Achieving Performance, Resource Efficiency, and Task Flexibility
Tom Oomen, Cristian R. Rojas
Trial-varying disturbances are a key concern in Iterative Learning Control (ILC) and may lead to inefficient and expensive implementations and severe performance deterioration. The aim of this paper is to develop a general framework for optimization-based ILC that allows for enforcing additional structure, including sparsity. The proposed method enforces sparsity in a generalized setting through convex relaxations using $\ell_1$ norms. The proposed ILC framework is applied to the optimization of sampling sequences for resource efficient implementation, trial-varying disturbance attenuation, and basis function selection. The framework has a large potential in control applications such as mechatronics, as is confirmed through an application on a wafer stage.
IT Jul 26, 2015
Estimator Selection: End-Performance Metric Aspects
Dimitrios Katselis, Cristian R. Rojas, Carolyn L. Beck
Recently, a framework for application-oriented optimal experiment design has been introduced. In this context, the distance of the estimated system from the true one is measured in terms of a particular end-performance metric. This treatment leads to system estimates that are superior to those obtained with classical experiment designs based on usual pointwise functional distances of the estimated system from the true one. The separation of the system estimator from the experiment design is done within this new framework by choosing and fixing the estimation method to either a maximum likelihood (ML) approach or a Bayesian estimator such as the minimum mean square error (MMSE). Since the MMSE estimator delivers a system estimate with lower mean square error (MSE) than the ML estimator for finite-length experiments, it is usually considered the best choice in practice in signal processing and control applications. Within the application-oriented framework a related meaningful question is: Are there end-performance metrics for which the ML estimator outperforms the MMSE when the experiment is finite-length? In this paper, we affirmatively answer this question based on a simple linear Gaussian regression example.
ML Jul 22, 2015
Evaluation of Spectral Learning for the Identification of Hidden Markov Models
Robert Mattila, Cristian R. Rojas, Bo Wahlberg
Hidden Markov models have successfully been applied as models of discrete time series in many fields. Often, when applied in practice, the parameters of these models have to be estimated. The currently predominant identification methods, such as maximum-likelihood estimation and especially expectation-maximization, are iterative and prone to problems with local minima. A non-iterative method employing a spectral subspace-like approach has recently been proposed in the machine learning literature. This paper evaluates the performance of this algorithm, and compares it to the performance of the expectation-maximization algorithm, on a number of numerical examples. We find that the performance is mixed; it successfully identifies some systems with relatively few available observations, but fails completely for some systems even when a large number of observations is available. An open question is how this discrepancy can be explained. We provide some indications that it could be related to how well-conditioned some system parameters are.
ML Jan 23, 2015
Bayesian Learning for Low-Rank matrix reconstruction
Martin Sundin, Cristian R. Rojas, Magnus Jansson et al.
We develop latent variable models for Bayesian learning based low-rank matrix completion and reconstruction from linear measurements. For under-determined systems, the developed methods are shown to reconstruct low-rank matrices when neither the rank nor the noise power is known a priori. We derive relations between the latent variable models and several low-rank promoting penalty functions. The relations justify the use of Kronecker structured covariance matrices in a Gaussian based prior. In the methods, we use evidence approximation and expectation-maximization to learn the model parameters. The performance of the methods is evaluated through extensive numerical simulations.
SY Apr 20, 2015
Approximate Regularization Paths for Nuclear Norm Minimization Using Singular Value Bounds -- With Implementation and Extended Appendix
Niclas Blomberg, Cristian R. Rojas, Bo Wahlberg
The widely used nuclear norm heuristic for rank minimization problems introduces a regularization parameter which is difficult to tune. We have recently proposed a method to approximate the regularization path, i.e., the optimal solution as a function of the parameter, which requires solving the problem only for a sparse set of points. In this paper, we extend the algorithm to provide error bounds for the singular values of the approximation. We exemplify the algorithms on large scale benchmark examples in model order reduction. Here, the order of a dynamical system is reduced by means of constrained minimization of the nuclear norm of a Hankel matrix.
ST Dec 1, 2014
How to monitor and mitigate stair-casing in l1 trend filtering
Cristian R. Rojas, Bo Wahlberg
In this paper we study the estimation of changing trends in time-series using $\ell_1$ trend filtering. This method generalizes 1D Total Variation (TV) denoising for detection of step changes in means to detecting changes in trends, and it relies on a convex optimization problem for which there are very efficient numerical algorithms. It is known that TV denoising suffers from the so-called stair-case effect, which leads to detecting false change points. The objective of this paper is to show that $\ell_1$ trend filtering also suffers from a certain stair-case problem. The analysis is based on an interpretation of the dual variables of the optimization problem in the method as integrated random walk. We discuss consistency conditions for $\ell_1$ trend filtering, how to monitor their fulfillment, and how to modify the algorithm to avoid the stair-case false detection problem.
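For reference, the two estimators contrasted in this abstract are commonly written as follows. This is the standard textbook form rather than a formula quoted from the paper; the symbols $y_t$ (observations), $x_t$ (estimated signal), $N$ (number of samples) and $\lambda > 0$ (regularization parameter) are notation assumed here for illustration.

```latex
% 1D total variation (TV) denoising: penalizes first differences,
% which favors piecewise-constant estimates (step changes in the mean).
\min_{x \in \mathbb{R}^N} \;
  \frac{1}{2} \sum_{t=1}^{N} (y_t - x_t)^2
  + \lambda \sum_{t=2}^{N} \lvert x_t - x_{t-1} \rvert

% \ell_1 trend filtering: penalizes second differences,
% which favors piecewise-linear estimates (changes in trend).
\min_{x \in \mathbb{R}^N} \;
  \frac{1}{2} \sum_{t=1}^{N} (y_t - x_t)^2
  + \lambda \sum_{t=2}^{N-1} \lvert x_{t-1} - 2 x_t + x_{t+1} \rvert
```

Both problems are convex, and the only difference is the order of the difference operator inside the $\ell_1$ penalty.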
SY Jul 22, 2014
Approximate Regularization Path for Nuclear Norm Based H2 Model Reduction
Niclas Blomberg, Cristian R. Rojas, Bo Wahlberg
This paper concerns model reduction of dynamical systems using the nuclear norm of the Hankel matrix to make a trade-off between model fit and model complexity. This results in a convex optimization problem where this trade-off is determined by one crucial design parameter. The main contribution is a methodology to approximately calculate all solutions up to a certain tolerance to the model reduction problem as a function of the design parameter. This is called the regularization path in sparse estimation and is a very important tool in order to find the appropriate balance between fit and complexity. We extend this to the more complicated nuclear norm case. The key idea is to determine when to exactly calculate the optimal solution using an upper bound based on the so-called duality gap. Hence, by solving a fixed number of optimization problems the whole regularization path up to a given tolerance can be efficiently computed. We illustrate this approach on some numerical examples.
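As a small illustration of the quantity being penalized, the sketch below builds a Hankel matrix from impulse response coefficients and evaluates its nuclear norm (the sum of its singular values). It does not implement the regularization path method; the helper name `hankel_nuclear_norm`, the matrix size, and the first-order example system with impulse response $h_k = a^k$ are assumptions made for illustration.

```python
import numpy as np

def hankel_nuclear_norm(h, n):
    """Build the n-by-n Hankel matrix H[i, j] = h[i + j] and return it
    together with its nuclear norm. Requires len(h) >= 2 * n - 1."""
    h = np.asarray(h, dtype=float)
    H = np.array([[h[i + j] for j in range(n)] for i in range(n)])
    return H, float(np.linalg.norm(H, ord="nuc"))

# Assumed first-order system with impulse response h_k = a**k
a = 0.5
n = 4
h = a ** np.arange(2 * n - 1)

H, nuc = hankel_nuclear_norm(h, n)
rank = int(np.linalg.matrix_rank(H, tol=1e-10))
```

For this example the Hankel matrix is the outer product of the vector $(1, a, a^2, a^3)$ with itself, so it has rank one and its nuclear norm equals $1 + a^2 + a^4 + a^6$. Low rank of the Hankel matrix corresponds to low model order, which is why the nuclear norm serves as a convex surrogate for complexity.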
NA Jun 30, 2014
Relevance Singular Vector Machine for low-rank matrix sensing
Martin Sundin, Saikat Chatterjee, Magnus Jansson et al.
In this paper we develop a new Bayesian inference method for low rank matrix reconstruction. We call the new method the Relevance Singular Vector Machine (RSVM) where appropriate priors are defined on the singular vectors of the underlying matrix to promote low rank. To accelerate computations, a numerically efficient approximation is developed. The proposed algorithms are applied to matrix completion and matrix reconstruction problems and their performance is studied numerically.
ST Jan 21, 2014
On change point detection using the fused lasso method
Cristian R. Rojas, Bo Wahlberg
In this paper we analyze the asymptotic properties of l1 penalized maximum likelihood estimation of signals with piece-wise constant mean values and/or variances. The focus is on segmentation of a non-stationary time series with respect to changes in these model parameters. This change point detection and estimation problem is also referred to as total variation denoising or l1-mean filtering and has many important applications in most fields of science and engineering. We establish the (approximate) sparse consistency properties, including rate of convergence, of the so-called fused lasso signal approximator (FLSA). We show that this only holds if the signs of the corresponding consecutive changes are all different, and that this estimator is otherwise incapable of correctly detecting the underlying sparsity pattern. The key idea is to notice that the optimality conditions for this problem can be analyzed using techniques related to Brownian bridge theory.
ML Sep 21, 2012
A Note on the SPICE Method
Cristian R. Rojas, Dimitrios Katselis, Håkan Hjalmarsson
In this article, we analyze the SPICE method developed in [1], and establish its connections with other standard sparse estimation methods such as the Lasso and the LAD-Lasso. This result positions SPICE as a computationally efficient technique for the calculation of Lasso-type estimators. Conversely, this connection is very useful for establishing the asymptotic properties of SPICE under several problem scenarios and for suggesting suitable modifications in cases where the naive version of SPICE would not work.