LGJan 30, 2023
Deep networks for system identification: a SurveyGianluigi Pillonetto, Aleksandr Aravkin, Daniel Gedon et al.
Deep learning is a topic of considerable current interest. The availability of massive data collections and powerful software resources has led to an impressive amount of results in many application areas that reveal essential but hidden properties of the observations. System identification learns mathematical descriptions of dynamic systems from input-output data and can thus benefit from the advances of deep neural networks to enrich the possible range of models to choose from. For this reason, we provide a survey of deep learning from a system identification perspective. We cover a wide spectrum of topics to enable researchers to understand the methods, providing rigorous practical and theoretical insights into the benefits and challenges of using them. The main aim of the identified model is to predict new data from previous observations. This can be achieved with different deep learning based modelling techniques and we discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks. Their parameters have to be estimated from past data trying to optimize the prediction performance. For this purpose, we discuss a specific set of first-order optimization tools that is emerged as efficient. The survey then draws connections to the well-studied area of kernel-based methods. They control the data fit by regularization terms that penalize models not in line with prior assumptions. We illustrate how to cast them in deep architectures to obtain deep kernel-based methods. The success of deep learning also resulted in surprising empirical observations, like the counter-intuitive behaviour of models with many parameters. We discuss the role of overparameterized models, including their connection to kernels, as well as implicit regularization mechanisms which affect generalization, specifically the interesting phenomena of benign overfitting ...
CEJul 2, 2012
Robust inversion via semistochastic dimensionality reductionAleksandr Aravkin, Michael P. Friedlander, Tristan van Leeuwen
We consider a class of inverse problems where it is possible to aggregate the results of multiple experiments. This class includes problems where the forward model is the solution operator to linear ODEs or PDEs. The tremendous size of such problems motivates dimensionality reduction techniques based on randomly mixing experiments. These techniques break down, however, when robust data-fitting formulations are used, which are essential in cases of missing data, unusually large errors, and systematic features in the data unexplained by the forward model. We survey robust methods within a statistical framework, and propose a semistochastic optimization approach that allows dimensionality reduction. The efficacy of the methods are demonstrated for a large-scale seismic inverse problem using the robust Student's t-distribution, where a useful synthetic velocity model is recovered in the extreme scenario of 60% data missing at random. The semistochastic approach achieves this recovery using 20% of the effort required by a direct robust approach.
LGNov 10, 2022
Spatiotemporal k-meansOlga Dorabiala, Devavrat Vivek Dabke, Jennifer Webster et al.
Spatiotemporal data is increasingly available due to emerging sensor and data acquisition technologies that track moving objects. Spatiotemporal clustering addresses the need to efficiently discover patterns and trends in moving object behavior without human supervision. One application of interest is the discovery of moving clusters, where clusters have a static identity, but their location and content can change over time. We propose a two phase spatiotemporal clustering method called spatiotemporal k-means (STkM) that is able to analyze the multi-scale relationships within spatiotemporal data. By optimizing an objective function that is unified over space and time, the method can track dynamic clusters at both short and long timescales with minimal parameter tuning and no post-processing. We begin by proposing a theoretical generating model for spatiotemporal data and prove the efficacy of STkM in this setting. We then evaluate STkM on a recently developed collective animal behavior benchmark dataset and show that STkM outperforms baseline methods in the low-data limit, which is a critical regime of consideration in many emerging applications. Finally, we showcase how STkM can be extended to more complex machine learning tasks, particularly unsupervised region of interest detection and tracking in videos.
LGNov 14, 2025
On-line learning of dynamic systems: sparse regression meets Kalman filteringGianluigi Pillonetto, Akram Yazdani, Aleksandr Aravkin
Learning governing equations from data is central to understanding the behavior of physical systems across diverse scientific disciplines, including physics, biology, and engineering. The Sindy algorithm has proven effective in leveraging sparsity to identify concise models of nonlinear dynamical systems. In this paper, we extend sparsity-driven approaches to real-time learning by integrating a cornerstone algorithm from control theory -- the Kalman filter (KF). The resulting Sindy Kalman Filter (SKF) unifies both frameworks by treating unknown system parameters as state variables, enabling real-time inference of complex, time-varying nonlinear models unattainable by either method alone. Furthermore, SKF enhances KF parameter identification strategies, particularly via look-ahead error, significantly simplifying the estimation of sparsity levels, variance parameters, and switching instants. We validate SKF on a chaotic Lorenz system with drifting or switching parameters and demonstrate its effectiveness in the real-time identification of a sparse nonlinear aircraft model built from real flight data.
MENov 23, 2025
A joint optimization approach to identifying sparse dynamics using least squares kernel collocationAlexander W. Hsu, Ike Griss Salas, Jacob M. Stevens-Haas et al.
We develop an all-at-once modeling framework for learning systems of ordinary differential equations (ODE) from scarce, partial, and noisy observations of the states. The proposed methodology amounts to a combination of sparse recovery strategies for the ODE over a function library combined with techniques from reproducing kernel Hilbert space (RKHS) theory for estimating the state and discretizing the ODE. Our numerical experiments reveal that the proposed strategy leads to significant gains in terms of accuracy, sample efficiency, and robustness to noise, both in terms of learning the equation and estimating the unknown states. This work demonstrates capabilities well beyond existing and widely used algorithms while extending the modeling flexibility of other recent developments in equation discovery.
OCOct 19, 2021
Theoretical Advances in Current Estimation and Navigation from a Glider-Based Acoustic Doppler Current Profiler (ADCP)Jacob Stevens-Haas, Sarah E. Webster, Aleksandr Aravkin
We examine acoustic Doppler current profiler (ADCP) measurements from underwater gliders to determine glider position, glider velocity, and subsurface current. ADCPs, however, do not directly observe the quantities of interest; instead, they measure the relative motion of the vehicle and the water column. We examine the lineage of mathematical innovations that have previously been applied to this problem, discovering an unstated but incorrect assumption of independence. We reframe a recent method to form a joint probability model of current and vehicle navigation, which allows us to correct this assumption and extend the classic Kalman smoothing method. Detailed simulations affirm the efficacy of our approach for computing estimates and their uncertainty. The joint model developed here sets the stage for future work to incorporate constraints, range measurements, and robust statistical modeling.
MLAug 16, 2021
Robust Trimmed k-meansOlga Dorabiala, J. Nathan Kutz, Aleksandr Aravkin
Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing between similar and dissimilar features of a given data set. One of the most common clustering algorithms is k-means. Unfortunately, when dealing with real-world data many traditional clustering algorithms are compromised by lack of clear separation between groups, noisy observations, and/or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. Current methods that robustify k-means clustering are specialized for either single or multi-membership data, but do not perform competitively in both cases. We propose an extension of the k-means algorithm, which we call Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points and can be applied to either single- or multi-membership data. We test RTKM on various real-world datasets and show that RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers. We also show that RTKM leverages its relative advantages to outperform other methods on multi-membership data containing outliers.
MED-PHMay 14, 2021
A hyperparameter-tuning approach to automated inverse planningKelsey Maass, Aleksandr Aravkin, Minsun Kim
Radiotherapy inverse planning often requires planners to modify parameters in the treatment planning system's objective function to produce clinically acceptable plans. Due to the manual steps in this process, plan quality can vary depending on the planning time available and the planner's skills. This study investigates two hyperparameter-tuning methods for automated inverse planning. Because this framework does not train a model on previously-optimized plans, it can be readily adapted to practice pattern changes, and plan quality is not limited by that of a training cohort. We selected 10 patients who received lung SBRT using manually-generated clinical plans. We used random sampling (RS) and Bayesian optimization (BO) to tune parameters using linear-quadratic utility functions based on 11 clinical goals. Normalizing all plans to have PTV D95 equal to 48 Gy, we compared plan quality for the automatically-generated and manually-generated plans. We also investigated the impact of iteration count on the automatically-generated plans, comparing planning time and plan utility for RS and BO plans with and without stopping criteria. Without stopping criteria, the median planning time was 1.9 and 2.3 hours for RS and BO plans. The OAR doses in the RS and BO plans had a median percent difference (MPD) of 48.7% and 60.4% below clinical dose limits and an MPD of 2.8% and 3.3% below clinical plan doses. With stopping criteria, the utility decreased by an MPD of 5.3% and 3.9% for RS and BO plans, but the median planning time was reduced to 0.5 and 0.7 hours, and the OAR doses still had an MPD of 42.9% and 49.7% below clinical dose limits and an MPD of 0.3% and 1.8% below clinical plan doses. This study demonstrates that hyperparameter-tuning approaches to automated inverse planning can reduce active planning time with plan quality that is similar to or better than manually-generated plans.
NAMar 24, 2021
Analysis of Truncated Orthogonal Iteration for Sparse Eigenvector ProblemsHexuan Liu, Aleksandr Aravkin
A wide range of problems in computational science and engineering require estimation of sparse eigenvectors for high dimensional systems. Here, we propose two variants of the Truncated Orthogonal Iteration to compute multiple leading eigenvectors with sparsity constraints simultaneously. We establish numerical convergence results for the proposed algorithms using a perturbation framework, and extend our analysis to other existing alternatives for sparse eigenvector estimation. We then apply our algorithms to solve the sparse principle component analysis problem for a wide range of test datasets, from simple simulations to real-world datasets including MNIST, sea surface temperature and 20 newsgroups. In all these cases, we show that the new methods get state of the art results quickly and with minimal parameter tuning.
MLMay 21, 2019
Time-varying Autoregression with Low Rank TensorsKameron Decker Harris, Aleksandr Aravkin, Rajesh Rao et al.
We present a windowed technique to learn parsimonious time-varying autoregressive models from multivariate timeseries. This unsupervised method uncovers interpretable spatiotemporal structure in data via non-smooth and non-convex optimization. In each time window, we assume the data follow a linear model parameterized by a system matrix, and we model this stack of potentially different system matrices as a low rank tensor. Because of its structure, the model is scalable to high-dimensional data and can easily incorporate priors such as smoothness over time. We find the components of the tensor using alternating minimization and prove that any stationary point of this algorithm is a local minimum. We demonstrate on a synthetic example that our method identifies the true rank of a switching linear system in the presence of noise. We illustrate our model's utility and superior scalability over extant methods when applied to several synthetic and real-world example: two types of time-varying linear systems, worm behavior, sea surface temperature, and monkey brain datasets.
OCNov 28, 2018
Basis Pursuit Denoise with Nonsmooth ConstraintsRobert Baraldi, Rajiv Kumar, Aleksandr Aravkin
Level-set optimization formulations with data-driven constraints minimize a regularization functional subject to matching observations to a given error level. These formulations are widely used, particularly for matrix completion and sparsity promotion in data interpolation and denoising. The misfit level is typically measured in the l2 norm, or other smooth metrics. In this paper, we present a new flexible algorithmic framework that targets nonsmooth level-set constraints, including L1, Linf, and even L0 norms. These constraints give greater flexibility for modeling deviations in observation and denoising, and have significant impact on the solution. Measuring error in the L1 and L0 norms makes the result more robust to large outliers, while matching many observations exactly. We demonstrate the approach for basis pursuit denoise (BPDN) problems as well as for extensions of BPDN to matrix factorization, with applications to interpolation and denoising of 5D seismic data. The new methods are particularly promising for seismic applications, where the amplitude in the data varies significantly, and measurement noise in low-amplitude regions can wreak havoc for standard Gaussian error models.
OCApr 23, 2018
Simultaneous shot inversion for nonuniform geometries using fast data interpolationMichelle Liu, Rajiv Kumar, Eldad Haber et al.
Stochastic optimization is key to efficient inversion in PDE-constrained optimization. Using 'simultaneous shots', or random superposition of source terms, works very well in simple acquisition geometries where all sources see all receivers, but this rarely occurs in practice. We develop an approach that interpolates data to an ideal acquisition geometry while solving the inverse problem using simultaneous shots. The approach is formulated as a joint inverse problem, combining ideas from low-rank interpolation with full-waveform inversion. Results using synthetic experiments illustrate the flexibility and efficiency of the approach.
CVAug 26, 2017
Distributed Bundle AdjustmentKarthikeyan Natesan Ramamurthy, Chung-Ching Lin, Aleksandr Aravkin et al.
Most methods for Bundle Adjustment (BA) in computer vision are either centralized or operate incrementally. This leads to poor scaling and affects the quality of solution as the number of images grows in large scale structure from motion (SfM). Furthermore, they cannot be used in scenarios where image acquisition and processing must be distributed. We address this problem with a new distributed BA algorithm. Our distributed formulation uses alternating direction method of multipliers (ADMM), and, since each processor sees only a small portion of the data, we show that robust formulations improve performance. We analyze convergence of the proposed algorithm, and illustrate numerical performance, accuracy of the parameter estimates, and scalability of the distributed implementation in the context of synthetic 3D datasets with known camera position and orientation ground truth. The results are comparable to an alternate state-of-the-art centralized bundle adjustment algorithm on synthetic and real 3D reconstruction problems. The runtime of our implementation scales linearly with the number of observed points.
MLOct 4, 2016
A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine LearningAleksandr Aravkin, Damek Davis
In this paper, we show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set while (2) simultaneously fitting the trimmed model on the uncontaminated data that remains. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorithm that incorporates prior knowledge through nonsmooth regularization. For datasets of size $n$, our approach requires $O(n^{2/3}/\varepsilon)$ gradient evaluations to reach $\varepsilon$-accuracy and, when a certain error bound holds, the complexity improves to $O(κn^{2/3}\log(1/\varepsilon))$. These rates are $n^{1/3}$ times better than those achieved by typical, full gradient methods.
MLMay 26, 2016
A General Family of Trimmed Estimators for Robust High-dimensional Data AnalysisEunho Yang, Aurelie Lozano, Aleksandr Aravkin
We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using possibly non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data.
OCJan 19, 2016
Non-smooth Variable ProjectionTristan van Leeuwen, Aleksandr Aravkin
Variable projection solves structured optimization problems by completely minimizing over a subset of the variables while iterating over the remaining variables. Over the last 30 years, the technique has been widely used, with empirical and theoretical results demonstrating both greater efficacy and greater stability compared to competing approaches. Classic examples have exploited closed-form projections and smoothness of the objective function. We extend the approach to problems that include non-smooth terms, and where the projection subproblems can only be solved inexactly by iterative methods. We propose an inexact adaptive algonrithm for solving such problems and analyze its computational complexity. Finally, we show how the theory can be used to design methods for selected problems occurring frequently in machine-learning and inverse problems.
OCJun 4, 2014
A variational approach to stable principal component pursuitAleksandr Aravkin, Stephen Becker, Volkan Cevher et al.
We introduce a new convex formulation for stable principal component pursuit (SPCP) to decompose noisy signals into low-rank and sparse representations. For numerical solutions of our SPCP formulation, we first develop a convex variational framework and then accelerate it with quasi-Newton methods. We show, via synthetic and real data experiments, that our approach offers advantages over the classical SPCP formulations in scalability and practical parameter selection.
MLDec 5, 2013
Iterative Log ThresholdingDmitry Malioutov, Aleksandr Aravkin
Sparse reconstruction approaches using the re-weighted l1-penalty have been shown, both empirically and theoretically, to provide a significant improvement in recovering sparse signals in comparison to the l1-relaxation. However, numerical optimization of such penalties involves solving problems with l1-norms in the objective many times. Using the direct link of reweighted l1-penalties to the concave log-regularizer for sparsity, we derive a simple prox-like algorithm for the log-regularized formulation. The proximal splitting step of the algorithm has a closed form solution, and we call the algorithm 'log-thresholding' in analogy to soft thresholding for the l1-penalty. We establish convergence results, and demonstrate that log-thresholding provides more accurate sparse reconstructions compared to both soft and hard thresholding. Furthermore, the approach can be directly extended to optimization over matrices with penalty for rank (i.e. the nuclear norm penalty and its re-weigthed version), where we suggest a singular-value log-thresholding approach.