2.9SYMay 29
Bounds on Prediction Error When Using an Impulse Response/Equilibrium Model StructureTyrone L. Vincent, Michael B. Wakin
An impulse response/equilibrium model (IREM) structure combines a linear convolution model with a nonlinear function that sets the current operating point via an equilibrium variable with integrator dynamics. This model structure is well suited for mildly nonlinear systems and in particular has been applied to battery fast charging control. This paper provides observability conditions for the IREM model structure and bounds on the prediction error. These conditions can be evaluated directly on the system impulse response.
LGJul 9, 2022
Error Analysis of Tensor-Train Cross ApproximationZhen Qin, Alexander Lidiak, Zhexuan Gong et al.
Tensor train decomposition is widely used in machine learning and quantum physics due to its concise representation of high-dimensional tensors, overcoming the curse of dimensionality. Cross approximation-originally developed for representing a matrix from a set of selected rows and columns-is an efficient method for constructing a tensor train decomposition of a tensor from few of its entries. While tensor train cross approximation has achieved remarkable performance in practical applications, its theoretical analysis, in particular regarding the error of the approximation, is so far lacking. To our knowledge, existing results only provide element-wise approximation accuracy guarantees, which lead to a very loose bound when extended to the entire tensor. In this paper, we bridge this gap by providing accuracy guarantees in terms of the entire tensor for both exact and noisy measurements. Our results illustrate how the choice of selected subtensors affects the quality of the cross approximation and that the approximation error caused by model error and/or measurement error may not grow exponentially with the order of the tensor. These results are verified by numerical experiments, and may have important implications for the usefulness of cross approximations for high-order tensors, such as those encountered in the description of quantum many-body states.
SYJul 16, 2013
Technical Report: Observability with Random ObservationsBorhan M. Sanandaji, Michael B. Wakin, Tyrone L. Vincent
Recovery of the initial state of a high-dimensional system can require a large number of measurements. In this paper, we explain how this burden can be significantly reduced when randomized measurement operators are employed. Our work builds upon recent results from Compressive Sensing (CS). In particular, we make the connection to CS analysis for random block diagonal matrices. By deriving Concentration of Measure (CoM) inequalities, we show that the observability matrix satisfies the Restricted Isometry Property (RIP) (a sufficient condition for stable recovery of sparse vectors) under certain conditions on the state transition matrix. For example, we show that if the state transition matrix is unitary, and if independent, randomly-populated measurement matrices are employed, then it is possible to uniquely recover a sparse high-dimensional initial state when the total number of measurements scales linearly in the sparsity level (the number of non-zero entries) of the initial state and logarithmically in the state dimension. We further extend our RIP analysis for scaled unitary and symmetric state transition matrices. We support our analysis with a case study of a two-dimensional diffusion process.
MLJan 5, 2024
Guaranteed Nonconvex Factorization Approach for Tensor Train RecoveryZhen Qin, Michael B. Wakin, Zhihui Zhu
In this paper, we provide the first convergence guarantee for the factorization approach. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format which enforces orthonormality among most of the factors. To ensure the orthonormal structure, we utilize the Riemannian gradient descent (RGD) for optimizing those factors over the Stiefel manifold. We first delve into the TT factorization problem and establish the local linear convergence of RGD. Notably, the rate of convergence only experiences a linear decline as the tensor order increases. We then study the sensing problem that aims to recover a TT format tensor from linear measurements. Assuming the sensing operator satisfies the restricted isometry property (RIP), we show that with a proper initialization, which could be obtained through spectral initialization, RGD also converges to the ground-truth tensor at a linear rate. Furthermore, we expand our analysis to encompass scenarios involving Gaussian noise in the measurements. We prove that RGD can reliably recover the ground truth at a linear rate, with the recovery error exhibiting only polynomial growth in relation to the tensor order. We conduct various experiments to validate our theoretical findings.
LGJun 19, 2025
A Scalable Factorization Approach for High-Order Structured Tensor RecoveryZhen Qin, Michael B. Wakin, Zhihui Zhu
Tensor decompositions, which represent an $N$-order tensor using approximately $N$ factors of much smaller dimensions, can significantly reduce the number of parameters. This is particularly beneficial for high-order tensors, as the number of entries in a tensor grows exponentially with the order. Consequently, they are widely used in signal recovery and data analysis across domains such as signal processing, machine learning, and quantum physics. A computationally and memory-efficient approach to these problems is to optimize directly over the factors using local search algorithms such as gradient descent, a strategy known as the factorization approach in matrix and tensor optimization. However, the resulting optimization problems are highly nonconvex due to the multiplicative interactions between factors, posing significant challenges for convergence analysis and recovery guarantees. In this paper, we present a unified framework for the factorization approach to solving various tensor decomposition problems. Specifically, by leveraging the canonical form of tensor decompositions--where most factors are constrained to be orthonormal to mitigate scaling ambiguity--we apply Riemannian gradient descent (RGD) to optimize these orthonormal factors on the Stiefel manifold. Under a mild condition on the loss function, we establish a Riemannian regularity condition for the factorized objective and prove that RGD converges to the ground-truth tensor at a linear rate when properly initialized. Notably, both the initialization requirement and the convergence rate scale polynomially rather than exponentially with $N$, improving upon existing results for Tucker and tensor-train format tensors.
CVAug 20, 2025
QA-VLM: Providing human-interpretable quality assessment for wire-feed laser additive manufacturing parts with Vision Language ModelsQiaojie Zheng, Jiucai Zhang, Joy Gockel et al.
Image-based quality assessment (QA) in additive manufacturing (AM) often relies heavily on the expertise and constant attention of skilled human operators. While machine learning and deep learning methods have been introduced to assist in this task, they typically provide black-box outputs without interpretable justifications, limiting their trust and adoption in real-world settings. In this work, we introduce a novel QA-VLM framework that leverages the attention mechanisms and reasoning capabilities of vision-language models (VLMs), enriched with application-specific knowledge distilled from peer-reviewed journal articles, to generate human-interpretable quality assessments. Evaluated on 24 single-bead samples produced by laser wire direct energy deposition (DED-LW), our framework demonstrates higher validity and consistency in explanation quality than off-the-shelf VLMs. These results highlight the potential of our approach to enable trustworthy, interpretable quality assessment in AM applications.
LGMar 8, 2021
Digital Beamforming Robust to Time-Varying Carrier Frequency OffsetShuang Li, Payam Nayeri, Michael B. Wakin
Adaptive interference cancellation is rapidly becoming a necessity for our modern wireless communication systems, due to the proliferation of wireless devices that interfere with each other. To cancel interference, digital beamforming algorithms adaptively adjust the weight vector of the antenna array, and in turn its radiation pattern, to minimize interference while maximizing the desired signal power. While these algorithms are effective in ideal scenarios, they are sensitive to signal corruptions. In this work, we consider the case when the transmitter and receiver in a communication system cannot be synchronized, resulting in a carrier frequency offset that corrupts the signal. We present novel beamforming algorithms that are robust to signal corruptions arising from this time-variant carrier frequency offset. In particular, we bring in the Discrete Prolate Spheroidal Sequences (DPSS's) and propose two atomic-norm-minimization (ANM)-based methods in both 1D and 2D frameworks to design a weight vector that can be used to cancel interference when there exist unknown time-varying frequency drift in the pilot and interferer signals. Both algorithms do not assume a pilot signal is known. Noting that solving ANM optimization problems via semi-definite programs can be a computational burden, we also present a novel fast algorithm to approximately solve our 1D ANM optimization problem. Finally, we confirm the benefits of our proposed algorithms and show the advantages over existing approaches with a series of experiments.
LGNov 18, 2019
The Effectiveness of Variational Autoencoders for Active LearningFarhad Pourkamali-Anaraki, Michael B. Wakin
The high cost of acquiring labels is one of the main challenges in deploying supervised machine learning algorithms. Active learning is a promising approach to control the learning process and address the difficulties of data labeling by selecting labeled training examples from a large pool of unlabeled instances. In this paper, we propose a new data-driven approach to active learning by choosing a small set of labeled data points that are both informative and representative. To this end, we present an efficient geometric technique to select a diverse core-set in a low-dimensional latent space obtained by training a Variational Autoencoder (VAE). Our experiments demonstrate an improvement in accuracy over two related techniques and, more importantly, signify the representation power of generative modeling for developing new active learning methods in high-dimensional data settings.
OCJul 11, 2019
The Landscape of Non-convex Empirical Risk with Degenerate Population RiskShuang Li, Gongguo Tang, Michael B. Wakin
The landscape of empirical risk has been widely studied in a series of machine learning problems, including low-rank matrix factorization, matrix sensing, matrix completion, and phase retrieval. In this work, we focus on the situation where the corresponding population risk is a degenerate non-convex loss function, namely, the Hessian of the population risk can have zero eigenvalues. Instead of analyzing the non-convex empirical risk directly, we first study the landscape of the corresponding population risk, which is usually easier to characterize, and then build a connection between the landscape of the empirical risk and its population risk. In particular, we establish a correspondence between the critical points of the empirical risk and its population risk without the strongly Morse assumption, which is required in existing literature but not satisfied in degenerate scenarios. We also apply the theory to matrix sensing and phase retrieval to demonstrate how to infer the landscape of empirical risk from that of the corresponding population risk.
OCApr 22, 2019
Provable Bregman-divergence based Methods for Nonconvex and Non-Lipschitz ProblemsQiuwei Li, Zhihui Zhu, Gongguo Tang et al.
The (global) Lipschitz smoothness condition is crucial in establishing the convergence theory for most optimization methods. Unfortunately, most machine learning and signal processing problems are not Lipschitz smooth. This motivates us to generalize the concept of Lipschitz smoothness condition to the relative smoothness condition, which is satisfied by any finite-order polynomial objective function. Further, this work develops new Bregman-divergence based algorithms that are guaranteed to converge to a second-order stationary point for any relatively smooth problem. In addition, the proposed optimization methods cover both the proximal alternating minimization and the proximal alternating linearized minimization when we specialize the Bregman divergence to the Euclidian distance. Therefore, this work not only develops guaranteed optimization methods for non-Lipschitz smooth problems but also solves an open problem of showing the second-order convergence guarantees for these alternating minimization methods.
OCNov 7, 2018
Global Optimality in Distributed Low-rank Matrix FactorizationZhihui Zhu, Qiuwei Li, Xinshuo Yang et al.
We study the convergence of a variant of distributed gradient descent (DGD) on a distributed low-rank matrix approximation problem wherein some optimization variables are used for consensus (as in classical DGD) and some optimization variables appear only locally at a single node in the network. We term the resulting algorithm DGD+LOCAL. Using algorithmic connections to gradient descent and geometric connections to the well-behaved landscape of the centralized low-rank matrix approximation problem, we identify sufficient conditions where DGD+LOCAL is guaranteed to converge with exact consensus to a global minimizer of the original centralized problem. For the distributed low-rank matrix approximation problem, these guarantees are stronger---in terms of consensus and optimality---than what appear in the literature for classical DGD and more general problems.
LGMay 13, 2018
The Global Optimization Geometry of Shallow Linear Neural NetworksZhihui Zhu, Daniel Soudry, Yonina C. Eldar et al.
We examine the squared error loss landscape of shallow linear neural networks. We show---with significantly milder assumptions than previous works---that the corresponding optimization problems have benign geometric properties: there are no spurious local minima and the Hessian at every saddle point has at least one negative eigenvalue. This means that at every saddle point there is a directional negative curvature which algorithms can utilize to further decrease the objective value. These geometric properties imply that many local search algorithms (such as the gradient descent which is widely utilized for training neural networks) can provably solve the training problem with global convergence.
NAAug 10, 2017
The Fast Slepian TransformSanthosh Karnik, Zhihui Zhu, Michael B. Wakin et al.
The discrete prolate spheroidal sequences (DPSS's) provide an efficient representation for discrete signals that are perfectly timelimited and nearly bandlimited. Due to the high computational complexity of projecting onto the DPSS basis - also known as the Slepian basis - this representation is often overlooked in favor of the fast Fourier transform (FFT). We show that there exist fast constructions for computing approximate projections onto the leading Slepian basis elements. The complexity of the resulting algorithms is comparable to the FFT, and scales favorably as the quality of the desired approximation is increased. In the process of bounding the complexity of these algorithms, we also establish new nonasymptotic results on the eigenvalue distribution of discrete time-frequency localization operators. We then demonstrate how these algorithms allow us to efficiently compute the solution to certain least-squares problems that arise in signal processing. We also provide simulations comparing these fast, approximate Slepian methods to exact Slepian methods as well as the traditional FFT based methods.
NAOct 11, 2015
Computing Active Subspaces Efficiently with Gradient SketchingPaul G. Constantine, Armin Eftekhari, Michael B. Wakin
Active subspaces are an emerging set of tools for identifying and exploiting the most important directions in the space of a computer simulation's input parameters; these directions depend on the simulation's quantity of interest, which we treat as a function from inputs to outputs. To identify a function's active subspace, one must compute the eigenpairs of a matrix derived from the function's gradient, which presents challenges when the gradient is not available as a subroutine. We numerically study two methods for estimating the necessary eigenpairs using only linear measurements of the function's gradient. In practice, these measurements can be estimated by finite differences using only two function evaluations, regardless of the dimension of the function's input space.
NASep 1, 2009
Analysis of Orthogonal Matching Pursuit using the Restricted Isometry PropertyMark A. Davenport, Michael B. Wakin
Orthogonal Matching Pursuit (OMP) is the canonical greedy algorithm for sparse approximation. In this paper we demonstrate that the restricted isometry property (RIP) can be used for a very straightforward analysis of OMP. Our main conclusion is that the RIP of order $K+1$ (with isometry constant $δ< \frac{1}{3\sqrt{K}}$) is sufficient for OMP to exactly recover any $K$-sparse signal. Our analysis relies on simple and intuitive observations about OMP and matrices which satisfy the RIP. For restricted classes of $K$-sparse signals (those that are highly compressible), a relaxed bound on the isometry constant is also established. A deeper understanding of OMP may benefit the analysis of greedy algorithms in general. To demonstrate this, we also briefly revisit the analysis of the Regularized OMP (ROMP) algorithm.