Yousef Saad

23papers

139citations

Novelty50%

AI Score38

Ranked #112,347 of 201,326 authors (top 56%)#24,919 in LG (top 59%)

23 Papers

NAOct 5, 2014

Approximating spectral densities of large matrices

Lin Lin, Yousef Saad, Chao Yang

In physics, it is sometimes desirable to compute the so-called \emph{Density Of States} (DOS), also known as the \emph{spectral density}, of a real symmetric matrix $A$. The spectral density can be viewed as a probability density distribution that measures the likelihood of finding eigenvalues near some point on the real line. The most straightforward way to obtain this density is to compute all eigenvalues of $A$. But this approach is generally costly and wasteful, especially for matrices of large dimension. There exists alternative methods that allow us to estimate the spectral density function at much lower cost. The major computational cost of these methods is in multiplying $A$ with a number of vectors, which makes them appealing for large-scale problems where products of the matrix $A$ with arbitrary vectors are relatively inexpensive. This paper defines the problem of estimating the spectral density carefully, and discusses how to measure the accuracy of an approximate spectral density. It then surveys a few known methods for estimating the spectral density, and proposes some new variations of existing methods. All methods are discussed from a numerical linear algebra point of view.

NAAug 5, 2014

Efficient estimation of eigenvalue counts in an interval

Edoardo Di Napoli, Eric Polizzi, Yousef Saad

Estimating the number of eigenvalues located in a given interval of a large sparse Hermitian matrix is an important problem in certain applications and it is a prerequisite of eigensolvers based on a divide-and-conquer paradigm. Often an exact count is not necessary and methods based on stochastic estimates can be utilized to yield rough approximations. This paper examines a number of techniques tailored to this specific task. It reviews standard approaches and explores new ones based on polynomial and rational approximation filtering combined with a stochastic procedure.

NADec 26, 2015

A Thick-Restart Lanczos algorithm with polynomial filtering for Hermitian eigenvalue problems

Ruipeng Li, Yuanzhe Xi, Eugene Vecharynski et al.

Polynomial filtering can provide a highly effective means of computing all eigenvalues of a real symmetric (or complex Hermitian) matrix that are located in a given interval, anywhere in the spectrum. This paper describes a technique for tackling this problem by combining a Thick-Restart version of the Lanczos algorithm with deflation (`locking') and a new type of polynomial filters obtained from a least-squares technique. The resulting algorithm can be utilized in a `spectrum-slicing' approach whereby a very large number of eigenvalues and associated eigenvectors of the matrix are computed by extracting eigenpairs located in different sub-intervals independently from one another.

NAFeb 14, 2018

The Eigenvalues Slicing Library (EVSL): Algorithms, Implementation, and Software

Ruipeng Li, Yuanzhe Xi, Lucas Erlandson et al.

This paper describes a software package called EVSL (for EigenValues Slicing Library) for solving large sparse real symmetric standard and generalized eigenvalue problems. As its name indicates, the package exploits spectrum slicing, a strategy that consists of dividing the spectrum into a number of subintervals and extracting eigenpairs from each subinterval independently. In order to enable such a strategy, the methods implemented in EVSL rely on a quick calculation of the spectral density of a given matrix, or a matrix pair. What distinguishes EVSL from other currently available packages is that EVSL relies entirely on filtering techniques. Polynomial and rational filtering are both implemented and are coupled with Krylov subspace methods and the subspace iteration algorithm. On the implementation side, the package offers interfaces for various scenarios including matrix-free modes, whereby the user can supply his/her own functions to perform matrix-vector operations or to solve sparse linear systems. The paper describes the algorithms in EVSL, provides details on their implementations, and discusses performance issues for the various methods.

NAMay 15, 2017

SMASH: Structured matrix approximation by separation and hierarchy

Difeng Cai, Edmond Chow, Yousef Saad et al.

This paper presents an efficient method to perform Structured Matrix Approximation by Separation and Hierarchy (SMASH), when the original dense matrix is associated with a kernel function. Given points in a domain, a tree structure is first constructed based on an adaptive partitioning of the computational domain to facilitate subsequent approximation procedures. In contrast to existing schemes based on either analytic or purely algebraic approximations, SMASH takes advantage of both approaches and greatly improves the efficiency. The algorithm follows a bottom-up traversal of the tree and is able to perform the operations associated with each node on the same level in parallel. A strong rank-revealing factorization is applied to the initial analytic approximation in the separation regime so that a special structure is incorporated into the final nested bases. As a consequence, the storage is significantly reduced on one hand and a hierarchy of the original grid is constructed on the other hand. Due to this hierarchy, nested bases at upper levels can be computed in a similar way as the leaf level operations but on coarser grids. The main advantages of SMASH include its simplicity of implementation, its flexibility to construct various hierarchical rank structures and a low storage cost. Rigorous error analysis and complexity analysis are conducted to show that this scheme is fast and stable. The efficiency and robustness of SMASH are demonstrated through various test problems arising from integral equations, structured matrices, etc.

NANov 17, 2013

Graph partitioning using matrix values for preconditioning symmetric positive definite systems

Eugene Vecharynski, Yousef Saad, Masha Sosonkina

Prior to the parallel solution of a large linear system, it is required to perform a partitioning of its equations/unknowns. Standard partitioning algorithms are designed using the considerations of the efficiency of the parallel matrix-vector multiplication, and typically disregard the information on the coefficients of the matrix. This information, however, may have a significant impact on the quality of the preconditioning procedure used within the chosen iterative scheme. In the present paper, we suggest a spectral partitioning algorithm, which takes into account the information on the matrix coefficients and constructs partitions with respect to the objective of enhancing the quality of the nonoverlapping additive Schwarz (block Jacobi) preconditioning for symmetric positive definite linear systems. For a set of test problems with large variations in magnitudes of matrix coefficients, our numerical experiments demonstrate a noticeable improvement in the convergence of the resulting solution scheme when using the new partitioning approach.

NANov 29, 2018

Solving the 3D High-Frequency Helmholtz Equation using Contour Integration and Polynomial Preconditioning

Xiao Liu, Yuanzhe Xi, Yousef Saad et al.

We propose an iterative solution method for the 3D high-frequency Helmholtz equation that exploits a contour integral formulation of spectral projectors. In this framework, the solution in certain invariant subspaces is approximated by solving complex-shifted linear systems, resulting in faster GMRES iterations due to the restricted spectrum. The shifted systems are solved by exploiting a polynomial fixed-point iteration, which is a robust scheme even if the magnitude of the shift is small. Numerical tests in 3D indicate that $O(n^{1/3})$ matrix-vector products are needed to solve a high-frequency problem with a matrix size $n$ with high accuracy. The method has a small storage requirement, can be applied to both dense and sparse linear systems, and is highly parallelizable.

NAAug 28, 2018

Spectrum-Adapted Polynomial Approximation for Matrix Functions

Li Fan, David I Shuman, Shashanka Ubaru et al.

We propose and investigate two new methods to approximate $f({\bf A}){\bf b}$ for large, sparse, Hermitian matrices ${\bf A}$. The main idea behind both methods is to first estimate the spectral density of ${\bf A}$, and then find polynomials of a fixed order that better approximate the function $f$ on areas of the spectrum with a higher density of eigenvalues. Compared to state-of-the-art methods such as the Lanczos method and truncated Chebyshev expansion, the proposed methods tend to provide more accurate approximations of $f({\bf A}){\bf b}$ at lower polynomial orders, and for matrices ${\bf A}$ with a large number of distinct interior eigenvalues and a small spectral width.

LGOct 22, 2022

An Efficient Nonlinear Acceleration method that Exploits Symmetry of the Hessian

Huan He, Shifan Zhao, Ziyuan Tang et al.

Nonlinear acceleration methods are powerful techniques to speed up fixed-point iterations. However, many acceleration methods require storing a large number of previous iterates and this can become impractical if computational resources are limited. In this paper, we propose a nonlinear Truncated Generalized Conjugate Residual method (nlTGCR) whose goal is to exploit the symmetry of the Hessian to reduce memory usage. The proposed method can be interpreted as either an inexact Newton or a quasi-Newton method. We show that, with the help of global strategies like residual check techniques, nlTGCR can converge globally for general nonlinear problems and that under mild conditions, nlTGCR is able to achieve superlinear convergence. We further analyze the convergence of nlTGCR in a stochastic setting. Numerical results demonstrate the superiority of nlTGCR when compared with several other competitive baseline approaches on a few problems. Our code will be available in the future.

NAFeb 14, 2018

A Posteriori Error Estimate for Computing $\mathrm{tr}(f(A))$ by Using the Lanczos Method

Jie Chen, Yousef Saad

An outstanding problem when computing a function of a matrix, $f(A)$, by using a Krylov method is to accurately estimate errors when convergence is slow. Apart from the case of the exponential function which has been extensively studied in the past, there are no well-established solutions to the problem. Often the quantity of interest in applications is not the matrix $f(A)$ itself, but rather, matrix-vector products or bilinear forms. When the computation related to $f(A)$ is a building block of a larger problem (e.g., approximately computing its trace), a consequence of the lack of reliable error estimates is that the accuracy of the computed result is unknown. In this paper, we consider the problem of computing $\mathrm{tr}(f(A))$ for a symmetric positive-definite matrix $A$ by using the Lanczos method and make two contributions: (i) we propose an error estimate for the bilinear form associated with $f(A)$, and (ii) an error estimate for the trace of $f(A)$. We demonstrate the practical usefulness of these estimates for large matrices and in particular, show that the trace error estimate is indicative of the number of accurate digits. As an application, we compute the log-determinant of a covariance matrix in Gaussian process analysis and underline the importance of error tolerance as a stopping criterion, as a means of bounding the number of Lanczos steps to achieve a desired accuracy.

NANov 26, 2017

Beyond AMLS: Domain decomposition with rational filtering

Vassilis Kalantzis, Yuanzhe Xi, Yousef Saad

This paper proposes a rational filtering domain decomposition technique for the solution of large and sparse symmetric generalized eigenvalue problems. The proposed technique is purely algebraic and decomposes the eigenvalue problem associated with each subdomain into two disjoint subproblems. The first subproblem is associated with the interface variables and accounts for the interaction among neighboring subdomains. To compute the solution of the original eigenvalue problem at the interface variables we leverage ideas from contour integral eigenvalue solvers. The second subproblem is associated with the interior variables in each subdomain and can be solved in parallel among the different subdomains using real arithmetic only. Compared to rational filtering projection methods applied to the original matrix pencil, the proposed technique integrates only a part of the matrix resolvent while it applies any orthogonalization necessary to vectors whose length is equal to the number of interface variables. In addition, no estimation of the number of eigenvalues lying inside the interval of interest is needed. Numerical experiments performed in distributed memory architectures illustrate the competitiveness of the proposed technique against rational filtering Krylov approaches.

NANov 24, 2025

Designing Preconditioners for SGD: Local Conditioning, Noise Floors, and Basin Stability

Mitchell Scott, Tianshi Xu, Ziyuan Tang et al.

Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $\mathbf{M}$-dependent quantities: the rate through an effective condition number in the $\mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $\mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $\mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.

LGOct 6, 2021

GDA-AM: On the effectiveness of solving minimax optimization via Anderson Acceleration

Huan He, Shifan Zhao, Yuanzhe Xi et al.

Many modern machine learning algorithms such as generative adversarial networks (GANs) and adversarial training can be formulated as minimax optimization. Gradient descent ascent (GDA) is the most commonly used algorithm due to its simplicity. However, GDA can converge to non-optimal minimax points. We propose a new minimax optimization framework, GDA-AM, that views the GDAdynamics as a fixed-point iteration and solves it using Anderson Mixing to con-verge to the local minimax. It addresses the diverging issue of simultaneous GDAand accelerates the convergence of alternating GDA. We show theoretically that the algorithm can achieve global convergence for bilinear problems under mild conditions. We also empirically show that GDA-AMsolves a variety of minimax problems and improves GAN training on several datasets

LGJun 22, 2021

Graph coarsening: From scientific computing to machine learning

Jie Chen, Yousef Saad, Zechen Zhang

The general method of graph coarsening or graph reduction has been a remarkably useful and ubiquitous tool in scientific computing and it is now just starting to have a similar impact in machine learning. The goal of this paper is to take a broad look into coarsening techniques that have been successfully deployed in scientific computing and see how similar principles are finding their way in more recent applications related to machine learning. In scientific computing, coarsening plays a central role in algebraic multigrid methods as well as the related class of multilevel incomplete LU factorizations. In machine learning, graph coarsening goes under various names, e.g., graph downsampling or graph reduction. Its goal in most cases is to replace some original graph by one which has fewer nodes, but whose structure and characteristics are similar to those of the original graph. As will be seen, a common strategy in these methods is to rely on spectral properties to define the coarse graph.

NAOct 8, 2018

Find the dimension that counts: Fast dimension estimation and Krylov PCA

Shashanka Ubaru, Abd-Krim Seghouane, Yousef Saad

High dimensional data and systems with many degrees of freedom are often characterized by covariance matrices. In this paper, we consider the problem of simultaneously estimating the dimension of the principal (dominant) subspace of these covariance matrices and obtaining an approximation to the subspace. This problem arises in the popular principal component analysis (PCA), and in many applications of machine learning, data analysis, signal and image processing, and others. We first present a novel method for estimating the dimension of the principal subspace. We then show how this method can be coupled with a Krylov subspace method to simultaneously estimate the dimension and obtain an approximation to the subspace. The dimension estimation is achieved at no additional cost. The proposed method operates on a model selection framework, where the novel selection criterion is derived based on random matrix perturbation theory ideas. We present theoretical analyses which (a) show that the proposed method achieves strong consistency (i.e., yields optimal solution as the number of data-points $n\rightarrow \infty$), and (b) analyze conditions for exact dimension estimation in the finite $n$ case. Using recent results, we show that our algorithm also yields near optimal PCA. The proposed method avoids forming the sample covariance matrix (associated with the data) explicitly and computing the complete eigen-decomposition. Therefore, the method is inexpensive, which is particularly advantageous in modern data applications where the covariance matrices can be very large. Numerical experiments illustrate the performance of the proposed method in various applications.

NANov 1, 2017

Sampling and multilevel coarsening algorithms for fast matrix approximations

Shashanka Ubaru, Yousef Saad

This paper addresses matrix approximation problems for matrices that are large, sparse and/or that are representations of large graphs. To tackle these problems, we consider algorithms that are based primarily on coarsening techniques, possibly combined with random sampling. A multilevel coarsening technique is proposed which utilizes a hypergraph associated with the data matrix and a graph coarsening strategy based on column matching. Theoretical results are established that characterize the quality of the dimension reduction achieved by a coarsening step, when a proper column matching strategy is employed. We consider a number of standard applications of this technique as well as a few new ones. Among the standard applications we first consider the problem of computing the partial SVD for which a combination of sampling and coarsening yields significantly improved SVD results relative to sampling alone. We also consider the Column subset selection problem, a popular low rank approximation method used in data related applications, and show how multilevel coarsening can be adapted for this problem. Similarly, we consider the problem of graph sparsification and show how coarsening techniques can be employed to solve it. Numerical experiments illustrate the performances of the methods in various applications.

OCMay 29, 2017

Solving Almost all Systems of Random Quadratic Equations

Gang Wang, Georgios B. Giannakis, Yousef Saad et al.

This paper deals with finding an $n$-dimensional solution $x$ to a system of quadratic equations of the form $y_i=|\langle{a}_i,x\rangle|^2$ for $1\le i \le m$, which is also known as phase retrieval and is NP-hard in general. We put forth a novel procedure for minimizing the amplitude-based least-squares empirical loss, that starts with a weighted maximal correlation initialization obtainable with a few power or Lanczos iterations, followed by successive refinements based upon a sequence of iteratively reweighted (generalized) gradient iterations. The two (both the initialization and gradient flow) stages distinguish themselves from prior contributions by the inclusion of a fresh (re)weighting regularization technique. The overall algorithm is conceptually simple, numerically scalable, and easy-to-implement. For certain random measurement models, the novel procedure is shown capable of finding the true solution $x$ in time proportional to reading the data $\{(a_i;y_i)\}_{1\le i \le m}$. This holds with high probability and without extra assumption on the signal $x$ to be recovered, provided that the number $m$ of equations is some constant $c>0$ times the number $n$ of unknowns in the signal vector, namely, $m>cn$. Empirically, the upshots of this contribution are: i) (almost) $100\%$ perfect signal recovery in the high-dimensional (say e.g., $n\ge 2,000$) regime given only an information-theoretic limit number of noiseless equations, namely, $m=2n-1$ in the real-valued Gaussian case; and, ii) (nearly) optimal statistical accuracy in the presence of additive noise of bounded support. Finally, substantial numerical tests using both synthetic data and real images corroborate markedly improved signal recovery performance and computational efficiency of our novel procedure relative to state-of-the-art approaches.

NAJun 20, 2017

Fast computation of spectral densities for generalized eigenvalue problems

Yuanzhe Xi, Ruipeng Li, Yousef Saad

The distribution of the eigenvalues of a Hermitian matrix (or of a Hermitian matrix pencil) reveals important features of the underlying problem, whether a Hamiltonian system in physics, or a social network in behavioral sciences. However, computing all the eigenvalues explicitly is prohibitively expensive for real-world applications. This paper presents two types of methods to efficiently estimate the spectral density of a matrix pencil $(A, B)$ when both $A$ and $B$ are Hermitian and, in addition, $B$ is positive definite. The first one is based on the Kernel Polynomial Method (KPM) and the second on Gaussian quadrature by the Lanczos procedure. By employing Chebyshev polynomial approximation techniques, we can avoid direct factorizations in both methods, making the resulting algorithms suitable for large matrices. Under some assumptions, we prove bounds that suggest that the Lanczos method converges twice as fast as the KPM method. Numerical examples further indicate that the Lanczos method can provide more accurate spectral densities when the eigenvalue distribution is highly non-uniform. As an application, we show how to use the computed spectral density to partition the spectrum into intervals that contain roughly the same number of eigenvalues. This procedure, which makes it possible to compute the spectrum by parts, is a key ingredient in the new breed of eigensolvers that exploit "spectrum slicing".

LGFeb 28, 2017

Low-rank Label Propagation for Semi-supervised Learning with 100 Millions Samples

Raphael Petegrosso, Wei Zhang, Zhuliu Li et al.

The success of semi-supervised learning crucially relies on the scalability to a huge amount of unlabelled data that are needed to capture the underlying manifold structure for better classification. Since computing the pairwise similarity between the training data is prohibitively expensive in most kinds of input data, currently, there is no general ready-to-use semi-supervised learning method/tool available for learning with tens of millions or more data points. In this paper, we adopted the idea of two low-rank label propagation algorithms, GLNP (Global Linear Neighborhood Propagation) and Kernel Nyström Approximation, and implemented the parallelized version of the two algorithms accelerated with Nesterov's accelerated projected gradient descent for Big-data Label Propagation (BigLP). The parallel algorithms are tested on five real datasets ranging from 7000 to 10,000,000 in size and a simulation dataset of 100,000,000 samples. In the experiments, the implementation can scale up to datasets with 100,000,000 samples and hundreds of features and the algorithms also significantly improved the prediction accuracy when only a very small percentage of the data is labeled. The results demonstrate that the BigLP implementation is highly scalable to big data and effective in utilizing the unlabeled data for semi-supervised learning.

NAAug 19, 2016

Fast estimation of approximate matrix ranks using spectral densities

Shashanka Ubaru, Yousef Saad, Abd-Krim Seghouane

In many machine learning and data related applications, it is required to have the knowledge of approximate ranks of large data matrices at hand. In this paper, we present two computationally inexpensive techniques to estimate the approximate ranks of such large matrices. These techniques exploit approximate spectral densities, popular in physics, which are probability density distributions that measure the likelihood of finding eigenvalues of the matrix at a given point on the real line. Integrating the spectral density over an interval gives the eigenvalue count of the matrix in that interval. Therefore the rank can be approximated by integrating the spectral density over a carefully selected interval. Two different approaches are discussed to estimate the approximate rank, one based on Chebyshev polynomials and the other based on the Lanczos algorithm. In order to obtain the appropriate interval, it is necessary to locate a gap between the eigenvalues that correspond to noise and the relevant eigenvalues that contribute to the matrix rank. A method for locating this gap and selecting the interval of integration is proposed based on the plot of the spectral density. Numerical experiments illustrate the performance of these techniques on matrices from typical applications.

ITDec 30, 2015

Low rank approximation and decomposition of large matrices using error correcting codes

Shashanka Ubaru, Arya Mazumdar, Yousef Saad

Low rank approximation is an important tool used in many applications of signal processing and machine learning. Recently, randomized sketching algorithms were proposed to effectively construct low rank approximations and obtain approximate singular value decompositions of large matrices. Similar ideas were used to solve least squares regression problems. In this paper, we show how matrices from error correcting codes can be used to find such low rank approximations and matrix decompositions, and extend the framework to linear least squares regression problems. The benefits of using these code matrices are the following: (i) They are easy to generate and they reduce randomness significantly. (ii) Code matrices with mild properties satisfy the subspace embedding property, and have a better chance of preserving the geometry of an entire subspace of vectors. (iii) For parallel and distributed applications, code matrices have significant advantages over structured random matrices and Gaussian random matrices. (iv) Unlike Fourier or Hadamard transform matrices, which require sampling $O(k\log k)$ columns for a rank-$k$ approximation, the log factor is not necessary for certain types of code matrices. That is, $(1+ε)$ optimal Frobenius norm error can be achieved for a rank-$k$ approximation with $O(k/ε)$ samples. (v) Fast multiplication is possible with structured code matrices, so fast approximations can be achieved for general dense input matrices. (vi) For least squares regression problem $\min\|Ax-b\|_2$ where $A\in \mathbb{R}^{n\times d}$, the $(1+ε)$ relative error approximation can be achieved with $O(d/ε)$ samples, with high probability, when certain code matrices are used.

NAMay 29, 2015

Low-rank correction methods for algebraic domain decomposition preconditioners

Ruipeng Li, Yousef Saad

This paper presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits the domain decomposition method and low-rank corrections. The domain decomposition approach decouples the matrix and once inverted, a low-rank approximation is applied by exploiting the Sherman-Morrison-Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisons with other distributed-memory preconditioning methods are presented.

NAMay 16, 2015

Schur Complement based domain decomposition preconditioners with Low-rank corrections

Ruipeng Li, Yuanzhe Xi, Yousef Saad

This paper introduces a robust preconditioner for general sparse symmetric matrices, that is based on low-rank approximations of the Schur complement in a Domain Decomposition (DD) framework. In this "Schur Low Rank" (SLR) preconditioning approach, the coefficient matrix is first decoupled by DD, and then a low-rank correction is exploited to compute an approximate inverse of the Schur complement associated with the interface points. The method avoids explicit formation of the Schur complement matrix. We show the feasibility of this strategy for a model problem, and conduct a detailed spectral analysis for the relationship between the low-rank correction and the quality of the preconditioning. Numerical experiments on general matrices illustrate the robustness and efficiency of the proposed approach.