NAApr 2
Linear Systems and Eigenvalue Problems: Open Questions from a Simons WorkshopNoah Amsel, Yves Baumann, Paul Beckman et al. · berkeley
This document presents a series of open questions arising in matrix computations, i.e., the numerical solution of linear algebra problems. It is a result of working groups at the workshop Linear Systems and Eigenvalue Problems, which was organized at the Simons Institute for the Theory of Computing program on Complexity and Linear Algebra in Fall 2025. The complexity and numerical solution of linear algebra problems is a crosscutting area between theoretical computer science and numerical analysis. The value of the particular problem formulations here is that they were produced via discussions between researchers from both groups. The open questions are organized in five categories: iterative solvers for linear systems, eigenvalue computation, low-rank approximation, randomized sketching, and other areas including tensors, quantum systems, and matrix functions.
NAMay 11, 2018
Rational minimax approximation via adaptive barycentric representationsSilviu-Ioan Filip, Yuji Nakatsukasa, Lloyd N. Trefethen et al.
Computing rational minimax approximations can be very challenging when there are singularities on or near the interval of approximation - precisely the case where rational functions outperform polynomials by a landslide. We show that far more robust algorithms than previously available can be developed by making use of rational barycentric representations whose support points are chosen in an adaptive fashion as the approximant is computed. Three variants of this barycentric strategy are all shown to be powerful: (1) a classical Remez algorithm, (2) a "AAA-Lawson" method of iteratively reweighted least-squares, and (3) a differential correction algorithm. Our preferred combination, implemented in the Chebfun MINIMAX code, is to use (2) in an initial phase and then switch to (1) for generically quadratic convergence. By such methods we can calculate approximations up to type (80, 80) of $|x|$ on $[-1, 1]$ in standard 16-digit floating point arithmetic, a problem for which Varga, Ruttan, and Carpenter required 200-digit extended precision.
NAOct 6, 2016
Vector spaces of linearizations for matrix polynomials: a bivariate polynomial approachYuji Nakatsukasa, Vanni Noferini, Alex Townsend
We revisit the landmark paper [D. S. Mackey, N. Mackey, C. Mehl, and V. Mehrmann, SIAM J. Matrix Anal. Appl., 28 (2006), pp.~971--1004] and, by viewing matrices as coefficients for bivariate polynomials, we provide concise proofs for key properties of linearizations for matrix polynomials. We also show that every pencil in the double ansatz space is intrinsically connected to a Bézout matrix, which we use to prove the eigenvalue exclusion theorem. In addition our exposition allows for any polynomial basis and for any field. The new viewpoint also leads to new results. We generalize the double ansatz space by exploiting its algebraic interpretation as a space of Bézout pencils to derive new linearizations with potential applications in the theory of structured matrix polynomials. Moreover, we analyze the conditioning of double ansatz space linearizations in the important practical case of a Chebyshev basis.
NAJan 3, 2018
Rational approximation of $x^n$Yuji Nakatsukasa, Lloyd N. Trefethen
Let $E_{kk}^{(n)}$ denote the minimax (i.e., best supremum norm) error in approximation of $x^n$ on $[\kern .3pt 0,1]$ by rational functions of type $(k,k)$ with $k<n$. We show that in an appropriate limit $E_{kk}^{(n)} \sim 2\kern .3pt H^{k+1/2}$ independently of $n$, where $H \approx 1/9.28903$ is Halphen's constant. This is the same formula as for minimax approximation of $e^x$ on $(-\infty,0\kern .3pt]$.
NAAug 21, 2012
Mysteries around the graph Laplacian eigenvalue 4Yuji Nakatsukasa, Naoki Saito, Ernest Woei
We describe our current understanding on the phase transition phenomenon of the graph Laplacian eigenvectors constructed on a certain type of unweighted trees, which we previously observed through our numerical experiments. The eigenvalue distribution for such a tree is a smooth bell-shaped curve starting from the eigenvalue 0 up to 4. Then, at the eigenvalue 4, there is a sudden jump. Interestingly, the eigenvectors corresponding to the eigenvalues below 4 are semi-global oscillations (like Fourier modes) over the entire tree or one of the branches; on the other hand, those corresponding to the eigenvalues above 4 are much more localized and concentrated (like wavelets) around junctions/branching vertices. For a special class of trees called starlike trees, we obtain a complete understanding of such phase transition phenomenon. For a general graph, we prove the number of the eigenvalues larger than 4 is bounded from above by the number of vertices whose degrees is strictly higher than 2. Moreover, we also prove that if a graph contains a branching path, then the magnitudes of the components of any eigenvector corresponding to the eigenvalue greater than 4 decay exponentially from the branching vertex toward the leaf of that branch.
NAJun 28, 2016
Finding a low-rank basis in a matrix subspaceYuji Nakatsukasa, Tasuku Soma, André Uschmajew
For a given matrix subspace, how can we find a basis that consists of low-rank matrices? This is a generalization of the sparse vector problem. It turns out that when the subspace is spanned by rank-1 matrices, the matrices can be obtained by the tensor CP decomposition. For the higher rank case, the situation is not as straightforward. In this work we present an algorithm based on a greedy process applicable to higher rank problems. Our algorithm first estimates the minimum rank by applying soft singular value thresholding to a nuclear norm relaxation, and then computes a matrix with that rank using the method of alternating projections. We provide local convergence results, and compare our algorithm with several alternative approaches. Applications include data compression beyond the classical truncated SVD, computing accurate eigenvectors of a near-multiple eigenvalue, image separation and graph Laplacian eigenproblems.
NAAug 6, 2010
Gerschgorin's theorem for generalized eigenvalue problems in the Euclidean metricYuji Nakatsukasa
We present Gerschgorin-type eigenvalue inclusion sets applicable to generalized eigenvalue problems.Our sets are defined by circles in the complex plane in the standard Euclidean metric, and are easier to compute than known similar results.As one application we use our results to provide a forward error analysis for a computed eigenvalue of a diagonalizable pencil.
NADec 23, 2017
Fourth-order time-stepping for stiff PDEs on the sphereHadrien Montanelli, Yuji Nakatsukasa
We present in this paper algorithms for solving stiff PDEs on the unit sphere with spectral accuracy in space and fourth-order accuracy in time. These are based on a variant of the double Fourier sphere method in coefficient space with multiplication matrices that differ from the usual ones, and implicit-explicit time-stepping schemes. Operating in coefficient space with these new matrices allows one to use a sparse direct solver, avoids the coordinate singularity and maintains smoothness at the poles, while implicit-explicit schemes circumvent severe restrictions on the time-steps due to stiffness. A comparison is made against exponential integrators and it is found that implicit-explicit schemes perform best. Implementations in MATLAB and Chebfun make it possible to compute the solution of many PDEs to high accuracy in a very convenient fashion.
NAMay 27, 2019
The low-rank eigenvalue problemYuji Nakatsukasa
The nonzero eigenvalues of $AB$ are equal to those of $BA$: an identity that holds as long as the products are square, even when $A,B$ are rectangular. This fact naturally suggests an efficient algorithm for computing eigenvalues and eigenvectors of a low-rank matrix $X= AB$ with $A,B^T\in\mathbb{C}^{N\times r}, N\gg r$: form the small $r\times r$ matrix $BA$ and find its eigenvalues and eigenvectors. For nonzero eigenvalues, the eigenvectors are related by $ ABv = λv \Leftrightarrow BAw = λw $ with $w=Bv$, and the same holds for Jordan vectors. For zero eigenvalues, the Jordan blocks can change sizes between $AB$ and $BA$, and we characterize this behavior.
NANov 1, 2017
Inertia laws and localization of real eigenvalues for generalized indefinite eigenvalue problemsYuji Nakatsukasa, Vanni Noferini
Sylvester's law of inertia states that the number of positive, negative and zero eigenvalues of Hermitian matrices is preserved under congruence transformations. The same is true of generalized Hermitian definite eigenvalue problems, in which the two matrices are allowed to undergo different congruence transformations, but not for the indefinite case. In this paper we investigate the possible change in inertia under congruence for generalized Hermitian indefinite eigenproblems, and derive sharp bounds that show the inertia of the two individual matrices often still provides useful information about the eigenvalues of the pencil, especially when one of the matrices is almost definite. A prominent application of the original Sylvester's law is in finding the number of eigenvalues in an interval. Our results can be used for estimating the number of real eigenvalues in an interval for generalized indefinite and nonlinear eigenvalue problems.
NAJun 14, 2018
Approximate and integrate: Variance reduction in Monte Carlo integration via function approximationYuji Nakatsukasa
Classical algorithms in numerical analysis for numerical integration (quadrature/cubature) follow the principle of approximate and integrate: the integrand is approximated by a simple function (e.g. a polynomial), which is then integrated exactly. In high-dimensional integration, such methods quickly become infeasible due to the curse of dimensionality. A common alternative is the Monte Carlo method (MC), which simply takes the average of random samples, improving the estimate as more and more samples are taken. The main issue with MC is its slow (though dimension-independent) convergence, and various techniques have been proposed to reduce the variance. In this work we suggest a numerical analyst's interpretation of MC: it approximates the integrand with a constant function, and integrates that constant exactly. This observation leads naturally to MC-like methods where the approximant is a non-constant function, for example low-degree polynomials, sparse grids or low-rank functions. We show that these methods have the same $O(1/\sqrt{N})$ asymptotic convergence as in MC, but with reduced variance, equal to the quality of the underlying function approximation. We also discuss methods that improve the approximation quality as more samples are taken, and thus can converge faster than $O(1/\sqrt{N})$. The main message is that techniques in high-dimensional approximation theory can be combined with Monte Carlo integration to accelerate convergence.
NAApr 24, 2018
A Backward Stable Algorithm for Computing the CS Decomposition via the Polar DecompositionEvan S. Gawlik, Yuji Nakatsukasa, Brian D. Sutton
We introduce a backward stable algorithm for computing the CS decomposition of a partitioned $2n \times n$ matrix with orthonormal columns, or a rank-deficient partial isometry. The algorithm computes two $n \times n$ polar decompositions (which can be carried out in parallel) followed by an eigendecomposition of a judiciously crafted $n \times n$ Hermitian matrix. We prove that the algorithm is backward stable whenever the aforementioned decompositions are computed in a backward stable way. Since the polar decomposition and the symmetric eigendecomposition are highly amenable to parallelization, the algorithm inherits this feature. We illustrate this fact by invoking recently developed algorithms for the polar decomposition and symmetric eigendecomposition that leverage Zolotarev's best rational approximations of the sign function. Numerical examples demonstrate that the resulting algorithm for computing the CS decomposition enjoys excellent numerical stability.
NAMay 25
A multilevel sketch-and-solve method for overdetermined least squares problemsIrina-Beatrice Haas, Michael B. Giles, Yuji Nakatsukasa
Sketch-and-solve (SAS) is a very successful method to efficiently estimate the solution of heavily overdetermined large linear least squares problems. It uses random sketching to reduce the size of the problem, hence reducing the computational cost. Several authors have shown that averaging several solutions from SAS further improves the accuracy, which is measured by the residual associated to the approximate solution. Going further, we combine solutions from sketch-and-solve in a multilevel manner, such that the approximate solution is a combination of SAS samples obtained from small sketches and more accurate correction terms obtained from larger sketches. We first consider the variance of the estimator, which depends on the variance of the coarse samples and the correction terms. We show that the variance of the correction terms on each level follows a trend and decreases faster than the variance of the simple SAS estimator. However, we then show that the overall computational cost of our multilevel framework is slightly higher than that of the simple average estimator, so a naive application of multilevel methods appears unattractive for least squares problems.
NAApr 6
Adaptive LSQR Preconditioning from One Small SketchJung Eun Huh, Coralia Cartis, Yuji Nakatsukasa
We propose APLICUR, an adaptive preconditioning framework for large-scale linear least-squares (LLS) problems. Using a single small sketch computed once at initialization, APLICUR incrementally refines a CUR-based preconditioner throughout the Krylov solve, interleaving preconditioning with iteration. This enables early convergence without the need to construct a costly high-quality preconditioner upfront. With a modest sketch dimension (typically 5 - 250), largely independent of both the problem size and numerical rank, APLICUR achieves convergence guarantees that are likewise independent of the sketch size. The method is applicable to general matrices without structural assumptions (e.g. need not be heavily overdetermined) and is well suited to large, sparse, or numerically low-rank problems. We conduct extensive numerical studies to examine the behavior of the proposed framework and guide the effective algorithmic design choices. Across a range of test problems, \mainalg{} achieves competitive or improved time-to-accuracy performance compared with established randomized preconditioners, including Blendenpik and Nyström PCG, while maintaining low setup cost and robustness across problem regimes.
NASep 19, 2010
Quadratic perturbation bounds for generalized eigenvalue problemsYuji Nakatsukasa
We prove quadratic eigenvalue perturbation bounds for generalized Hermitian eigenvalue problems. The bounds are proportional to the square of the norm of the perturbation matrices divided by the gap between the spectrums. Using the results we provide a simple derivation of the first-order perturbation expansion of a multiple eigenvalue, whose trailing term is tighter than known results. We also present quadratic bounds for the non-Hermitian case.
NAApr 17
Towards Universal Convergence of Backward Error in Linear System SolversMichał Dereziński, Yuji Nakatsukasa, Elizaveta Rebrova
The quest for an algorithm that solves an $n\times n$ linear system in $O(n^2)$ time complexity, or $O(n^2 \text{poly}(1/ε))$ when solving up to $ε$ relative error, is a long-standing open problem in numerical linear algebra and theoretical computer science. There are two predominant paradigms for measuring relative error: forward error (i.e., distance from the output to the optimum solution) and backward error (i.e., distance to the nearest problem solved by the output). In most prior studies, convergence of iterative linear system solvers is measured via various notions of forward error, and as a result, depends heavily on the conditioning of the input. Yet, the numerical analysis literature has long advocated for backward error as the more practically relevant notion of approximation. In this work, we show that -- surprisingly -- the classical and simple Richardson iteration incurs at most $1/k$ (relative) backward error after $k$ iterations on any positive semidefinite (PSD) linear system, irrespective of its condition number. This universal convergence rate implies an $O(n^2/ε)$ complexity algorithm for solving a PSD linear system to $ε$ backward error, and we establish similar or better complexity when using a variety of Krylov solvers beyond Richardson. Then, by directly minimizing backward error over a Krylov subspace, we attain an even faster $O(1/k^2)$ universal rate, and we turn this into an efficient algorithm, MINBERR, with complexity $O(n^2/\sqrtε)$. We extend this approach via normal equations to solving general linear systems, for which we empirically observe $O(1/k)$ convergence. We report strong numerical performance of our algorithms on benchmark problems.
NAJul 12, 2011
On the condition numbers of a multiple generalized eigenvalueYuji Nakatsukasa
For standard eigenvalue problems, a closed-form expression for the condition numbers of a multiple eigenvalue is known. In particular, they are uniformly 1 in the Hermitian case, and generally take different values in the non-Hermitian case. We consider the generalized eigenvalue problem and identify the condition numbers of a multiple eigenvalue. Our main result is that a multiple eigenvalue generally has multiple condition numbers, even in the Hermitian definite case. The condition numbers are characterized in terms of the singular values of the outer product of the corresponding left and right eigenvectors.
NAAug 31, 2010
Perturbation bounds of eigenvalues of Hermitian matrices with block structuresYuji Nakatsukasa
We derive new perturbation bounds for eigenvalues of Hermitian matrices with block structures. The structures we consider range from a standard 2-by-2 block form to block tridiagonal and tridigaonal forms. The main idea is the observation that an eigenvalue is insensitive to componentwise perturbations if the corresponding eigenvector components are small. We show that the same idea can be used to explain two well-known phenomena, one concerning extremal eigenvalues of Wilkinson's matrices and another concerning the efficiency of aggressive early deflation applied to the symmetric tridiagonal QR algorithm.
NAMay 6
Finding accurate eigenvalues and eigenvectors of positive semi-definite matrices given a subspaceYuji Nakatsukasa, Zheng Tang
We revisit a classical problem in numerical linear algebra: given an $k$-dimensional subspace $\mathcal{Q}$ that approximates the leading eigenspace of an $n\times n$ positive semi-definite matrix $A$, the goal is to extract high-accuracy eigenvalues. The Rayleigh-Ritz (RR) method is the standard algorithm for the task, which has been shown to be optimal in several ways (when $A$ is symmetric, not necessarily positive semi-definite $A\succeq 0$). In this paper, we show that when $A \succeq 0$, alternative methods can outperform RR, while having the same computational complexity, that is, the main cost is in computing $AQ$, plus an $O(nk^2)$ term. In particular, we advocate the use of Nystr{ö}m's method, showing that the approximate eigenvalues always have higher accuracy than RR, and the improvement can be arbitrarily large. The difference is significant, especially when $A$ has a fast-decaying spectrum. A similar improvement is numerically observed for the purpose of approximating the leading eigenvectors. In contrast, when the target eigenvalues are the trailing ones, the situation is reversed, and the Nystr{ö}m method performs poorly; we suggest a remedy for this situation.
NEApr 4, 2020
Rational neural networksNicolas Boullé, Yuji Nakatsukasa, Alex Townsend
We consider neural networks with rational activation functions. The choice of the nonlinear activation function in deep learning architectures is crucial and heavily impacts the performance of a neural network. We establish optimal bounds in terms of network complexity and prove that rational neural networks approximate smooth functions more efficiently than ReLU networks with exponentially smaller depth. The flexibility and smoothness of rational activation functions make them an attractive alternative to ReLU, as we demonstrate with numerical experiments.
SPJul 26, 2019
Classification of chaotic time series with deep learningNicolas Boullé, Vassilios Dallas, Yuji Nakatsukasa et al.
We use standard deep neural networks to classify univariate time series generated by discrete and continuous dynamical systems based on their chaotic or non-chaotic behaviour. Our approach to circumvent the lack of precise models for some of the most challenging real-life applications is to train different neural networks on a data set from a dynamical system with a basic or low-dimensional phase space and then use these networks to classify univariate time series of a dynamical system with more intricate or high-dimensional phase space. We illustrate this generalisation approach using the logistic map, the sine-circle map, the Lorenz system, and the Kuramoto--Sivashinsky equation. We observe that a convolutional neural network without batch normalization layers outperforms state-of-the-art neural networks for time series classification and is able to generalise and classify time series as chaotic or not with high accuracy.
NASep 28, 2018
Shifted CholeskyQR for computing the QR factorization of ill-conditioned matricesTakeshi Fukaya, Ramaseshan Kannan, Yuji Nakatsukasa et al.
The Cholesky QR algorithm is an efficient communication-minimizing algorithm for computing the QR factorization of a tall-skinny matrix. Unfortunately it has the inherent numerical instability and breakdown when the matrix is ill-conditioned. A recent work establishes that the instability can be cured by repeating the algorithm twice (called CholeskyQR2). However, the applicability of CholeskyQR2 is still limited by the requirement that the Cholesky factorization of the Gram matrix runs to completion, which means it does not always work for matrices $X$ with $κ_2(X)\gtrsim {\bf u}^{-\frac{1}{2}}$ where ${\bf u}$ is the unit roundoff. In this work we extend the applicability to $κ_2(X)=\mathcal{O}({\bf u}^{-1})$ by introducing a shift to the computed Gram matrix so as to guarantee the Cholesky factorization $R^TR= A^TA+sI$ succeeds numerically. We show that the computed $AR^{-1}$ has reduced condition number $\leq {\bf u}^{-\frac{1}{2}}$, for which CholeskyQR2 safely computes the QR factorization, yielding a computed $Q$ of orthogonality $\|Q^TQ-I\|_2$ and residual $\|A-QR\|_F/\|A\|_F$ both $\mathcal{O}({\bf u})$. Thus we obtain the required QR factorization by essentially running Cholesky QR thrice. We extensively analyze the resulting algorithm shiftedCholeskyQR to reveal its excellent numerical stability. shiftedCholeskyQR is also highly parallelizable, and applicable and effective also when working in an oblique inner product space. We illustrate our findings through experiments, in which we achieve significant (up to x40) speedup over alternative methods.
NAJun 12, 2017
The AAA algorithm for rational approximationYuji Nakatsukasa, Olivier Sète, Lloyd N. Trefethen
We introduce a new algorithm for approximation by rational functions on a real or complex set of points, implementable in 40 lines of Matlab and requiring no user input parameters. Even on a disk or interval the algorithm may outperform existing methods, and on more complicated domains it is especially competitive. The core ideas are (1) representation of the rational approximant in barycentric form with interpolation at certain support points and (2) greedy selection of the support points to avoid exponential instabilities. The name AAA stands for "adaptive Antoulas--Anderson" in honor of the authors who introduced a scheme based on (1). We present the core algorithm with a Matlab code and nine applications and describe variants targeted at problems of different kinds. Comparisons are made with vector fitting, RKFIT, and other existing methods for rational approximation.