Rikhav Shah

h-index5

8papers

8citations

Novelty51%

AI Score51

Ranked #38,788 of 201,326 authors (top 19%)#51 in NA (top 9%)

8 Papers

NAApr 2

Linear Systems and Eigenvalue Problems: Open Questions from a Simons Workshop

Noah Amsel, Yves Baumann, Paul Beckman et al. · berkeley

This document presents a series of open questions arising in matrix computations, i.e., the numerical solution of linear algebra problems. It is a result of working groups at the workshop Linear Systems and Eigenvalue Problems, which was organized at the Simons Institute for the Theory of Computing program on Complexity and Linear Algebra in Fall 2025. The complexity and numerical solution of linear algebra problems is a crosscutting area between theoretical computer science and numerical analysis. The value of the particular problem formulations here is that they were produced via discussions between researchers from both groups. The open questions are organized in five categories: iterative solvers for linear systems, eigenvalue computation, low-rank approximation, randomized sketching, and other areas including tensors, quantum systems, and matrix functions.

NAMay 29

Spectral density estimation for normal matrices

Cameron Musco, Christopher Musco, Rikhav Shah et al.

The spectral density estimation problem asks for an algorithm that, given an $n\times n$ matrix $A$, outputs a probability measure that is a good approximation to the uniform distribution on the eigenvalues of $A$, called the spectral density of $A$. This paper considers the setting where $A$ is a large normal matrix that is accessible only through matrix-vector product queries. We provide an algorithm that makes just $m$ matrix-vector queries to $A$ and returns, with high probability, a measure within earth mover's distance $O(1/m+\log m/{\sqrt n})$ of the true spectral density of $A$. We provide a complementary lower bound that any algorithm producing an $\varepsilon$-approximation to the true spectral density for large matrices must make $Ω(1/\varepsilon)$ matrix-vector queries. The lower bound holds even for the more restricted case of real symmetric input matrices. In combination with our upper bound, it shows that spectral density estimation is essentially no harder for complex normal matrices than for real symmetric matrices.

LGMay 16

DynMuon: A Dynamic Spectral Shaping View of Muon

Fangzhou Wu, Rikhav Shah, Sandeep Silwal et al.

In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. The essential difference, when compared to standard gradient descent methods, is to replace the usual update matrix $M=UΣV^\top$ with its polar factor $UV^\top$. In this work, we consider a class of Muon-like updates, where we replace the update $M$ with $UΣ^p V^\top$ for some parameter $p$. We call this a "spectral-shaping" operation, and develop a theory of how to pick $p$ which depends on (a) local curvature of the loss function, (b) noise stemming from stochastic gradients and label noise, and (c) training stage. Our theory and experimentation reveal a previously overlooked behavior: positive $p$ helps early by emphasizing high-curvature directions and accelerating signal contraction, while mildly negative $p$ helps later by reallocating update strength toward low-curvature directions that still contain useful training signals. Building on the insight, we propose DynMuon, an efficient dynamic spectral shaping method that schedules $p$ from positive to mildly negative over training. Extensive experiments across model sizes, architectures, and training settings show that DynMuon consistently achieves lower validation loss than Muon, while requiring 10.6-26.5% fewer steps to reach the same target loss.

PRApr 10

Sparse Pseudospectral Shattering

Rikhav Shah, Nikhil Srivastava, Edward Zeng

The eigenvalues and eigenvectors of nonnormal matrices can be unstable under perturbations of their entries. This renders an obstacle to the analysis of numerical algorithms for non-Hermitian eigenvalue problems. A recent technique to handle this issue is pseudospectral shattering [BGVKS23], showing that adding a random perturbation to any matrix has a regularizing effect on the stability of the eigenvalues and eigenvectors. Prior work has analyzed the regularizing effect of dense Gaussian perturbations, where independent noise is added to every entry of a given matrix [BVKS20, BGVKS23, BKMS21, JSS21]. We show that the same effect can be achieved by adding a sparse random perturbation. In particular, we show that given any $n\times n$ matrix $M$ of polynomially bounded norm: (a) perturbing $O(n\log^2(n))$ random entries of $M$ by adding i.i.d. complex Gaussians yields $\logκ_V(A)=O(\text{poly}\log(n))$ and $\log (1/η(A))=O(\text{poly}\log(n))$ with high probability; (b) perturbing $O(n^{1+α})$ random entries of $M$ for any constant $α>0$ yields $\logκ_V(A)=O_α(\log(n))$ and $\log(1/η(A))=O_α(\log(n))$ with high probability. Here, $κ_V(A)$ denotes the condition number of the eigenvectors of the perturbed matrix $A$ and $η(A)$ denotes its minimum eigenvalue gap. A key mechanism of the proof is to reduce the study of $κ_V(A)$ to control of the pseudospectral area and minimum eigenvalue gap of $A$, which are further reduced to estimates on the least two singular values of shifts of $A$. We obtain the required least singular value estimates via a streamlining of an argument of Tao and Vu [TV07] specialized to the case of sparse complex Gaussian perturbations. [Rest of abstract in pdf].

NAApr 16

How ill-conditioned can submatrices of the Fourier matrix be?

Rikhav Shah, John Urschel

The discrete Fourier transform matrix is one of the most important matrices in linear algebra, and submatrices of it arise in a variety of applications. Though the discrete Fourier transform matrix is unitary, its submatrices can be exponentially ill-conditioned, an obstacle to accurate computation. This work resolves the exact rate of the exponential ill-conditioning for square submatrices with contiguous rows and columns. As a consequence, we obtain a tight upper bound of $2 G/π$ on the exponential rate for all submatrices with contiguous columns, where $G$ is Catalan's constant. These results follow from a more general analysis of Vandermonde and Vandermonde-like matrices for which similarly tight bounds are developed.

NAMar 12

Matrix Factorizations with Uniformly Random Pivoting

Isabel Detherage, Rikhav Shah

This paper highlights a formal connection between two families of widely used matrix factorization algorithms in numerical linear algebra. One family consists of the Jacobi eigenvalue algorithm and its variants for computing the Hermitian eigendecomposition and singular value decomposition. The other consists of Gaussian elimination and the Gram-Schmidt procedure with various pivoting rules for computing the Cholesky decomposition and QR decomposition respectively. Both families are cast as special cases of a more general class of factorization algorithms. We provide a randomized pivoting rule that applies to this general class (which differs substantially from the usual pivoting rules for Gaussian elimination / Gram-Schmidt) which admits a unified analysis of the entire class of algorithms. The result is the same linear rate of convergence for each algorithm, irrespective of which factorization it computes. One important consequence of this randomized pivoting rule is a provable polynomial bound on the numerical stability of the Jacobi eigenvalue algorithm without any preconditioning, which addresses a longstanding open problem of Demmel and VeseliÄ `92.

DSOct 2, 2025

Even Faster Kernel Matrix Linear Algebra via Density Estimation

Rikhav Shah, Sandeep Silwal, Haike Xu

This paper studies the use of kernel density estimation (KDE) for linear algebraic tasks involving the kernel matrix of a collection of $n$ data points in $\mathbb R^d$. In particular, we improve upon existing algorithms for computing the following up to $(1+\varepsilon)$ relative error: matrix-vector products, matrix-matrix products, the spectral norm, and sum of all entries. The runtimes of our algorithms depend on the dimension $d$, the number of points $n$, and the target error $\varepsilon$. Importantly, the dependence on $n$ in each case is far lower when accessing the kernel matrix through KDE queries as opposed to reading individual entries. Our improvements over existing best algorithms (particularly those of Backurs, Indyk, Musco, and Wagner '21) for these tasks reduce the polynomial dependence on $\varepsilon$, and additionally decreases the dependence on $n$ in the case of computing the sum of all entries of the kernel matrix. We complement our upper bounds with several lower bounds for related problems, which provide (conditional) quadratic time hardness results and additionally hint at the limits of KDE based approaches for the problems we study.

LGDec 2, 2019

Using Dimensionality Reduction to Optimize t-SNE

Rikhav Shah, Sandeep Silwal

t-SNE is a popular tool for embedding multi-dimensional datasets into two or three dimensions. However, it has a large computational cost, especially when the input data has many dimensions. Many use t-SNE to embed the output of a neural network, which is generally of much lower dimension than the original data. This limits the use of t-SNE in unsupervised scenarios. We propose using \textit{random} projections to embed high dimensional datasets into relatively few dimensions, and then using t-SNE to obtain a two dimensional embedding. We show that random projections preserve the desirable clustering achieved by t-SNE, while dramatically reducing the runtime of finding the embedding.