NAJul 16, 2014
Universality in Numerical Computations with Random Data. Case StudiesPercy Deift, Govind Menon, Sheehan Olver et al.
The authors present evidence for universality in numerical computations with random data. Given a (possibly stochastic) numerical algorithm with random input data, the time (or number of iterations) to convergence (within a given tolerance) is a random variable, called the halting time. Two-component universality is observed for the fluctuations of the halting time, i.e., the histogram for the halting times, centered by the sample average and scaled by the sample variance, collapses to a universal curve, independent of the input data distribution, as the dimension increases. Thus, up to two components, the sample average and the sample variance, the statistics for the halting time are universally prescribed. The case studies include six standard numerical algorithms, as well as a model of neural computation and decision making. A link to relevant software is provided in for the reader who would like to do computations of his'r own.
NAMay 15, 2013
How long does it take to compute the eigenvalues of a random symmetric matrix?Christian W. Pfrang, Percy Deift, Govind Menon
We present the results of an empirical study of the performance of the QR algorithm (with and without shifts) and the Toda algorithm on random symmetric matrices. The random matrices are chosen from six ensembles, four of which lie in the Wigner class. For all three algorithms, we observe a form of universality for the deflation time statistics for random matrices within the Wigner class. For these ensembles, the empirical distribution of a normalized deflation time is found to collapse onto a curve that depends only on the algorithm, but not on the matrix size or deflation tolerance provided the matrix size is large enough (see Figure 4, Figure 7 and Figure 10). For the QR algorithm with the Wilkinson shift, the observed universality is even stronger and includes certain non-Wigner ensembles. Our experiments also provide a quantitative statistical picture of the accelerated convergence with shifts.
DSOct 22, 2022
Deep Linear Networks for Matrix Completion -- An Infinite Depth LimitNadav Cohen, Govind Menon, Zsolt Veraszto
The deep linear network (DLN) is a model for implicit regularization in gradient based optimization of overparametrized learning architectures. Training the DLN corresponds to a Riemannian gradient flow, where the Riemannian metric is defined by the architecture of the network and the loss function is defined by the learning task. We extend this geometric framework, obtaining explicit expressions for the volume form, including the case when the network has infinite depth. We investigate the link between the Riemannian geometry and the training asymptotics for matrix completion with rigorous analysis and numerics. We propose that implicit regularization is a result of bias towards high state space volume.
NANov 6, 2016
Smoothed Analysis for the Conjugate Gradient AlgorithmGovind Menon, Thomas Trogdon
The purpose of this paper is to establish bounds on the rate of convergence of the conjugate gradient algorithm when the underlying matrix is a random positive definite perturbation of a deterministic positive definite matrix. We estimate all finite moments of a natural halting time when the random perturbation is drawn from the Laguerre unitary ensemble in a critical scaling regime explored in Deift et al. (2016). These estimates are used to analyze the expected iteration count in the framework of smoothed analysis, introduced by Spielman and Teng (2001). The rigorous results are compared with numerical calculations in several cases of interest.
NASep 8, 2023
Riemannian Langevin Monte Carlo schemes for sampling PSD matrices with fixed rankTianmin Yu, Shixin Zheng, Jianfeng Lu et al.
This paper introduces two explicit schemes to sample matrices from Gibbs distributions on $\mathcal S^{n,p}_+$, the manifold of real positive semi-definite (PSD) matrices of size $n\times n$ and rank $p$. Given an energy function $\mathcal E:\mathcal S^{n,p}_+\to \mathbb{R}$ and certain Riemannian metrics $g$ on $\mathcal S^{n,p}_+$, these schemes rely on an Euler-Maruyama discretization of the Riemannian Langevin equation (RLE) with Brownian motion on the manifold. We present numerical schemes for RLE under two fundamental metrics on $\mathcal S^{n,p}_+$: (a) the metric obtained from the embedding of $\mathcal S^{n,p}_+ \subset \mathbb{R}^{n\times n} $; and (b) the Bures-Wasserstein metric corresponding to quotient geometry. We also provide examples of energy functions with explicit Gibbs distributions that allow numerical validation of these schemes.
LGNov 3, 2025
Regularization Implies balancedness in the deep linear networkKathryn Lindsey, Govind Menon
We use geometric invariant theory (GIT) to study the deep linear network (DLN). The Kempf-Ness theorem is used to establish that the $L^2$ regularizer is minimized on the balanced manifold. This allows us to decompose the training dynamics into two distinct gradient flows: a regularizing flow on fibers and a learning flow on the balanced manifold. We show that the regularizing flow is exactly solvable using the moment map. This approach provides a common mathematical framework for balancedness in deep learning and linear systems theory. We use this framework to interpret balancedness in terms of model reduction and Bayesian principles.
PRFeb 12
On the implicit regularization of Langevin dynamics with projected noiseGovind Menon, Austin J. Stromme, Adrien Vacher
We study Langevin dynamics with noise projected onto the directions orthogonal to an isometric group action. This mathematical model is introduced to shed new light on the effects of symmetry on stochastic gradient descent for over-parametrized models. Our main result identifies a novel form of implicit regularization: when the initial and target density are both invariant under the group action, Langevin dynamics with projected noise is equivalent in law to Langevin dynamics with isotropic diffusion but with an additional drift term proportional to the negative log volume of the group orbit. We prove this result by constructing a coupling of the two processes via a third process on the group itself, and identify the additional drift as the mean curvature of the orbits.
LGSep 11, 2025
An entropy formula for the Deep Linear NetworkGovind Menon, Tianmin Yu
We study the Riemannian geometry of the Deep Linear Network (DLN) as a foundation for a thermodynamic description of the learning process. The main tools are the use of group actions to analyze overparametrization and the use of Riemannian submersion from the space of parameters to the space of observables. The foliation of the balanced manifold in the parameter space by group orbits is used to define and compute a Boltzmann entropy. We also show that the Riemannian geometry on the space of observables defined in [2] is obtained by Riemannian submersion of the balanced manifold. The main technical step is an explicit construction of an orthonormal basis for the tangent space of the balanced manifold using the theory of Jacobi matrices.