Govind Menon

h-index2

8papers

78citations

Novelty56%

AI Score46

Ranked #60,808 of 205,806 authors (top 30%)#268 in NA (top 8%)

8 Papers

NAJul 16, 2014

Universality in Numerical Computations with Random Data. Case Studies

Percy Deift, Govind Menon, Sheehan Olver et al.

The authors present evidence for universality in numerical computations with random data. Given a (possibly stochastic) numerical algorithm with random input data, the time (or number of iterations) to convergence (within a given tolerance) is a random variable, called the halting time. Two-component universality is observed for the fluctuations of the halting time, i.e., the histogram for the halting times, centered by the sample average and scaled by the sample variance, collapses to a universal curve, independent of the input data distribution, as the dimension increases. Thus, up to two components, the sample average and the sample variance, the statistics for the halting time are universally prescribed. The case studies include six standard numerical algorithms, as well as a model of neural computation and decision making. A link to relevant software is provided in for the reader who would like to do computations of his'r own.

NAMay 15, 2013

How long does it take to compute the eigenvalues of a random symmetric matrix?

Christian W. Pfrang, Percy Deift, Govind Menon

We present the results of an empirical study of the performance of the QR algorithm (with and without shifts) and the Toda algorithm on random symmetric matrices. The random matrices are chosen from six ensembles, four of which lie in the Wigner class. For all three algorithms, we observe a form of universality for the deflation time statistics for random matrices within the Wigner class. For these ensembles, the empirical distribution of a normalized deflation time is found to collapse onto a curve that depends only on the algorithm, but not on the matrix size or deflation tolerance provided the matrix size is large enough (see Figure 4, Figure 7 and Figure 10). For the QR algorithm with the Wilkinson shift, the observed universality is even stronger and includes certain non-Wigner ensembles. Our experiments also provide a quantitative statistical picture of the accelerated convergence with shifts.

DSOct 22, 2022

Deep Linear Networks for Matrix Completion -- An Infinite Depth Limit

Nadav Cohen, Govind Menon, Zsolt Veraszto

The deep linear network (DLN) is a model for implicit regularization in gradient based optimization of overparametrized learning architectures. Training the DLN corresponds to a Riemannian gradient flow, where the Riemannian metric is defined by the architecture of the network and the loss function is defined by the learning task. We extend this geometric framework, obtaining explicit expressions for the volume form, including the case when the network has infinite depth. We investigate the link between the Riemannian geometry and the training asymptotics for matrix completion with rigorous analysis and numerics. We propose that implicit regularization is a result of bias towards high state space volume.

NANov 6, 2016

Smoothed Analysis for the Conjugate Gradient Algorithm

Govind Menon, Thomas Trogdon

The purpose of this paper is to establish bounds on the rate of convergence of the conjugate gradient algorithm when the underlying matrix is a random positive definite perturbation of a deterministic positive definite matrix. We estimate all finite moments of a natural halting time when the random perturbation is drawn from the Laguerre unitary ensemble in a critical scaling regime explored in Deift et al. (2016). These estimates are used to analyze the expected iteration count in the framework of smoothed analysis, introduced by Spielman and Teng (2001). The rigorous results are compared with numerical calculations in several cases of interest.

NASep 8, 2023

Riemannian Langevin Monte Carlo schemes for sampling PSD matrices with fixed rank

Tianmin Yu, Shixin Zheng, Jianfeng Lu et al.

This paper introduces two explicit schemes to sample matrices from Gibbs distributions on $\mathcal S^{n,p}_+$, the manifold of real positive semi-definite (PSD) matrices of size $n\times n$ and rank $p$. Given an energy function $\mathcal E:\mathcal S^{n,p}_+\to \mathbb{R}$ and certain Riemannian metrics $g$ on $\mathcal S^{n,p}_+$, these schemes rely on an Euler-Maruyama discretization of the Riemannian Langevin equation (RLE) with Brownian motion on the manifold. We present numerical schemes for RLE under two fundamental metrics on $\mathcal S^{n,p}_+$: (a) the metric obtained from the embedding of $\mathcal S^{n,p}_+ \subset \mathbb{R}^{n\times n} $; and (b) the Bures-Wasserstein metric corresponding to quotient geometry. We also provide examples of energy functions with explicit Gibbs distributions that allow numerical validation of these schemes.

LGNov 3, 2025

Regularization Implies balancedness in the deep linear network

Kathryn Lindsey, Govind Menon

We use geometric invariant theory (GIT) to study the deep linear network (DLN). The Kempf-Ness theorem is used to establish that the $L^2$ regularizer is minimized on the balanced manifold. This allows us to decompose the training dynamics into two distinct gradient flows: a regularizing flow on fibers and a learning flow on the balanced manifold. We show that the regularizing flow is exactly solvable using the moment map. This approach provides a common mathematical framework for balancedness in deep learning and linear systems theory. We use this framework to interpret balancedness in terms of model reduction and Bayesian principles.

PRFeb 12

On the implicit regularization of Langevin dynamics with projected noise

Govind Menon, Austin J. Stromme, Adrien Vacher

We study Langevin dynamics with noise projected onto the directions orthogonal to an isometric group action. This mathematical model is introduced to shed new light on the effects of symmetry on stochastic gradient descent for over-parametrized models. Our main result identifies a novel form of implicit regularization: when the initial and target density are both invariant under the group action, Langevin dynamics with projected noise is equivalent in law to Langevin dynamics with isotropic diffusion but with an additional drift term proportional to the negative log volume of the group orbit. We prove this result by constructing a coupling of the two processes via a third process on the group itself, and identify the additional drift as the mean curvature of the orbits.

LGSep 11, 2025

An entropy formula for the Deep Linear Network

Govind Menon, Tianmin Yu

We study the Riemannian geometry of the Deep Linear Network (DLN) as a foundation for a thermodynamic description of the learning process. The main tools are the use of group actions to analyze overparametrization and the use of Riemannian submersion from the space of parameters to the space of observables. The foliation of the balanced manifold in the parameter space by group orbits is used to define and compute a Boltzmann entropy. We also show that the Riemannian geometry on the space of observables defined in [2] is obtained by Riemannian submersion of the balanced manifold. The main technical step is an explicit construction of an orthonormal basis for the tangent space of the balanced manifold using the theory of Jacobi matrices.