Maria Colombo

LG
4 papers
35 citations

4 Papers

LG · Nov 29, 2022
Infinite-width limit of deep linear neural networks

Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real et al.

This paper studies the infinite-width limit of deep linear neural networks initialized with random parameters. We show that, as the number of neurons diverges, the training dynamics converge (in a precise sense) to the dynamics of gradient descent on an infinitely wide deterministic linear neural network. Moreover, although the weights remain random, we obtain their precise law along the training dynamics and prove a quantitative convergence result for the linear predictor in terms of the number of neurons. Finally, we study the continuous-time limit obtained for infinitely wide linear neural networks and show that the linear predictors of the neural network converge at an exponential rate to the minimal $\ell_2$-norm minimizer of the risk.
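The final claim has a simple finite-dimensional counterpart that may help fix intuition: plain gradient descent on a single linear predictor, started from zero, converges to the minimal $\ell_2$-norm minimizer of an underdetermined least-squares risk. The sketch below covers only this degenerate one-layer case, not the infinite-width deep dynamics analyzed in the paper. The toy data, the step size `1/L` and the iteration count are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy data: fewer samples than features, so the risk has many minimizers.
rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def risk_grad(w):
    # Gradient of the empirical risk R(w) = ||X w - y||^2 / (2 n).
    return X.T @ (X @ w - y) / n

L = np.linalg.norm(X, 2) ** 2 / n   # smoothness constant of R (largest eigenvalue of X^T X / n)
w = np.zeros(d)                      # zero initialization
for _ in range(5000):
    w = w - (1.0 / L) * risk_grad(w)

w_min_norm = np.linalg.pinv(X) @ y   # minimal l2-norm minimizer of R
print(np.max(np.abs(w - w_min_norm)))
```

Zero initialization is what selects the minimal-norm solution here: every gradient step lies in the row space of `X`, so the iterate never acquires a component in its null space.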

AP · Feb 20, 2019
On the role of numerical viscosity in the study of the local limit of nonlocal conservation laws

Maria Colombo, Gianluca Crippa, Marie Graff et al.

We numerically investigate the local limit of nonlocal conservation laws. Previous numerical experiments suggest convergence in the local limit. However, recent analytic results state that (i) convergence does not hold in general, as counterexamples can be exhibited; (ii) convergence can be recovered provided viscosity is added to both the local and the nonlocal equations. Motivated by these analytic results, we investigate the role of numerical viscosity in the numerical study of the local limit of nonlocal conservation laws. In particular, we show that the numerical viscosity of Lax-Friedrichs-type schemes jeopardizes the reliability of the numerical scheme and leads it to erroneously detect convergence in cases where analytic results rule it out. We also test Godunov-type schemes, which are less affected by numerical viscosity, and show that in some cases they provide more reliable results.
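To illustrate what "numerical viscosity" refers to, the sketch below compares a Lax-Friedrichs flux with a Godunov flux on the local inviscid Burgers equation $u_t + (u^2/2)_x = 0$ with a single shock. This model problem is an assumption chosen for brevity; it is not the nonlocal problem or the exact schemes used in the paper, and the grid size, CFL number and initial data are arbitrary. The Lax-Friedrichs flux carries the extra term $-\frac{\Delta x}{2\Delta t}(u_R - u_L)$, which smears the shock over more cells than the Godunov (exact Riemann solver) flux does.

```python
import numpy as np

def flux(u):
    # Burgers flux f(u) = u^2 / 2
    return 0.5 * u**2

def lax_friedrichs_flux(ul, ur, dx, dt):
    # Central flux plus the numerical-viscosity term dx / (2 dt) * (ur - ul)
    return 0.5 * (flux(ul) + flux(ur)) - 0.5 * dx / dt * (ur - ul)

def godunov_flux(ul, ur):
    # Exact Riemann-solver flux for the convex Burgers flux
    return np.maximum(flux(np.maximum(ul, 0.0)), flux(np.minimum(ur, 0.0)))

def l1_error(scheme, n=200, T=0.5, cfl=0.5):
    # Finite-volume solve on [0, 1]; returns the L1 error against the exact shock.
    dx = 1.0 / n
    x = (np.arange(n) + 0.5) * dx
    u = np.where(x < 0.3, 1.0, 0.0)        # Riemann data: shock moving at speed 1/2
    dt = cfl * dx                          # max |u| = 1
    steps = int(round(T / dt))
    for _ in range(steps):
        ug = np.concatenate(([u[0]], u, [u[-1]]))  # constant-extrapolation ghost cells
        ul, ur = ug[:-1], ug[1:]                     # left/right states at the n + 1 interfaces
        if scheme == "lf":
            F = lax_friedrichs_flux(ul, ur, dx, dt)
        else:
            F = godunov_flux(ul, ur)
        u = u - dt / dx * (F[1:] - F[:-1])           # conservative update
    exact = np.where(x < 0.3 + 0.5 * steps * dt, 1.0, 0.0)
    return dx * np.abs(u - exact).sum()

err_lf = l1_error("lf")
err_god = l1_error("godunov")
print(err_lf, err_god)
```

Both schemes are conservative and place the shock at the right speed; the difference lies in how many cells the transition is spread over, which is exactly the added dissipation the abstract refers to.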

AP · Mar 2
Quantitative Convergence of Wasserstein Gradient Flows of Kernel Mean Discrepancies

Lénaïc Chizat, Maria Colombo, Roberto Colombo et al.

We study the quantitative convergence of Wasserstein gradient flows of Kernel Mean Discrepancy (KMD) functionals, also known as Maximum Mean Discrepancy (MMD) functionals. Our setting covers in particular the training dynamics of shallow neural networks in the infinite-width and continuous-time limit, as well as interacting particle systems with pairwise Riesz kernel interaction in the mean-field and overdamped limit. Our main analysis concerns the model case of KMD functionals given by the squared Sobolev distance $\mathscr{E}^{\nu}_{s}(\mu) = \frac{1}{2}\lVert \mu-\nu\rVert_{\dot H^{-s}}^{2}$ for any $s\geq 1$ and $\nu$ a fixed probability measure on the $d$-dimensional torus. First, inspired by Yudovich theory for the 2D Euler equation, we establish existence and uniqueness in natural weak regularity classes. Next, we show that for $s=1$ the flow converges globally at an exponential rate under minimal assumptions, while for $s>1$ we prove local convergence at polynomial rates that depend explicitly on $s$ and on the Sobolev regularity of $\mu$ and $\nu$. These rates hold both at the energy level and in higher regularity classes and are tight for uniform $\nu$. We then consider the gradient flow of the population loss for shallow neural networks with ReLU activation, which can be cast as a Wasserstein–Fisher–Rao gradient flow on the space of nonnegative measures on the sphere $\mathbb{S}^d$. Exploiting a correspondence with the Sobolev energy case with $s=(d+3)/2$, we derive an explicit polynomial local convergence rate for this dynamics. Except for the special case $s=1$, even non-quantitative convergence was previously open in all these settings. We also include numerical experiments in dimension $d=1$ using both PDE and particle methods, which illustrate our analysis.
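A minimal particle sketch of the $s=1$, $d=1$ case with uniform target $\nu$ may make the flow concrete. It assumes the Fourier convention $\lVert f\rVert_{\dot H^{-1}}^2 = \sum_{k\neq 0} |\hat f(k)|^2/(2\pi k)^2$ on the torus $[0,1)$, whose kernel is $K(x) = \frac12(x^2 - x + \frac16)$ for $x \in [0,1)$, and it approximates $\mu$ by $N$ Diracs moving with velocity $-\partial_x K*(\mu-\nu)$. The helper names, particle count, step size and initial data are hypothetical choices, not the authors' code.

```python
import numpy as np

def kernel(d):
    # Zero-mean Green's function of -Laplacian on the 1D torus [0, 1); d is taken mod 1
    d = np.mod(d, 1.0)
    return 0.5 * (d**2 - d + 1.0 / 6.0)

def kernel_grad(d):
    # Derivative of the kernel: sawtooth d - 1/2 on (0, 1), set to 0 at d = 0 (self-interaction)
    d = np.mod(d, 1.0)
    return np.where(d == 0.0, 0.0, d - 0.5)

def energy(x):
    # E(mu) = 1/2 ||mu - nu||^2_{H^-1} for mu = empirical measure of x and nu uniform;
    # the terms involving nu vanish because the kernel has zero mean
    return 0.5 * kernel(x[:, None] - x[None, :]).mean()

def velocity(x):
    # Wasserstein gradient-flow velocity -d/dx (K * (mu - nu)) evaluated at each particle
    return -kernel_grad(x[:, None] - x[None, :]).mean(axis=1)

rng = np.random.default_rng(0)
N = 20
x = rng.uniform(0.0, 0.5, size=N)   # all mass initially in one half of the torus
E0 = energy(x)
dt = 0.1
for _ in range(300):
    x = np.mod(x + dt * velocity(x), 1.0)   # explicit Euler step, wrapped onto the torus
E_final = energy(x)
print(E0, E_final, 1.0 / (24 * N**2))
```

In this 1D particle setting one can check that each gap $g$ between neighbouring particles (including the wrap-around gap) obeys $g' = 1/N - g$, so the particles equalize their spacing exponentially fast and the energy decreases to its minimum $1/(24N^2)$, a discrete analogue of the global exponential convergence stated for $s=1$.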

ML · May 10
Quantitative Local Convergence of Mean-Field Stein Variational Gradient Flow

Lénaïc Chizat, Maria Colombo, Roberto Colombo et al.

Stein Variational Gradient Descent (SVGD) is a deterministic interacting-particle method for sampling from a target probability measure given access to its score function. In the mean-field and continuous-time limit, it is known that the flow converges weakly toward the target, but no quantitative rate is known for the last iterate. In this paper, we establish quantitative local convergence in strong norms for this dynamics, when the interaction kernel is of Riesz type on the $d$-dimensional torus. Specifically, assuming that the initial density and the target are smooth and close in $L^2$-norm, we obtain explicit polynomial convergence rates in $L^2$-norm that depend on the dimension and on the regularity parameters of the kernel, the initialization and the target. We further show that these rates are sharp in certain regimes, and support the theory with numerical experiments. In the edge case of kernels with a Coulomb singularity, we recover the global exponential convergence result established in prior work. Our analysis is inspired by recent results on Wasserstein gradient flows of kernel mean discrepancies.
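For readers new to SVGD, here is a minimal sketch of the standard particle update $x_i \leftarrow x_i + \varepsilon \frac{1}{N}\sum_j \big[k(x_j,x_i)\,\nabla\log\pi(x_j) + \nabla_{x_j}k(x_j,x_i)\big]$. As simplifying assumptions it swaps in a Gaussian (RBF) kernel and a standard normal target on $\mathbb{R}$; the paper instead considers Riesz-type kernels on the torus and studies the mean-field, continuous-time limit. The bandwidth `h`, step `eps`, particle count and iteration count are arbitrary.

```python
import numpy as np

def svgd_step(x, grad_log_p, h=1.0, eps=0.05):
    # x: (N,) particle positions in 1D
    diff = x[:, None] - x[None, :]          # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / (2 * h**2))       # RBF kernel k(x_j, x_i)
    grad_k = -diff / h**2 * k                # derivative of k(x_j, x_i) with respect to x_j
    # phi(x_i) = mean over j of [k(x_j, x_i) * score(x_j) + d/dx_j k(x_j, x_i)]
    phi = (k * grad_log_p(x)[:, None] + grad_k).mean(axis=0)
    return x + eps * phi

def grad_log_gaussian(x):
    # Score of the standard normal target
    return -x

rng = np.random.default_rng(0)
x = rng.uniform(-4.0, -2.0, size=50)   # start far from the target
for _ in range(3000):
    x = svgd_step(x, grad_log_gaussian)
print(x.mean(), x.var())
```

The first term drives particles along the score; the second is a kernel repulsion that keeps them from collapsing onto the mode. With finitely many particles the configuration only approximates the target, so its mean and variance match $0$ and $1$ approximately rather than exactly.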