ML Jun 20, 2022
Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
Loucas Pillaud-Vivien, Julien Reygner, Nicolas Flammarion
Understanding the implicit bias of training algorithms is crucial to explaining the success of overparametrised neural networks. In this paper, we study the role of label noise in the training dynamics of a quadratically parametrised model through its continuous-time version. We explicitly characterise the solution chosen by the stochastic flow and prove that it implicitly solves a Lasso program. To complete our analysis, we provide nonasymptotic convergence guarantees for the dynamics as well as conditions for support recovery. We also give experimental results which support our theoretical claims. Our findings highlight that structured noise can induce better generalisation and help explain the better performance of stochastic dynamics observed in practice.
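As a rough illustration of the setting described in this abstract, the sketch below runs discrete-time gradient descent with fresh Gaussian noise added to the labels at every step, on a linear model whose weights are parametrised quadratically. The specific parametrisation `beta = u*u - v*v`, the squared loss, the function name `label_noise_gd`, and all hyperparameter values are assumptions made for illustration; they are not taken from the paper, which analyses a continuous-time version of the dynamics.

```python
import numpy as np

def label_noise_gd(X, y, step=0.01, noise_std=0.5, n_iter=2000, init=0.1, seed=0):
    """Gradient descent with label noise on a quadratically parametrised linear model.

    Assumed (hypothetical) setup: beta = u*u - v*v, loss = ||X beta - y_noisy||^2 / (2n),
    where y_noisy = y + fresh Gaussian noise redrawn at every iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    u = np.full(d, init)
    v = np.full(d, init)
    for _ in range(n_iter):
        beta = u * u - v * v
        # Perturb the labels with fresh noise: this is the "label noise".
        noisy_y = y + noise_std * rng.standard_normal(n)
        # Gradient of the squared loss with respect to beta.
        grad_beta = X.T @ (X @ beta - noisy_y) / n
        # Chain rule: d beta / d u = 2u, d beta / d v = -2v (elementwise).
        u, v = u - step * 2.0 * u * grad_beta, v + step * 2.0 * v * grad_beta
    return u * u - v * v

# Small overparametrised problem (d > n) with a sparse ground truth.
rng = np.random.default_rng(1)
n, d = 20, 40
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:3] = 1.0
y = X @ beta_true
beta_hat = label_noise_gd(X, y)
```

To explore the abstract's claim empirically, one could compare `beta_hat` with the output of a Lasso solver on the same data; the sketch deliberately makes no statement about how close the two are for these particular settings.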
AP Nov 10, 2015
Optimal convergence rate of the multitype sticky particle approximation of one-dimensional diagonal hyperbolic systems with monotonic initial data
Benjamin Jourdain, Julien Reygner
Brenier and Grenier [SIAM J. Numer. Anal., 1998] proved that sticky particle dynamics with a large number of particles make it possible to approximate the entropy solution to scalar one-dimensional conservation laws with monotonic initial data. In [arXiv:1501.01498], we introduced a multitype version of this dynamics and proved that the associated empirical cumulative distribution functions converge to the viscosity solution, in the sense of Bianchini and Bressan [Ann. of Math. (2), 2005], of one-dimensional diagonal hyperbolic systems with monotonic initial data of arbitrary finite variation. In the present paper, we analyse the $L^1$ error of this approximation procedure by splitting it into the discretisation error of the initial data and the non-entropicity error induced by the evolution of the particle system. We prove that the error at time t is bounded from above by a term of order (1 + t)/n, where n denotes the number of particles, and give an example showing that this rate is optimal. Finally, we analyse the additional error introduced when replacing the multitype sticky particle dynamics by an iterative scheme based on the typewise sticky particle dynamics, and illustrate the convergence of this scheme by numerical simulations.
ST Oct 19, 2020
Reweighting samples under covariate shift using a Wasserstein distance criterion
Julien Reygner, Adrien Touboul
Considering two random variables with different laws, to which we only have access through finite-size i.i.d. samples, we address how to reweight the first sample so that its empirical distribution converges towards the true law of the second sample as the size of both samples goes to infinity. We study an optimal reweighting that minimizes the Wasserstein distance between the empirical measures of the two samples, and leads to an expression of the weights in terms of Nearest Neighbors. Consistency and some asymptotic convergence rates in terms of expected Wasserstein distance are derived, and do not require absolute continuity of one random variable with respect to the other. These results have applications in Uncertainty Quantification for decoupled estimation and in bounding the generalization error of Nearest Neighbor Regression under covariate shift.
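The nearest-neighbor form of the weights mentioned in this abstract can be sketched as follows. When the weights on the first sample are free, the Wasserstein distance to the empirical measure of the second sample is minimised by sending each point of the second sample to its nearest neighbor in the first, so each first-sample point receives a weight proportional to the number of second-sample points it is nearest to. The function name `nearest_neighbor_weights`, the Euclidean distance, the brute-force search, and the tie-breaking (lowest index wins) are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def nearest_neighbor_weights(x, y):
    """Weights on the rows of x (shape (n, d)) from the sample y (shape (m, d)).

    w[i] = (number of points of y whose nearest neighbor in x is x[i]) / m.
    Brute-force O(n * m * d) search; ties go to the lowest index.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise squared Euclidean distances, shape (m, n).
    d2 = ((y[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)                      # index in x for each y[j]
    counts = np.bincount(nearest, minlength=x.shape[0])
    return counts / y.shape[0]

# Covariate shift toy example: x ~ N(0, 1), y ~ N(1, 1).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 1))
y = rng.normal(1.0, 1.0, size=(400, 1))
w = nearest_neighbor_weights(x, y)

# The reweighted mean of x can be used as an estimate of the mean under the law of y.
reweighted_mean = float(w @ x[:, 0])
```

The weights are nonnegative and sum to one by construction; points of `x` that are not the nearest neighbor of any point of `y` receive weight zero. For large samples, a spatial index such as a k-d tree would replace the brute-force distance matrix.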