PRDSLGMLFeb 13, 2020

Fast Convergence for Langevin Diffusion with Manifold Structure

arXiv:2002.05576v26 citations
AI Analysis

This addresses sampling challenges in machine learning applications like Bayesian inference, though it is an incremental step towards understanding SGD in symmetric settings.

The paper tackles the problem of sampling from nonconvex distributions with symmetries, such as those in Bayesian matrix factorization, by analyzing Langevin diffusion's mixing time based on manifold geometry, showing it can be fast despite nonconvexity.

In this paper, we study the problem of sampling from distributions of the form p(x) \propto e^{-βf(x)} for some function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above will entail working with functions f that are nonconvex -- for which sampling from p may in general require an exponential number of queries. In this paper, we focus on an aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability. First, we give a recipe for proving mixing time bounds for Langevin diffusion as a function of the geometry of these manifolds. Second, we specialize our arguments to classic matrix factorization-like Bayesian inference problems where we get noisy measurements A(XX^T), X \in R^{d \times k} of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|^2_2, X \in R^{d \times k}, and βthe inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations, and include problems like matrix factorization, sensing, completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters the produce the same output.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes