LGDSOCCOMLFeb 10, 2020

Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized Hamiltonian Monte Carlo

arXiv:2002.04121v340 citations
AI Analysis

This provides a tighter runtime bound for a widely-used sampling algorithm in machine learning and statistics, addressing a bottleneck in prior analyses for efficient sampling from high-dimensional distributions.

The paper tackles the problem of analyzing the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) for sampling from strongly logconcave distributions, showing that it mixes in $ ilde{O}(\kappa d)$ iterations, improving upon prior runtimes of $ ilde{O}(\kappa^{1.5}\sqrt{d} + \kappa d)$ by a factor of $(\kappa/d)^{1/2}$ when the condition number $\kappa$ is large.

We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean. This removes a barrier in the prior state-of-the-art analysis for the well-studied Metropolized Hamiltonian Monte Carlo (HMC) algorithm for sampling from a strongly logconcave distribution. We correspondingly demonstrate that Metropolized HMC mixes in $\tilde{O}(κd)$ iterations, improving upon the $\tilde{O}(κ^{1.5}\sqrt{d} + κd)$ runtime of (Dwivedi et. al. '18, Chen et. al. '19) by a factor $(κ/d)^{1/2}$ when the condition number $κ$ is large. Our mixing time analysis introduces several techniques which to our knowledge have not appeared in the literature and may be of independent interest, including restrictions to a nonconvex set with good conductance behavior, and a new reduction technique for boosting a constant-accuracy total variation guarantee under weak warmness assumptions. This is the first high-accuracy mixing time result for logconcave distributions using only first-order function information which achieves linear dependence on $κ$; we also give evidence that this dependence is likely to be necessary for standard Metropolized first-order methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes