Higher Order Generalization Error for First Order Discretization of Langevin Diffusion
This work addresses the computational efficiency of stochastic optimization methods for machine learning practitioners, but it is incremental as it builds on known results with new smoothness conditions.
The paper tackles the problem of analyzing generalization error for discretizations of Langevin diffusion, such as SGLD, and shows that under additional smoothness assumptions, first-order methods can achieve arbitrarily low runtime complexity, specifically requiring Ω(ε^{-1/N} log(ε^{-1})) iterations for ε expected generalization error.
We propose a novel approach to analyze generalization error for discretizations of Langevin diffusion, such as the stochastic gradient Langevin dynamics (SGLD). For an $ε$ tolerance of expected generalization error, it is known that a first order discretization can reach this target if we run $Ω(ε^{-1} \log (ε^{-1}) )$ iterations with $Ω(ε^{-1})$ samples. In this article, we show that with additional smoothness assumptions, even first order methods can achieve arbitrarily runtime complexity. More precisely, for each $N>0$, we provide a sufficient smoothness condition on the loss function such that a first order discretization can reach $ε$ expected generalization error given $Ω( ε^{-1/N} \log (ε^{-1}) )$ iterations with $Ω(ε^{-1})$ samples.