Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation
This work addresses a gap in the understanding of generalization for heavy-tailed stochastic optimization algorithms. The contribution is incremental, but it provides improved theoretical guarantees that are relevant to machine learning practitioners.
The paper proves high-probability generalization bounds for heavy-tailed stochastic differential equations (SDEs) that avoid non-computable information-theoretic terms. The bounds have a better dependence on dimension than prior work, and they identify a phase transition in which heavy tails can be either beneficial or harmful depending on the problem structure.
Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention in recent years. While illuminating interesting aspects of stochastic optimizers by using heavy-tailed stochastic differential equations (SDEs) as proxies, prior works either provided generalization bounds that hold only in expectation or introduced non-computable information-theoretic terms. Addressing these drawbacks, in this work we prove high-probability generalization bounds for heavy-tailed SDEs that do not contain any nontrivial information-theoretic terms. To achieve this goal, we develop new proof techniques based on estimating the entropy flows associated with the so-called fractional Fokker-Planck equation (a partial differential equation that governs the evolution of the distribution of the corresponding heavy-tailed SDE). In addition to obtaining high-probability bounds, we show that our bounds have a better dependence on the dimension of the parameters than prior art. Our results further identify a phase transition phenomenon, which suggests that heavy tails can be either beneficial or harmful depending on the problem structure. We support our theory with experiments conducted in a variety of settings.
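To make the object of study concrete, the following is a minimal sketch of how a heavy-tailed SDE of the kind described above might be simulated. It is not the authors' experimental code. It assumes a one-dimensional SDE of the form dθ_t = -∇F(θ_t) dt + σ dL_t^α, where L^α is a symmetric α-stable Lévy process, and discretizes it with an Euler-Maruyama scheme. The quadratic loss F(θ) = θ²/2, the function names, the default parameter values, and the restriction to α in [1, 2] (chosen for numerical convenience) are all illustrative assumptions. For an SDE of this assumed form, the density p(θ, t) of θ_t evolves according to a fractional Fokker-Planck equation, ∂p/∂t = ∇·(p ∇F) - σ^α (-Δ)^(α/2) p, which reduces to the classical Fokker-Planck equation when α = 2.

```python
import math
import random


def symmetric_alpha_stable(alpha, rng):
    """Draw one symmetric alpha-stable sample (Chambers-Mallows-Stuck method).

    Convention: characteristic function exp(-|xi|**alpha), so alpha = 2 gives
    a Gaussian with variance 2 and alpha = 1 gives a standard Cauchy.
    This sketch only supports alpha in [1, 2] (an assumption made to avoid
    floating-point overflow for very small alpha).
    """
    if not 1.0 <= alpha <= 2.0:
        raise ValueError("this sketch only supports alpha in [1, 2]")
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a (very unlikely) zero exponential draw
        w = rng.expovariate(1.0)
    first = math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
    second = (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha)
    return first * second


def grad_loss(theta):
    """Gradient of the hypothetical quadratic loss F(theta) = theta**2 / 2."""
    return theta


def simulate_heavy_tailed_sde(alpha, sigma=0.1, step=0.01, n_steps=1000,
                              theta0=1.0, seed=0):
    """Euler-Maruyama path of d(theta) = -grad F dt + sigma dL^alpha.

    An alpha-stable increment over a time step of length `step` scales as
    step**(1/alpha), which replaces the sqrt(step) scaling of Brownian noise.
    """
    rng = random.Random(seed)
    theta = theta0
    path = [theta]
    for _ in range(n_steps):
        noise = symmetric_alpha_stable(alpha, rng)
        theta = (theta
                 - step * grad_loss(theta)
                 + sigma * step ** (1.0 / alpha) * noise)
        path.append(theta)
    return path


if __name__ == "__main__":
    # Smaller alpha means heavier tails and occasional large jumps.
    for alpha in (2.0, 1.5, 1.2):
        path = simulate_heavy_tailed_sde(alpha)
        print(alpha, max(abs(x) for x in path))
```

Running the script prints, for each tail index α, the largest excursion of the simulated path. Comparing runs across several seeds gives a rough feel for how the tail index changes the behavior of the iterates, which is the regime the bounds in the abstract are concerned with.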