Almost Bayesian: The Fractal Dynamics of Stochastic Gradient Descent
This work addresses a foundational problem in machine learning theory by linking SGD to Bayesian principles. The contribution is incremental in that it builds on the existing understanding of optimization dynamics.
The paper relates stochastic gradient descent (SGD) to Bayesian statistics by showing that SGD behaves as diffusion on a fractal loss landscape, so that it can be interpreted as a modified Bayesian sampler that accounts for accessibility constraints induced by the landscape's fractal structure. The results are verified experimentally by analyzing the diffusion of weights during training, offering insight into the learning process and into how SGD relates to purely Bayesian sampling.
We relate the behavior of stochastic gradient descent to Bayesian statistics by showing that SGD is effectively diffusion on a fractal landscape, where the fractal dimension can be accounted for in a purely Bayesian way. In doing so, we show that SGD can be regarded as a modified Bayesian sampler that accounts for accessibility constraints induced by the fractal structure of the loss landscape. We verify our results experimentally by examining the diffusion of weights during training. These results offer insight into the factors that determine the learning process, and seemingly answer the question of how SGD and purely Bayesian sampling are related.
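As an illustration of what examining the diffusion of weights during training can look like in practice, the following is a minimal sketch and not the paper's experimental protocol, which is not described here. It assumes a toy least-squares model trained with minibatch SGD, records the weight vector at every step, and fits an exponent alpha in the relation |w_t - w_0|^2 ~ t^alpha by linear regression in log-log space. An exponent near 1 would correspond to ordinary diffusion, while other values would indicate anomalous diffusion. The function names `sgd_weight_trajectory` and `displacement_exponent`, the model, and all hyperparameters are hypothetical choices made for this example.

```python
import numpy as np


def sgd_weight_trajectory(steps=2000, dim=20, n=512, batch=16, lr=0.01, seed=0):
    """Run minibatch SGD on a toy least-squares problem and record the weights.

    The model and hyperparameters are illustrative assumptions, not taken
    from the paper.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    y = X @ rng.normal(size=dim) + 0.5 * rng.normal(size=n)

    w = np.zeros(dim)
    traj = np.empty((steps + 1, dim))
    traj[0] = w
    for t in range(1, steps + 1):
        # Sample a minibatch without replacement and take one SGD step.
        idx = rng.choice(n, size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad
        traj[t] = w
    return traj


def displacement_exponent(traj, t_min=1, t_max=None):
    """Fit alpha in |w_t - w_0|^2 ~ t^alpha by least squares in log-log space."""
    t_max = len(traj) - 1 if t_max is None else t_max
    t = np.arange(t_min, t_max + 1)
    # Squared Euclidean distance of the weights from their initial value.
    sq_disp = np.sum((traj[t] - traj[0]) ** 2, axis=1)
    alpha, _ = np.polyfit(np.log(t), np.log(sq_disp), 1)
    return alpha, t, sq_disp


if __name__ == "__main__":
    traj = sgd_weight_trajectory()
    alpha, t, sq_disp = displacement_exponent(traj)
    print(f"fitted displacement exponent: {alpha:.3f}")
```

On a convex toy problem like this one the displacement saturates once SGD reaches the minimum, so the fitted exponent depends strongly on the chosen time window; `t_min` and `t_max` are exposed for that reason. Any comparison with the fractal-diffusion picture would require the networks, losses, and time scales actually used in the paper.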