Universal halting times in optimization and machine learning
This work provides empirical evidence for universal halting time distributions in optimization, which could inform algorithm design and analysis in machine learning and physics. The contribution appears incremental, however, since it builds on existing concepts of universality.
The authors investigated the halting time distributions of optimization algorithms on random systems like spin glasses and deep learning, finding that after centering and scaling, these distributions remain unchanged across different landscape distributions, with two qualitative classes observed: Gumbel-like and Gaussian-like.
The authors present empirical distributions for the halting time (measured by the number of iterations to reach a given accuracy) of optimization algorithms applied to two random systems: spin glasses and deep learning. Given an algorithm, which is taken here to mean both the optimization routine and the form of the random landscape, the fluctuations of the halting time follow a distribution that, after centering and scaling, remains unchanged even when the distribution on the landscape is changed. Two qualitative classes are observed: a Gumbel-like distribution that appears in Google searches, human decision times, the QR eigenvalue algorithm, and spin glasses, and a Gaussian-like distribution that appears in the conjugate gradient method, deep networks with MNIST input data, and deep networks with random input data. This empirical evidence suggests the presence of a class of distributions for which, under some conditions, the halting time is independent of the underlying distribution.
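The centering and scaling step can be illustrated with a minimal sketch. It is not the authors' experiment: plain gradient descent on a random quadratic stands in for their algorithms, and the function names, matrix sizes, tolerance, and the two entry distributions (Gaussian and Bernoulli ±1) are all assumptions chosen for illustration. The halting time is the number of iterations until the gradient norm falls below a tolerance, and each sample of halting times is normalized to zero mean and unit variance.

```python
import numpy as np

def random_landscape(n, m, ensemble, rng):
    """Draw a random quadratic 0.5*x^T A x - b^T x with A = X X^T / m (illustrative choice)."""
    if ensemble == "gaussian":
        X = rng.standard_normal((n, m))
    elif ensemble == "bernoulli":
        X = rng.choice([-1.0, 1.0], size=(n, m))
    else:
        raise ValueError(f"unknown ensemble: {ensemble}")
    A = X @ X.T / m
    b = rng.standard_normal(n)
    b /= np.linalg.norm(b)  # unit-norm right-hand side
    return A, b

def halting_time(A, b, tol=1e-4, max_iter=10_000):
    """Count gradient-descent iterations until the gradient norm drops below tol."""
    x = np.zeros_like(b)
    step = 1.0 / np.linalg.norm(A, 2)  # 1 / largest eigenvalue of A
    for k in range(max_iter):
        grad = A @ x - b
        if np.linalg.norm(grad) < tol:
            return k
        x = x - step * grad
    return max_iter

def normalized_halting_times(ensemble, samples=200, n=20, m=80, seed=0):
    """Sample halting times over random landscapes, then center and scale them."""
    rng = np.random.default_rng(seed)
    times = np.array(
        [halting_time(*random_landscape(n, m, ensemble, rng)) for _ in range(samples)],
        dtype=float,
    )
    return (times - times.mean()) / times.std()
```

To compare two landscape distributions, one could compute `normalized_halting_times("gaussian")` and `normalized_halting_times("bernoulli")` and compare their histograms or quantiles (for example with `np.quantile`). Whether the two agree at these small sizes is left for the reader to check; the sketch only shows the procedure.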