OCMLJun 8, 2020

Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis

arXiv:2006.04299v331 citations
AI Analysis

This work addresses the problem of understanding typical algorithm performance in optimization for researchers and practitioners, offering a foundational insight that could simplify analysis, though it is incremental in extending universality concepts to new contexts.

The paper tackles the challenge of average-case analysis in optimization by showing that for large-scale problems like random least squares and one-hidden layer neural networks with random weights trained with first-order methods, the halting time is independent of the input probability distribution, exhibiting a universality property. It provides the first explicit average-case convergence rates, revealing a tighter complexity not captured by worst-case analysis.

Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm, but remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show that this is not the case for a class of large-scale problems trained with first-order methods including random least squares and one-hidden layer neural networks with random weights. In fact, the halting time exhibits a universality property: it is independent of the probability distribution. With this barrier for average-case analysis removed, we provide the first explicit average-case convergence rates showing a tighter complexity not captured by traditional worst-case analysis. Finally, numerical simulations suggest this universality property holds for a more general class of algorithms and problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes