Explorations on high dimensional landscapes
This addresses the challenge of optimization in high-dimensional spaces for researchers in machine learning and optimization, though it appears incremental by building on prior theoretical work.
The paper investigates the structure of critical points in high-dimensional non-convex functions, finding that these points concentrate in a narrow band of values, contrasting with low-dimensional cases, and demonstrates this phenomenon in deep networks using MNIST, with gradient descent methods reaching this level efficiently.
Finding minima of a real valued non-convex function over a high dimensional space is a major challenge in science. We provide evidence that some such functions that are defined on high dimensional domains have a narrow band of values whose pre-image contains the bulk of its critical points. This is in contrast with the low dimensional picture in which this band is wide. Our simulations agree with the previous theoretical work on spin glasses that proves the existence of such a band when the dimension of the domain tends to infinity. Furthermore our experiments on teacher-student networks with the MNIST dataset establish a similar phenomenon in deep networks. We finally observe that both the gradient descent and the stochastic gradient descent methods can reach this level within the same number of steps.