Black holes and the loss landscape in machine learning
This work provides a novel theoretical analogy for analyzing loss landscapes in machine learning, potentially offering insights for optimization, but it is incremental, as it applies an existing physical concept to a known problem without new empirical results.
The paper addresses the problem of understanding loss landscapes in machine learning through an analogy with black holes in string theory, whose microscopic descriptions give rise to landscapes that, like those of neural networks, have exponentially many low-lying local minima; initial explorations suggest that Stochastic Gradient Descent can find a significant fraction of these minima.
Understanding the loss landscape is an important problem in machine learning. One key feature of the loss function, common to many neural network architectures, is the presence of exponentially many low-lying local minima. Physical systems with similar energy landscapes may provide useful insights. In this work, we point out that black holes naturally give rise to such landscapes, owing to the existence of black hole entropy. For definiteness, we consider 1/8 BPS black holes in $\mathcal{N} = 8$ string theory. These provide an infinite family of potential landscapes, each arising in the microscopic description of the corresponding black hole. Counting the minima amounts to counting black hole microstates. Moreover, the exact number of minima in each of these landscapes is known a priori from dualities in string theory. Some of the minima are connected by low-loss paths, resembling mode connectivity. We estimate the number of runs needed to find all the solutions. Initial explorations suggest that Stochastic Gradient Descent can find a significant fraction of the minima.
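Because the number of minima N is known exactly in advance, one simple way to estimate how many independent runs are needed to find all of them is the coupon-collector model. The sketch below is an illustration under a strong assumption, not the paper's estimate: it assumes each run converges to one of the N minima uniformly at random and independently of other runs. Under that assumption the expected number of runs is N·H_N ≈ N(ln N + γ), with H_N the N-th harmonic number and γ ≈ 0.5772 the Euler–Mascheroni constant, and the expected fraction of minima found after R runs is 1 − (1 − 1/N)^R. The function names and the example value N = 100 are hypothetical.

```python
import math
import random


def expected_runs_uniform(n_minima: int) -> float:
    """Expected runs to see all n minima at least once, ASSUMING each run
    lands on one of the n minima uniformly at random (coupon collector):
    n * H_n, where H_n is the n-th harmonic number."""
    if n_minima < 1:
        raise ValueError("n_minima must be positive")
    harmonic = sum(1.0 / k for k in range(1, n_minima + 1))
    return n_minima * harmonic


def expected_fraction_found(n_minima: int, runs: int) -> float:
    """Expected fraction of distinct minima found after `runs` runs,
    under the same uniform, independent-runs assumption."""
    return 1.0 - (1.0 - 1.0 / n_minima) ** runs


def simulate_runs(n_minima: int, rng: random.Random) -> int:
    """Monte Carlo check: count uniform draws until every minimum is hit."""
    seen = set()
    runs = 0
    while len(seen) < n_minima:
        seen.add(rng.randrange(n_minima))  # stand-in for one optimizer run
        runs += 1
    return runs


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100  # hypothetical count of minima (e.g. a known microstate number)
    trials = [simulate_runs(n, rng) for _ in range(2000)]
    print(f"exact expectation: {expected_runs_uniform(n):.1f}")
    print(f"simulated mean:    {sum(trials) / len(trials):.1f}")
    # Large-n approximation n * (ln n + gamma)
    print(f"approximation:     {n * (math.log(n) + 0.5772156649):.1f}")
    print(f"fraction after n runs: {expected_fraction_found(n, n):.3f}")
```

In practice the basins of attraction of different minima need not be equally large, so real optimizer runs are unlikely to be uniform; unequal hitting probabilities increase the expected number of runs, so the uniform figure is best read as an optimistic estimate.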