LG | AI | Mar 12, 2024

Do Deep Neural Network Solutions Form a Star Domain?

arXiv:2403.07968v2 | 5 citations | h-index: 35 | Has Code | ICLR
AI Analysis

This work addresses a foundational problem in machine learning by providing insights into the structure of neural network optimization landscapes, which could impact training and generalization methods, though it appears incremental as it builds on prior convexity conjectures.

The authors tackled the problem of understanding the geometry of neural network solution sets reachable via SGD, conjecturing that these sets form a 'star domain' with a central 'star model' linearly connected to all other solutions. They proposed the Starlight algorithm to find such a star model and validated it by showing linear connectivity with independently found solutions, also demonstrating improved uncertainty estimates and potential as substitutes for model ensembles.
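One way to write the conjecture down, as a sketch rather than the paper's own formalization: the symbols below (the loss $\mathcal{L}$, the tolerance $\epsilon$, the solution set $S$, and the permutation group $\Pi$ acting on weights) are notation assumed here for illustration.

```latex
% Assumed notation: S = SGD-reachable solutions with loss at most \epsilon,
% \Pi = weight permutations that leave the network function unchanged.
\[
\text{Star domain:}\quad
\exists\, \theta^\star \in S \;\; \forall\, \theta \in S \;\; \exists\, \pi \in \Pi :\quad
\mathcal{L}\big((1-t)\,\theta^\star + t\,\pi(\theta)\big) \le \epsilon
\quad \text{for all } t \in [0,1].
\]
\[
\text{Convexity (stronger):}\quad
\forall\, \theta_1, \theta_2 \in S \;\; \exists\, \pi \in \Pi :\quad
\mathcal{L}\big((1-t)\,\theta_1 + t\,\pi(\theta_2)\big) \le \epsilon
\quad \text{for all } t \in [0,1].
\]
```

The only difference is the quantifier: convexity asks for a low-loss path between every pair of solutions, while the star-domain condition asks only for paths from one fixed point $\theta^\star$, the star model.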

It has recently been conjectured that neural network solution sets reachable via stochastic gradient descent (SGD) are convex, considering permutation invariances (Entezari et al., 2022). This means that a linear path can connect two independent solutions with low loss, given the weights of one of the models are appropriately permuted. However, current methods to test this theory often require very wide networks to succeed. In this work, we conjecture that more generally, the SGD solution set is a "star domain" that contains a "star model" that is linearly connected to all the other solutions via paths with low loss values, modulo permutations. We propose the Starlight algorithm that finds a star model of a given learning task. We validate our claim by showing that this star model is linearly connected with other independently found solutions. As an additional benefit of our study, we demonstrate better uncertainty estimates on the Bayesian Model Averaging over the obtained star domain. Further, we demonstrate star models as potential substitutes for model ensembles. Our code is available at https://github.com/aktsonthalia/starlight.
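The notion of linear connectivity used above can be illustrated numerically. The following is a minimal sketch on a toy two-parameter loss, not the Starlight algorithm and not code from the linked repository; the function names (`interpolate`, `loss_barrier`, `is_star_center`, `toy_loss`) are hypothetical, and permutation alignment is deliberately left out.

```python
def interpolate(theta_a, theta_b, t):
    """Point at fraction t on the straight line from theta_a to theta_b."""
    return [(1 - t) * a + t * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest excess of the path loss over the interpolated endpoint losses."""
    ts = [i / (num_points - 1) for i in range(num_points)]
    path_losses = [loss_fn(interpolate(theta_a, theta_b, t)) for t in ts]
    loss_a, loss_b = path_losses[0], path_losses[-1]
    return max(
        loss - ((1 - t) * loss_a + t * loss_b)
        for t, loss in zip(ts, path_losses)
    )


def is_star_center(loss_fn, candidate, solutions, tol=1e-3):
    """True if candidate has a barrier of at most tol to every solution."""
    return all(loss_barrier(loss_fn, candidate, s) <= tol for s in solutions)


def toy_loss(theta):
    """Toy loss (assumption for illustration): zero exactly on the two axes."""
    x, y = theta
    return (x * y) ** 2


# Every point on either axis is a zero-loss "solution" of the toy problem.
solutions = [[1.0, 0.0], [0.0, 1.0], [-2.0, 0.0]]

# Two solutions on different axes are separated by a loss barrier...
barrier_between_solutions = loss_barrier(toy_loss, solutions[0], solutions[1])

# ...but the origin reaches all of them along zero-loss straight lines.
origin_is_star = is_star_center(toy_loss, [0.0, 0.0], solutions)
axis_point_is_star = is_star_center(toy_loss, [1.0, 0.0], solutions)
```

In this toy setting the zero-loss set is a cross: it is not convex, yet it is a star domain centred at the origin. For real networks the same check would evaluate the training or test loss at interpolated weights, after permuting one model's units to align with the other.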

Code Implementations: 1 repo