Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks
This work addresses model fusion and linear mode connectivity in neural networks, offering a versatile framework that incrementally extends barycenter concepts to diverse architectures.
The authors address neural network model fusion and linear mode connectivity by proposing a unified mathematical framework based on Wasserstein and Gromov-Wasserstein barycenters. The framework enables layer-wise fusion across architectures such as CNNs and ResNets, and the experiments provide empirical evidence that SGD solutions lie in the same basin of the loss landscape once the weights of one model are suitably permuted.
Based on the concepts of Wasserstein barycenter (WB) and Gromov-Wasserstein barycenter (GWB), we propose a unified mathematical framework for neural network (NN) model fusion and use it to reveal new insights about the linear mode connectivity of SGD solutions. In our framework, fusion occurs layer by layer and builds on an interpretation of a node in a network as a function of the layer preceding it. The versatility of the framework allows us to treat model fusion and linear mode connectivity for a broad class of NNs, including fully connected NNs, CNNs, ResNets, RNNs, and LSTMs, in each case exploiting the specific structure of the network architecture. We present extensive numerical experiments to 1) illustrate the strengths of our approach relative to other model fusion methodologies and 2) provide, from a certain perspective, new empirical evidence for recent conjectures stating that two local minima found by gradient-based methods lie in the same basin of the loss landscape after a proper permutation of weights is applied to one of the models.
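To make the layer-wise picture concrete, the following is a minimal sketch, not the WB/GWB procedure itself. It assumes a one-hidden-layer ReLU network, represents each hidden node by its incoming weights and bias (the "node as a function of the preceding layer" view), and replaces the optimal transport coupling with a hard permutation found by linear assignment. The helper names `align_hidden_units`, `forward`, and `interpolate` are hypothetical and introduced only for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_hidden_units(W1_a, b1_a, W1_b, b1_b, W2_b):
    """Permute model B's hidden units so they best match model A's.

    Assumption: each hidden unit is described by its incoming weight row
    and bias, and matching is a hard one-to-one assignment rather than a
    general transport plan.
    """
    feats_a = np.hstack([W1_a, b1_a[:, None]])  # shape (hidden, inputs + 1)
    feats_b = np.hstack([W1_b, b1_b[:, None]])
    # Squared Euclidean cost between every unit of A and every unit of B.
    cost = ((feats_a[:, None, :] - feats_b[None, :, :]) ** 2).sum(axis=-1)
    # perm[i] is the unit of B assigned to unit i of A.
    _, perm = linear_sum_assignment(cost)
    # Reorder B's hidden rows, and the matching columns of the next layer,
    # so that the function computed by B is unchanged.
    return W1_b[perm], b1_b[perm], W2_b[:, perm]


def forward(x, W1, b1, W2):
    """One-hidden-layer ReLU network without an output bias."""
    return np.maximum(x @ W1.T + b1, 0.0) @ W2.T


def interpolate(params_a, params_b, t):
    """Point at position t on the straight line between two parameter sets."""
    return tuple((1 - t) * pa + t * pb for pa, pb in zip(params_a, params_b))


# Toy check: model B is model A with its hidden units shuffled.
rng = np.random.default_rng(0)
W1_a = rng.normal(size=(8, 4))
b1_a = rng.normal(size=8)
W2_a = rng.normal(size=(3, 8))
shuffle = rng.permutation(8)
W1_b, b1_b, W2_b = W1_a[shuffle], b1_a[shuffle], W2_a[:, shuffle]

x = rng.normal(size=(5, 4))
aligned_b = align_hidden_units(W1_a, b1_a, W1_b, b1_b, W2_b)
naive_mid = interpolate((W1_a, b1_a, W2_a), (W1_b, b1_b, W2_b), 0.5)
fused_mid = interpolate((W1_a, b1_a, W2_a), aligned_b, 0.5)
```

In this toy setting the two models compute the same function, so after alignment every point on the linear path reproduces model A, whereas the naive midpoint `naive_mid` generally does not. With independently trained models the match is only approximate, and one would evaluate the loss at several values of `t` along the path to probe linear mode connectivity.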