LG · Oct 13, 2022

Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks

arXiv:2210.06671v1 · 16 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses model fusion and linear mode connectivity in neural networks. Its framework is versatile, and its contribution is incremental: it extends barycenter-based fusion to a broader range of architectures.

The authors address neural network model fusion and linear mode connectivity with a unified mathematical framework based on Wasserstein and Gromov-Wasserstein barycenters. The framework fuses models layer by layer across architectures such as CNNs and ResNets, and the experiments offer empirical evidence that SGD solutions lie in the same basin of the loss landscape once the weights of one model are suitably permuted.
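To make "linear mode connectivity after weight permutation" concrete, here is a minimal NumPy sketch. It is not taken from the paper: the toy one-hidden-layer network and the helper names `mlp`, `permute_hidden`, and `loss_along_path` are assumptions chosen for illustration. It shows that reordering hidden units leaves the network function unchanged, while the straight line between the two equivalent parameter sets can still pass through higher loss.

```python
import numpy as np

def mlp(x, W1, b1, W2):
    """Toy one-hidden-layer ReLU network (illustrative, not the paper's models)."""
    return np.maximum(x @ W1.T + b1, 0.0) @ W2.T

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1 and b1 move together with columns of W2."""
    return W1[perm], b1[perm], W2[:, perm]

def loss_along_path(params_a, params_b, x, y, steps=11):
    """Mean squared error at evenly spaced points on the line from A to B."""
    losses = []
    for t in np.linspace(0.0, 1.0, steps):
        params_t = [(1 - t) * a + t * b for a, b in zip(params_a, params_b)]
        losses.append(float(np.mean((mlp(x, *params_t) - y) ** 2)))
    return losses

rng = np.random.default_rng(0)
params_a = (rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(3, 8)))
x = rng.normal(size=(32, 4))
y = mlp(x, *params_a)  # targets that model A fits exactly

# Model B computes the same function as A, with its hidden units shuffled.
params_b = permute_hidden(*params_a, rng.permutation(8))

path = loss_along_path(params_a, params_b, x, y)
# Simple barrier estimate: highest loss on the path minus the worse endpoint.
barrier = max(path) - max(path[0], path[-1])
print("endpoint losses:", path[0], path[-1])
print("barrier estimate:", barrier)
```

Both endpoints have (numerically) zero loss because they are the same function. Any barrier between them comes purely from the mismatch in unit ordering, which is the effect a suitable permutation is meant to remove.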

Based on the concepts of Wasserstein barycenter (WB) and Gromov-Wasserstein barycenter (GWB), we propose a unified mathematical framework for neural network (NN) model fusion and utilize it to reveal new insights about the linear mode connectivity of SGD solutions. In our framework, the fusion occurs in a layer-wise manner and builds on an interpretation of a node in a network as a function of the layer preceding it. The versatility of our mathematical framework allows us to talk about model fusion and linear mode connectivity for a broad class of NNs, including fully connected NN, CNN, ResNet, RNN, and LSTM, in each case exploiting the specific structure of the network architecture. We present extensive numerical experiments to: 1) illustrate the strengths of our approach in relation to other model fusion methodologies and 2) from a certain perspective, provide new empirical evidence for recent conjectures which say that two local minima found by gradient-based methods end up lying on the same basin of the loss landscape after a proper permutation of weights is applied to one of the models.
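The layer-wise idea in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm: it swaps the Wasserstein barycenter computation for a brute-force search over permutations, which solves the same matching problem only when both layers have equally many units with uniform mass, and only for tiny layers. The names `align_hidden_units` and `fuse` are hypothetical. Each hidden unit is represented by its incoming weight vector, echoing the view of a node as a function of the preceding layer.

```python
from itertools import permutations

import numpy as np

def align_hidden_units(W1_a, W1_b):
    """Find the ordering of B's hidden units that best matches A's.

    Each unit is described by its incoming weights (one row of W1).
    Brute force over all orderings: a stand-in for an optimal transport
    solve, feasible only for very small layers.
    """
    n = W1_a.shape[0]
    # cost[i, j] = squared distance between unit i of A and unit j of B
    cost = ((W1_a[:, None, :] - W1_b[None, :, :]) ** 2).sum(axis=-1)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return np.array(best)

def fuse(model_a, model_b):
    """Align B's hidden units to A's, then average the weights layer by layer."""
    (W1_a, W2_a), (W1_b, W2_b) = model_a, model_b
    perm = align_hidden_units(W1_a, W1_b)
    W1_b_aligned = W1_b[perm]      # reorder hidden units (rows of W1)
    W2_b_aligned = W2_b[:, perm]   # apply the same reordering to W2's columns
    return 0.5 * (W1_a + W1_b_aligned), 0.5 * (W2_a + W2_b_aligned)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(2, 5))
shuffle = rng.permutation(5)
model_a = (W1, W2)
model_b = (W1[shuffle], W2[:, shuffle])  # same function as A, units shuffled

fused_W1, fused_W2 = fuse(model_a, model_b)
naive_W1 = 0.5 * (W1 + W1[shuffle])      # averaging without alignment, for comparison
print("aligned fusion recovers A:", np.allclose(fused_W1, W1))
```

Because model B here is just a shuffled copy of A, aligned averaging returns A itself, whereas naive averaging generally mixes unrelated units. With independently trained models the match is only approximate, which is where soft couplings and barycenters become relevant.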

Code Implementations: 1 repo
