LG · Jul 13, 2023

Layer-wise Linear Mode Connectivity

arXiv:2307.06966v3 · 22 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of fusing models trained on different datasets, as arises in federated learning, and offers an incremental improvement over existing averaging methods.

The paper investigates layer-wise averaging of neural network parameters as a means of model fusion, particularly in federated learning, and finds that deep networks are layer-wise linearly connected: interpolating individual layers between two models does not run into loss barriers, which makes such averaging effective.

Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models. It is most prominently used in federated learning. If models are averaged at the end of training, this can only lead to a well-performing model if the loss surface of interest has a very particular shape, i.e., the loss at the midpoint between the two models needs to be sufficiently low. This is impossible to guarantee for the non-convex losses of state-of-the-art networks. For averaging models trained on vastly different datasets, it was proposed to average only the parameters of particular layers or combinations of layers, resulting in better-performing models. To get a better understanding of the effect of layer-wise averaging, we analyse the performance of the models that result from averaging single layers or groups of layers. Based on our empirical and theoretical investigation, we introduce a novel notion of layer-wise linear connectivity and show that deep networks do not have layer-wise barriers between them.
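To make full-model versus layer-wise averaging concrete, here is a minimal Python sketch. It is not the authors' code. It interpolates either all parameters or only selected layers between two models and estimates a loss barrier on a grid of interpolation coefficients. The barrier definition used here (the largest gap between the loss along the path and the straight line joining the losses at the path's endpoints) is a common one in the linear-mode-connectivity literature and is assumed, not taken from the paper. The helper names `interpolate`, `barrier`, `toy_loss` and the `Params` layout (layer name mapped to a flat weight list) are hypothetical. On a layer-wise path, unselected layers stay frozen at the first model's values, so the barrier is measured against the losses at that path's own endpoints.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical parameter layout: each layer name maps to a flat list of weights.
Params = Dict[str, List[float]]


def interpolate(theta_a: Params, theta_b: Params, alpha: float,
                layers: Optional[Sequence[str]] = None) -> Params:
    """(1 - alpha) * theta_a + alpha * theta_b on the selected layers only.

    Unselected layers keep theta_a's weights; layers=None interpolates every
    layer, and alpha=0.5 then gives plain full-model parameter averaging.
    """
    selected = set(theta_a) if layers is None else set(layers)
    out: Params = {}
    for name, w_a in theta_a.items():
        if name in selected:
            out[name] = [(1 - alpha) * a + alpha * b
                         for a, b in zip(w_a, theta_b[name])]
        else:
            out[name] = list(w_a)
    return out


def barrier(loss: Callable[[Params], float], theta_a: Params, theta_b: Params,
            layers: Optional[Sequence[str]] = None, steps: int = 11) -> float:
    """Assumed barrier definition: max over a grid of alphas of the loss on the
    (possibly layer-wise) path minus the linear interpolation of the losses at
    the path's two endpoints."""
    alphas = [i / (steps - 1) for i in range(steps)]
    path = [interpolate(theta_a, theta_b, a, layers) for a in alphas]
    l_start, l_end = loss(path[0]), loss(path[-1])
    return max(loss(p) - ((1 - a) * l_start + a * l_end)
               for a, p in zip(alphas, path))


# Toy two-"layer" scalar model y = w2 * w1 * x fitted to y = 1 at x = 1.
# theta_a = (1, 1) and theta_b = (-1, -1) are both exact minima (loss 0).
def toy_loss(p: Params) -> float:
    return (p["layer2"][0] * p["layer1"][0] - 1.0) ** 2


theta_a = {"layer1": [1.0], "layer2": [1.0]}
theta_b = {"layer1": [-1.0], "layer2": [-1.0]}

print(barrier(toy_loss, theta_a, theta_b))              # full averaging: 1.0
print(barrier(toy_loss, theta_a, theta_b, ["layer1"]))  # layer1 only: 0.0
```

In this toy, interpolating only `layer1` while `layer2` stays at 1 makes the loss a convex quadratic in alpha, so the layer-wise path has no barrier, whereas averaging both layers passes through w1 = w2 = 0, where the loss is 1. The example only illustrates the definitions; it says nothing about whether trained deep networks behave this way.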

Code Implementations: 1 repo
