Improving Model Fusion by Training-time Neuron Alignment with Fixed Neuron Anchors
This work addresses model fusion challenges for applications like federated learning and foundation models, offering a novel training-time approach that is incremental but effective in heterogeneous settings.
The paper tackles the problem of model fusion hindered by neuron permutation differences, introducing TNA-PFN, a training-time alignment method that uses fixed neuron anchors to reduce permutation issues, improving linear mode connectivity and achieving state-of-the-art results in federated learning with methods like FedPFN and FedPNU.
Model fusion aims to integrate several deep neural network (DNN) models' knowledge into one by fusing parameters, and it has promising applications, such as improving the generalization of foundation models and parameter averaging in federated learning. However, models under different settings (data, hyperparameter, etc.) have diverse neuron permutations; in other words, from the perspective of loss landscape, they reside in different loss basins, thus hindering model fusion performances. To alleviate this issue, previous studies highlighted the role of permutation invariance and have developed methods to find correct network permutations for neuron alignment after training. Orthogonal to previous attempts, this paper studies training-time neuron alignment, improving model fusion without the need for post-matching. Training-time alignment is cheaper than post-alignment and is applicable in various model fusion scenarios. Starting from fundamental hypotheses and theorems, a simple yet lossless algorithm called TNA-PFN is introduced. TNA-PFN utilizes partially fixed neuron weights as anchors to reduce the potential of training-time permutations, and it is empirically validated in reducing the barriers of linear mode connectivity and multi-model fusion. It is also validated that TNA-PFN can improve the fusion of pretrained models under the setting of model soup (vision transformers) and ColD fusion (pretrained language models). Based on TNA-PFN, two federated learning methods, FedPFN and FedPNU, are proposed, showing the prospects of training-time neuron alignment. FedPFN and FedPNU reach state-of-the-art performances in federated learning under heterogeneous settings and can be compatible with the server-side algorithm.