LG | CV | Feb 1, 2025

Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion

arXiv:2502.00264v2 | 14 citations | h-index: 32 | Has Code | ICML
Originality: Highly original
AI Analysis

This work addresses how to combine pre-trained transformers efficiently, offering a symmetry-based parameter matching approach that improves model fusion.

The paper tackles the problem of model fusion in transformers by introducing rotation symmetry, a continuous generalization of permutation symmetry, and proposes an optimal parameter matching algorithm that significantly improves fusion performance across diverse NLP and vision tasks.

Symmetry in the parameter space of deep neural networks (DNNs) has proven beneficial for various deep learning applications. A well-known example is the permutation symmetry in Multi-Layer Perceptrons (MLPs), where permuting the rows of weight matrices in one layer and applying the inverse permutation to adjacent layers yields a functionally equivalent model. While permutation symmetry fully characterizes the equivalence set for MLPs, its discrete nature limits its utility for transformers. In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry by rotating parameter matrices in self-attention layers. Unlike permutation symmetry, rotation symmetry operates in a continuous domain, thereby significantly expanding the equivalence set for transformers. Based on this property, we propose a theoretically optimal parameter matching algorithm as a plug-and-play module to enhance model fusion. We evaluate our approach using pre-trained transformers across diverse natural language and vision tasks. Experimental results demonstrate that our rotation symmetry-based matching algorithm substantially improves model fusion, highlighting the potential of parameter space symmetry to facilitate model fusion. Our code is available at https://github.com/zhengzaiyi/RotationSymmetry.
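The two symmetries described in the abstract can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the helper names (`mlp`, `attention_logits`, `random_orthogonal`), the toy dimensions, and the choice to test only the query/key product of a single attention head are all assumptions made for illustration. It permutes the hidden units of a two-layer ReLU MLP, and separately multiplies the query and key projections by the same orthogonal matrix, then measures how much the outputs change.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU MLP: W2 @ relu(W1 @ x + b1) + b2."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

def attention_logits(X, Wq, Wk):
    """Unnormalized single-head attention logits (X Wq)(X Wk)^T."""
    return (X @ Wq) @ (X @ Wk).T

def random_orthogonal(d, rng):
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix.
    (Illustrative helper; the determinant may be +1 or -1.)"""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q

# --- Permutation symmetry in an MLP ---
d_in, d_hid, d_out = 4, 6, 3
W1, b1 = rng.standard_normal((d_hid, d_in)), rng.standard_normal(d_hid)
W2, b2 = rng.standard_normal((d_out, d_hid)), rng.standard_normal(d_out)
x = rng.standard_normal(d_in)

P = np.eye(d_hid)[rng.permutation(d_hid)]  # permutation matrix
y_ref = mlp(x, W1, b1, W2, b2)
# Permute rows of the first layer, apply the inverse (P.T) to the next layer.
y_perm = mlp(x, P @ W1, P @ b1, W2 @ P.T, b2)
mlp_gap = float(np.max(np.abs(y_ref - y_perm)))

# --- Rotation symmetry in the query/key product of attention ---
n_tokens, d_model, d_head = 5, 8, 4
X = rng.standard_normal((n_tokens, d_model))
Wq = rng.standard_normal((d_model, d_head))
Wk = rng.standard_normal((d_model, d_head))

R = random_orthogonal(d_head, rng)
s_ref = attention_logits(X, Wq, Wk)
# R @ R.T = I, so (X Wq R)(X Wk R)^T equals (X Wq)(X Wk)^T.
s_rot = attention_logits(X, Wq @ R, Wk @ R)
attn_gap = float(np.max(np.abs(s_ref - s_rot)))

print("MLP output gap under permutation:", mlp_gap)
print("Attention logit gap under rotation:", attn_gap)
```

Both gaps should be at the level of floating-point round-off. A permutation matrix is itself orthogonal, which is one way to see why the rotation case contains the permutation case; the ReLU in the MLP is what restricts that setting to permutations, since an element-wise nonlinearity does not commute with a general rotation.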

Code Implementations: 1 repo
