LG · May 28, 2025

Update Your Transformer to the Latest Release: Re-Basin of Task Vectors

arXiv:2505.22697v1 · 17 citations · h-index: 20 · Has Code · ICML
Originality: Incremental advance
AI Analysis

This paper addresses the cost of retraining fine-tuned models whenever their base model is updated, which makes it a practical contribution for AI practitioners. The advance is incremental, since it builds on existing re-basin techniques.

The paper tackles the problem of transferring fine-tuned models to updated pre-trained backbones without retraining, proposing a data-free method based on weight permutations and spectral theory to achieve seamless transfer across visual and textual tasks.

Foundation models serve as the backbone for numerous specialized models developed through fine-tuning. However, when the underlying pretrained model is updated or retrained (e.g., on larger and more curated datasets), the fine-tuned model becomes obsolete, losing its utility and requiring retraining. This raises the question: is it possible to transfer fine-tuning to a new release of the model? In this work, we investigate how to transfer fine-tuning to a new checkpoint without retraining, in a data-free manner. To do so, we draw principles from model re-basin and provide a recipe based on weight permutations to re-base the modifications made to the original base model, often called the task vector. In particular, our approach tailors model re-basin to Transformer models, taking into account the challenges of residual connections and multi-head attention layers. Specifically, we propose a two-level method rooted in spectral theory, first permuting the attention heads and then adjusting parameters within selected pairs of heads. Through extensive experiments on visual and textual tasks, we achieve the seamless transfer of fine-tuned knowledge to new pre-trained backbones without relying on a single training step or datapoint. Code is available at https://github.com/aimagelab/TransFusion.
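
To make the idea of re-basing a task vector concrete, here is a minimal sketch on a toy two-layer ReLU MLP. It is not the paper's method: it ignores attention heads, residual connections, and the spectral step. It also assumes, unrealistically, that the new base is an exact hidden-unit permutation of the old one. All names (`permute_hidden`, `match_units`, `combine`) are hypothetical, and the greedy matcher is only a stand-in for a proper weight-matching procedure.

```python
import random

def forward(x, W1, W2):
    """Two-layer ReLU MLP, W2 @ relu(W1 @ x), with matrices stored as lists of rows."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [sum(w * hi for w, hi in zip(row, h)) for row in W2]

def permute_hidden(W1, W2, perm):
    """Reorder hidden units: rows of W1 and columns of W2 move together."""
    return [W1[p] for p in perm], [[row[p] for p in perm] for row in W2]

def combine(A, B, sign=1.0):
    """Element-wise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def match_units(W1_old, W1_new):
    """Greedy stand-in for weight matching: each new unit takes the closest unused old unit."""
    unused, perm = set(range(len(W1_old))), []
    for row_new in W1_new:
        best = min(unused, key=lambda i: sum((a - b) ** 2 for a, b in zip(W1_old[i], row_new)))
        perm.append(best)
        unused.remove(best)
    return perm

rng = random.Random(0)

def rand_matrix(rows, cols, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

d_in, d_hid, d_out = 3, 4, 2
base = (rand_matrix(d_hid, d_in, 1.0), rand_matrix(d_out, d_hid, 1.0))
# "Fine-tuning" is simulated by a random perturbation of the base weights.
ft = (combine(base[0], rand_matrix(d_hid, d_in, 0.3)),
      combine(base[1], rand_matrix(d_out, d_hid, 0.3)))
# Task vector: fine-tuned weights minus base weights.
tau = (combine(ft[0], base[0], -1.0), combine(ft[1], base[1], -1.0))
# Assumption: the "new release" is the old base with its hidden units shuffled.
new_base = permute_hidden(base[0], base[1], [2, 0, 3, 1])

# Re-basin: align old units to new units, permute the task vector, then add it.
perm = match_units(base[0], new_base[0])
tau_aligned = permute_hidden(tau[0], tau[1], perm)
rebased = (combine(new_base[0], tau_aligned[0]), combine(new_base[1], tau_aligned[1]))
# Baseline: add the task vector without any alignment.
naive = (combine(new_base[0], tau[0]), combine(new_base[1], tau[1]))

inputs = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(20)]

def max_err(model):
    """Largest output gap to the original fine-tuned model over the probe inputs."""
    return max(abs(a - b) for x in inputs
               for a, b in zip(forward(x, *model), forward(x, *ft)))

rebased_err = max_err(rebased)
naive_err = max_err(naive)
print(f"aligned transfer error: {rebased_err:.2e}, naive transfer error: {naive_err:.2e}")
```

In this toy setting the aligned task vector reproduces the fine-tuned function on the new base up to floating-point error, while naive addition does not. Real checkpoints are not exact permutations of each other, so alignment is only approximate in practice.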

Code Implementations: 1 repo
