LGAI · Dec 11, 2024

Revisiting Weight Averaging for Model Merging

arXiv:2412.12153v2 · 22 citations · h-index: 5
AI Analysis

This work addresses the challenge of efficiently building multi-task learners, a problem relevant to AI practitioners. Its contribution is incremental, since it builds on existing weight-averaging methods.

The paper tackles the suboptimal performance of model merging caused by task interference. It shows that weight averaging implicitly induces task vectors centered on the averaged weights, and that applying a low-rank approximation to these centered vectors improves merging. The approach performs robustly on vision benchmarks and competitively on NLP tasks.
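The centering claim can be written out as a short derivation. The notation here is assumed for illustration: $\theta_0$ is the pretrained weight, $\theta_1,\dots,\theta_T$ are the fine-tuned weights, $\tau_t = \theta_t - \theta_0$ are the usual task vectors, $\operatorname{SVD}_k$ denotes the best rank-$k$ approximation of a weight matrix, and $\lambda$ is a scaling coefficient. It is a sketch of one reading of the summary, not the paper's exact formulation.

```latex
\begin{aligned}
\theta_{\text{avg}} &= \frac{1}{T}\sum_{t=1}^{T}\theta_t
  = \theta_0 + \frac{1}{T}\sum_{t=1}^{T}\tau_t, \\
\bar{\tau}_t &= \theta_t - \theta_{\text{avg}}
  = \tau_t - \frac{1}{T}\sum_{s=1}^{T}\tau_s,
  \qquad \sum_{t=1}^{T}\bar{\tau}_t = 0, \\
\theta_{\text{merged}} &= \theta_{\text{avg}} + \lambda\sum_{t=1}^{T}\operatorname{SVD}_k\!\left(\bar{\tau}_t\right).
\end{aligned}
```

Because the centered vectors sum to zero, adding them back at full rank simply returns $\theta_{\text{avg}}$. Any gain over plain averaging therefore comes from the rank-$k$ truncation, which keeps the top singular directions where, per the paper, most task-specific knowledge is concentrated.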

Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. While a straightforward approach is to average model parameters across tasks, this often results in suboptimal performance due to interference among parameters across tasks. In this paper, we present intriguing results that weight averaging implicitly induces task vectors centered around the weight average itself and that applying a low-rank approximation to these centered task vectors significantly improves merging performance. Our analysis shows that centering the task vectors effectively reduces task interference and that most of the task-specific knowledge is concentrated in the top singular vectors. Our method demonstrates robust and scalable performance on vision benchmarks across varying numbers of tasks and model sizes. Furthermore, we observe that our approach is applicable to natural language processing tasks with competitive performance.
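The abstract's recipe (average the weights, center each task's weights on that average, keep only the top singular directions, add the result back) can be sketched for a single weight matrix in NumPy. This is a minimal illustration under stated assumptions. The function name `merge_centered_low_rank` and the parameters `rank` and `scale` are hypothetical. The sketch treats each 2-D layer independently; applying it per matrix, and plainly averaging non-matrix parameters such as biases, is also an assumption rather than the authors' released code.

```python
import numpy as np


def merge_centered_low_rank(weights, rank, scale=1.0):
    """Merge one layer's fine-tuned weight matrices (illustrative sketch).

    weights: list of 2-D arrays, one per task, all with the same shape.
    rank:    number of top singular directions kept per centered task
             vector (hypothetical hyperparameter).
    scale:   coefficient on the summed low-rank terms (hypothetical).
    """
    stacked = np.stack(weights)        # shape (T, m, n)
    w_avg = stacked.mean(axis=0)       # plain weight averaging
    merged = w_avg.copy()
    for w_t in stacked:
        centered = w_t - w_avg         # task vector centered on the average
        u, s, vt = np.linalg.svd(centered, full_matrices=False)
        # Best rank-k approximation: keep the top `rank` singular triplets.
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        merged += scale * low_rank
    return merged


# Usage with random stand-ins for three fine-tuned 8x6 layers.
rng = np.random.default_rng(0)
task_weights = [rng.normal(size=(8, 6)) for _ in range(3)]
merged = merge_centered_low_rank(task_weights, rank=2)
print(merged.shape)  # (8, 6)
```

Two edge cases show why the truncation is essential. With `rank=0` the function returns the plain average. With full rank (here 6) the centered terms sum to zero, so the function again returns the plain average. Only an intermediate rank changes the merged weights.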

Code Implementations: 2 repos