CL · AI · CV · LG · MA · Oct 16, 2024

Exploring Model Kinship for Merging Large Language Models

arXiv:2410.12613v3 · 5 citations · h-index: 37 · Has Code · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the lack of principled methods for model merging, a technique used to enhance LLM capabilities. Its contribution is an incremental advance.

The paper studies how to understand and improve model merging for Large Language Models. It introduces the concept of model kinship, shows that kinship correlates with the performance gains obtained from merging, and proposes a new merging strategy that improves benchmark results.

Model merging has emerged as a key technique for enhancing the capabilities and efficiency of Large Language Models (LLMs). The open-source community has driven model evolution by iteratively merging existing models, yet a principled understanding of the gains and underlying factors in model merging remains limited. In this work, we study model evolution through iterative merging, drawing an analogy to biological evolution, and introduce the concept of model kinship, the degree of similarity or relatedness between LLMs. Through comprehensive empirical analysis, we show that model kinship is closely linked to the performance improvements achieved by merging, providing a useful criterion for selecting candidate models. Building on this insight, we propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can improve benchmark performance. Specifically, we discover that incorporating model kinship as a guiding criterion enables continuous merging while mitigating performance degradation caused by local optima, thereby facilitating more effective model evolution. Code is available at https://github.com/zjunlp/ModelKinship.
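To make the two ideas in the abstract concrete, here is a minimal sketch in Python on toy weight vectors. It is not the authors' implementation (see the linked repository for that). Everything specific here is an assumption for illustration: kinship is taken as the cosine similarity of each model's weight delta from a shared base, merging is plain averaging, and the exploration step merges the current best model with its lowest-kinship pool member. The names `kinship`, `average_merge`, and `top_k_greedy_merge` are hypothetical.

```python
import math
from itertools import combinations


def kinship(a, b, base):
    """Assumed kinship metric: cosine similarity of the two models' deltas from a shared base."""
    da = [x - z for x, z in zip(a, base)]
    db = [y - z for y, z in zip(b, base)]
    dot = sum(x * y for x, y in zip(da, db))
    na = math.sqrt(sum(x * x for x in da))
    nb = math.sqrt(sum(y * y for y in db))
    if na == 0.0 or nb == 0.0:
        return 0.0  # a model identical to the base has no direction to compare
    return dot / (na * nb)


def average_merge(a, b):
    """Simplest possible merge (an assumption): element-wise average of two weight vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def top_k_greedy_merge(models, base, score, k=2, rounds=3):
    """Greedy merging over a pool, with a kinship-guided exploration step.

    Each round merges every pair among the k best-scoring models, then also
    merges the best model with the pool member least related to it
    (lowest kinship), which is the assumed way of escaping local optima.
    """
    pool = [list(m) for m in models]
    for _ in range(rounds):
        ranked = sorted(pool, key=score, reverse=True)
        candidates = [average_merge(a, b) for a, b in combinations(ranked[:k], 2)]

        best = ranked[0]
        others = [m for m in pool if m is not best]
        if others:
            least_related = min(others, key=lambda m: kinship(best, m, base))
            candidates.append(average_merge(best, least_related))

        added = False
        for cand in candidates:
            if cand not in pool:  # skip merges that reproduce an existing model
                pool.append(cand)
                added = True
        if not added:
            break  # nothing new was produced, so further rounds cannot help
    return max(pool, key=score)


if __name__ == "__main__":
    base = [0.0, 0.0]
    target = [1.0, 1.0]
    # Toy "benchmark": higher is better, peak at the target weights.
    score = lambda m: -sum((w - t) ** 2 for w, t in zip(m, target))
    pool = [[2.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    print(top_k_greedy_merge(pool, base, score))
```

In a real setting the weight vectors would be flattened model parameters and `score` would be a benchmark evaluation, which is far more expensive than this toy function; the pure-Python lists are used only to keep the sketch self-contained.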

Code Implementations: 1 repo
