CL | AI | Feb 5

Transport and Merge: Cross-Architecture Merging for Large Language Models

arXiv:2602.05495v2 | h-index: 28
AI Analysis

This work is significant for practitioners deploying LLMs in resource-constrained environments, enabling knowledge transfer to smaller, architecturally diverse models.

This paper addresses the challenge of transferring knowledge from large, high-resource language models to smaller, low-resource models with different architectures. It proposes a cross-architecture merging framework that uses optimal transport to align activations and then uses the resulting alignment to guide weight-space fusion, yielding consistent improvements over the target models on low-resource languages and specialized domains.

Large language models (LLMs) achieve strong capabilities by scaling model capacity and training data, yet many real-world deployments rely on smaller models trained or adapted from low-resource data. This gap motivates the need for mechanisms to transfer knowledge from large, high-resource models to smaller, low-resource targets. While model merging provides an effective transfer mechanism, most existing approaches assume architecture-compatible models and therefore cannot directly transfer knowledge from large high-resource LLMs to heterogeneous low-resource targets. In this work, we propose a cross-architecture merging framework based on optimal transport (OT) that aligns activations to infer cross-neuron correspondences between heterogeneous models. The resulting transport plans are then used to guide direct weight-space fusion, enabling effective high-resource to low-resource transfer using only a small set of inputs. Extensive experiments across low-resource languages and specialized domains demonstrate consistent improvements over target models.
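
The pipeline described in the abstract (align activations with OT, then use the transport plan to guide weight fusion) can be illustrated with a toy sketch. This is not the paper's implementation: the cost function (one minus Pearson correlation), the Sinkhorn solver, the column-normalized barycentric mixing, the interpolation weight `alpha`, and all function names are assumptions chosen for illustration, and the example simplifies further by assuming both layers read the same input features.

```python
import numpy as np

def activation_cost(acts_src, acts_tgt):
    """Cost between source and target neurons: 1 - Pearson correlation
    of their activations over the same probe inputs (an assumed choice)."""
    def standardize(x):
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    zs, zt = standardize(acts_src), standardize(acts_tgt)
    corr = zs.T @ zt / acts_src.shape[0]      # (n_src, n_tgt), values in [-1, 1]
    return 1.0 - corr

def sinkhorn_plan(cost, reg=0.05, n_iters=500):
    """Entropy-regularized OT plan between uniform marginals."""
    n_src, n_tgt = cost.shape
    a = np.full(n_src, 1.0 / n_src)           # mass on each source neuron
    b = np.full(n_tgt, 1.0 / n_tgt)           # mass on each target neuron
    K = np.exp(-cost / reg)
    u, v = np.ones(n_src), np.ones(n_tgt)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]        # (n_src, n_tgt)

def fuse(W_src, W_tgt, plan, alpha=0.5):
    """Map source weights into the target's neuron space, then interpolate.
    Assumes both layers share the same input dimension."""
    # Each column becomes a convex combination over source neurons.
    mix = plan / plan.sum(axis=0, keepdims=True)
    W_transported = mix.T @ W_src             # (n_tgt, d_in)
    return (1.0 - alpha) * W_tgt + alpha * W_transported

# Toy layers of different widths evaluated on a small set of probe inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))                  # 64 probe inputs, 8 features
W_src = rng.normal(size=(12, 8))              # wider "source" layer
W_tgt = rng.normal(size=(6, 8))               # narrower "target" layer

plan = sinkhorn_plan(activation_cost(np.tanh(X @ W_src.T), np.tanh(X @ W_tgt.T)))
W_fused = fuse(W_src, W_tgt, plan, alpha=0.3)
print(plan.shape, W_fused.shape)
```

The fused matrix keeps the target layer's shape, so the target architecture is unchanged; only a small batch of probe inputs is needed to estimate the plan. A real cross-architecture setting would also have to align the input side of each layer, which this sketch omits.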

