LG | AI | CL | Oct 15, 2025

K-Merge: Online Continual Merging of Adapters for On-device Large Language Models

arXiv:2510.13537v1 | 2 citations | h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses the problem of supporting diverse tasks on resource-constrained mobile devices, where users request LLM support for new tasks incrementally. It is an incremental improvement over existing model merging techniques.

The paper tackles the challenge of on-device online continual merging of Low-Rank Adapters (LoRAs) for Large Language Models, where new adapters are incrementally added under storage constraints, and proposes a data-free, efficient strategy that outperforms alternatives in real-world tasks while adhering to device limitations.

On-device deployment of Large Language Models (LLMs) frequently leverages Low-Rank Adapters (LoRAs) to support diverse downstream tasks under tight resource constraints. To address the limited storage capacity of mobile devices, recent works have explored model merging techniques to fuse multiple LoRAs into a single one. In practice, however, LoRAs are often delivered incrementally, as users request support for new tasks (e.g., novel problem types or languages). This scenario introduces a new challenge: on-device online continual merging, where the objective is to incorporate new LoRAs while preserving the performance on previously supported tasks. In this paper, we propose a data-free and computationally efficient strategy for selecting and merging LoRAs when a new one becomes available, assuming the device can store only a limited number of adapters. Extensive experiments across real-world tasks demonstrate the superiority of our approach compared to alternative strategies while adhering to the storage budget and compute limitations of on-device settings.
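The setting described in the abstract (adapters arriving one at a time, with room on the device for only a limited number of them) can be illustrated with a minimal sketch. This is not the paper's algorithm: the selection rule (highest cosine similarity) and the merge rule (count-weighted running average) are assumptions chosen for illustration, and the names `Slot`, `cosine`, and `add_adapter`, as well as the representation of an adapter as a flat list of floats, are hypothetical. The sketch is data-free in the sense that it only inspects adapter weights.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    # Hypothetical representation: one stored adapter as a flat weight vector.
    weights: List[float]
    # Number of incoming adapters that have been merged into this slot.
    count: int = 1


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two flat weight vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def add_adapter(slots: List[Slot], new: List[float], budget: int) -> int:
    """Store or merge a newly delivered adapter; return the index of the slot used.

    Assumed policy (illustrative only, not taken from the paper):
    - while fewer than `budget` slots are in use, store the adapter as-is;
    - otherwise merge it into the most similar stored slot by running average.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if len(slots) < budget:
        slots.append(Slot(list(new)))
        return len(slots) - 1
    # Pick the stored slot most similar to the incoming adapter.
    idx = max(range(len(slots)), key=lambda i: cosine(slots[i].weights, new))
    slot = slots[idx]
    n = slot.count
    # Running average keeps every merged adapter equally weighted.
    slot.weights = [(w * n + x) / (n + 1) for w, x in zip(slot.weights, new)]
    slot.count = n + 1
    return idx


slots: List[Slot] = []
first = add_adapter(slots, [1.0, 0.0], budget=2)   # stored in a free slot
second = add_adapter(slots, [0.0, 1.0], budget=2)  # stored in a free slot
third = add_adapter(slots, [0.9, 0.1], budget=2)   # budget full, so merged
```

With a budget of two, the third adapter is merged into whichever stored slot it most resembles, so the number of stored adapters never exceeds the budget. The running average is one simple way to avoid favouring the most recent adapter; other weightings are equally plausible under the same assumptions.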

