CL · Apr 17, 2025

ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs

Yan Yang, Yixia Li, Hongru Wang, Xuetao Wei, Jianqiao Yu, Yun Chen, Guanhua Chen

arXiv:2504.13237v1 · 5 citations · h-index: 6 · Has Code · ACL

Originality: Highly original

AI Analysis

This work addresses resource challenges in deploying multiple task-specific LLMs, offering an incremental improvement in delta compression and merging techniques.

The paper tackles the problem of compressing task-specific large language models by introducing ImPart, an importance-aware delta sparsification method that dynamically adjusts sparsity ratios based on singular vector importance, achieving a $2\times$ higher compression ratio than baselines at the same performance level.

With the proliferation of task-specific large language models, delta compression has emerged as a method to mitigate the resource challenges of deploying numerous such models by effectively compressing the delta model parameters. Previous delta-sparsification methods either remove parameters randomly or truncate singular vectors directly after singular value decomposition (SVD). However, these methods either disregard parameter importance entirely or evaluate it with too coarse a granularity. In this work, we introduce ImPart, a novel importance-aware delta sparsification approach. Leveraging SVD, it dynamically adjusts sparsity ratios of different singular vectors based on their importance, effectively retaining crucial task-specific knowledge even at high sparsity ratios. Experiments show that ImPart achieves state-of-the-art delta sparsification performance, demonstrating a $2\times$ higher compression ratio than baselines at the same performance level. When integrated with existing methods, ImPart sets a new state-of-the-art on delta quantization and model merging.
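
To make the idea in the abstract concrete, the following is a minimal sketch of importance-aware sparsification of a delta matrix after SVD. It is not the authors' implementation. The function name `importance_aware_sparsify`, the importance score (each singular value's share of the total), the clipping rule, and the rescaling by the keep probability are all assumptions chosen for illustration; the paper may define each of these differently.

```python
import numpy as np

def importance_aware_sparsify(delta, keep_ratio, rng):
    """Illustrative only: sparsify the SVD factors of a nonzero delta matrix,
    keeping more entries in singular vectors with larger singular values."""
    # Factor the delta weights: delta = U @ diag(S) @ Vt.
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    rank = S.shape[0]
    # Assumed importance score: each singular value's share of the total.
    importance = S / S.sum()
    # Per-component keep probability. Before clipping the mean equals
    # keep_ratio; clipping to [0, 1] can only lower it.
    keep_prob = np.clip(keep_ratio * rank * importance, 0.0, 1.0)
    # Independent random masks: column k of U and row k of Vt share keep_prob[k].
    mask_u = rng.random(U.shape) < keep_prob[None, :]
    mask_v = rng.random(Vt.shape) < keep_prob[:, None]
    # Rescale surviving entries by 1 / keep_prob (assumed, dropout-style),
    # guarding against division by zero for fully dropped components.
    safe_prob = np.where(keep_prob > 0, keep_prob, 1.0)
    U_sparse = U * mask_u / safe_prob[None, :]
    Vt_sparse = Vt * mask_v / safe_prob[:, None]
    return U_sparse, S, Vt_sparse, keep_prob

# Toy usage with synthetic "base" and "fine-tuned" weights (hypothetical data).
rng = np.random.default_rng(0)
base = rng.standard_normal((64, 32))
finetuned = base + 0.01 * rng.standard_normal((64, 32))
U_s, S, Vt_s, keep_prob = importance_aware_sparsify(
    finetuned - base, keep_ratio=0.25, rng=rng
)
# Approximate delta rebuilt from the sparsified factors.
delta_hat = U_s @ np.diag(S) @ Vt_s
```

Because `np.linalg.svd` returns singular values in descending order, `keep_prob` is non-increasing in this sketch: the leading singular vectors are sparsified least and the trailing ones most, in contrast to uniform random dropping or hard truncation.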
