LG AI CL Jun 5, 2025

Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization

Boya Xiong, Shuo Wang, Weifeng Ge, Guanhua Chen, Yun Chen

arXiv:2506.11087v24.1h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently storing and serving multiple fine-tuned LLMs in multi-tenant scenarios, representing an incremental advance in delta compression techniques.

The paper targets the poor performance of LLM delta compression at high compression ratios. It introduces DeltaMix, an adaptive mixed-precision framework that minimizes quantization error in SVD space. For 7B-parameter models, DeltaMix outperforms the best baseline, Delta-CoMe, by 22.3% on AIME2024 and 6.1% on GQA.

Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, like multi-tenant serving, a large number of LLMs fine-tuned from the same base model are deployed to meet complex requirements for users. Recent works explore delta-compression approaches to quantize and compress the delta weights between the customized LLM and the corresponding base model. However, they exhibit inadequate performance at high compression ratios due to their empirical nature. In this work, we introduce DeltaMix, an adaptive mixed-precision delta-compression framework designed to minimize quantization error in the singular value decomposition (SVD) space without imposing additional assumptions. DeltaMix provides a theoretical justification for the necessity of mixed-precision compression and presents a practical quantization solution that involves solving a 0/1 linear integer programming problem alongside a reconstruction target correction method. Experimental results across multiple models and benchmarks illustrate that DeltaMix consistently outperforms all baseline methods. Notably, on tasks such as AIME2024 and GQA, DeltaMix exceeds the performance of the best baseline, Delta-CoMe, by 22.3% and 6.1% for 7B parameter models, respectively.
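To make the "0/1 linear integer programming" step more concrete, here is a minimal sketch of mixed-precision bit allocation as a 0/1 selection problem. It is not DeltaMix's actual formulation, which the abstract does not spell out. The function name `allocate_bits`, the grouping of singular vectors, the error table, and the bit budget are all hypothetical, and a brute-force search stands in for a real integer-programming solver. Each group picks exactly one bit-width, and the goal is the lowest total error that fits the budget.

```python
from itertools import product


def allocate_bits(errors, sizes, bit_options, budget_bits):
    """Pick one bit-width per group, minimizing total error under a bit budget.

    errors[g][k]: assumed quantization error of group g at bit_options[k]
    sizes[g]:     number of parameters in group g
    Each candidate `choice` is a 0/1 assignment: exactly one option per group.
    """
    best_cost, best_choice = float("inf"), None
    for choice in product(range(len(bit_options)), repeat=len(errors)):
        # Total storage used by this assignment, in bits.
        used = sum(sizes[g] * bit_options[k] for g, k in enumerate(choice))
        if used > budget_bits:
            continue  # violates the budget constraint
        cost = sum(errors[g][k] for g, k in enumerate(choice))
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    if best_choice is None:
        raise ValueError("no assignment fits the bit budget")
    return [bit_options[k] for k in best_choice], best_cost


# Toy, made-up numbers: three groups of singular vectors, ordered from
# largest to smallest singular values. A bit-width of 0 means "drop the group".
bit_options = [0, 2, 8]
sizes = [100, 100, 100]
errors = [
    [9.0, 3.0, 0.5],  # group 0: most sensitive to low precision
    [4.0, 1.0, 0.2],
    [1.0, 0.3, 0.1],  # group 2: least sensitive
]
budget_bits = 1200  # an average of 4 bits per parameter

bits, cost = allocate_bits(errors, sizes, bit_options, budget_bits)
print(bits, cost)
```

With these toy inputs the search gives the most sensitive group the highest precision and keeps the others at 2 bits, which shows how a uniform bit-width can be suboptimal under a fixed budget. Exhaustive search grows exponentially with the number of groups, so a practical implementation would hand the same objective and constraint to an integer-programming solver.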
