LG AI · Jul 4, 2025

MGAA: Multi-Granular Adaptive Allocation for Low-Rank Compression of LLMs

Guangyan Li, Yongqiang Tang, Wensheng Zhang

arXiv:2507.03294v1 · 4.1 · h-index: 16

Originality: Incremental advance

AI Analysis

The work addresses the computational resource demands of deploying LLMs by offering a more efficient compression method, though the advance is incremental because it builds on existing low-rank techniques.

The paper tackles the inefficiency of uniform compression in low-rank approximation of large language models (LLMs). It proposes a Multi-Granular Adaptive Allocation (MGAA) method that adaptively allocates parameters between and within sublayers. The authors report superior performance across multiple LLMs and benchmark datasets, with notable improvements on multimodal models such as LLaVA.

The enormous parameter scale of large language models (LLMs) has made model compression a research hotspot, with the aim of alleviating computational resource demands during deployment and inference. As a promising direction, low-rank approximation techniques have achieved remarkable results. Unfortunately, the vast majority of studies on low-rank approximation compression apply uniform compression ratios across all weight matrices, disregarding their inherently different impacts on the model's performance. Although a few recent works attempt to employ heuristic search strategies to achieve the optimal parameter allocation, such strategies are computationally inefficient and lose generalization ability in the era of LLMs. In this study, we propose a novel Multi-Granular Adaptive Allocation (MGAA) method, which adaptively allocates parameters between and within sublayers without task-specific evaluations during the compression process. MGAA consists of two components: 1) among different sublayers, it assigns compression ratios based on the cosine similarity between each sublayer's inputs and outputs, allowing more tailored compression of sublayers with varying degrees of importance; and 2) within each sublayer, it allocates different compression ratios to weight matrices based on their energy distribution characteristics, ensuring a consistent energy retention ratio while optimizing compression efficiency. Comprehensive evaluations of MGAA across multiple LLM backbone models and benchmark datasets demonstrate its superior performance. Additionally, we apply MGAA to the multimodal model LLaVA, where it yields remarkable performance improvements.
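The two granularities described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the mapping from similarity to compression ratio, the binary search over a shared energy threshold, and all function names below are assumptions made for illustration. Singular values are assumed to be precomputed for each weight matrix and sorted in descending order, and "energy" is taken to mean the sum of squared singular values.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity between a sublayer's input and output vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def sublayer_compression_ratios(similarities, target_ratio):
    """Between sublayers (assumed rule, not from the paper): a sublayer whose
    output stays close to its input is treated as less important and gets a
    proportionally higher compression ratio. The ratios average to
    target_ratio. Assumes positive similarities; no clamping is applied."""
    mean_sim = sum(similarities) / len(similarities)
    return [target_ratio * s / mean_sim for s in similarities]


def rank_for_energy(singular_values, tau):
    """Smallest rank whose leading singular values retain a fraction tau
    of the total energy (sum of squared singular values)."""
    total = sum(s * s for s in singular_values)
    kept = 0.0
    for rank, s in enumerate(singular_values, start=1):
        kept += s * s
        if kept >= tau * total:
            return rank
    return len(singular_values)


def allocate_ranks(matrices, keep_ratio, iters=50):
    """Within a sublayer (assumed procedure): binary-search one shared
    energy-retention threshold so that the factorized matrices fit the budget.

    matrices: list of (rows, cols, singular_values) tuples.
    keep_ratio: fraction of the sublayer's original parameters to keep.
    A rank-r factorization of a rows x cols matrix costs r * (rows + cols).
    """
    budget = keep_ratio * sum(m * n for m, n, _ in matrices)
    best = [1] * len(matrices)  # fallback: rank 1 everywhere
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        tau = (lo + hi) / 2
        ranks = [rank_for_energy(sv, tau) for _, _, sv in matrices]
        cost = sum(r * (m + n) for r, (m, n, _) in zip(ranks, matrices))
        if cost <= budget:
            best, lo = ranks, tau  # feasible: try retaining more energy
        else:
            hi = tau  # over budget: retain less energy
    return best


# Toy sublayer: one matrix with concentrated energy, one with flat energy.
concentrated = (8, 8, [10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
flat = (8, 8, [4.0] * 8)
ranks = allocate_ranks([concentrated, flat], keep_ratio=0.5)
ratios = sublayer_compression_ratios([0.9, 0.6], target_ratio=0.5)
```

In this toy case `ranks` is `[1, 3]`: under a shared energy threshold, the matrix whose energy is concentrated in one singular value needs a much lower rank than the matrix with a flat spectrum, which is the intuition behind allocating non-uniform ratios within a sublayer.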
