CLLG · Oct 7, 2025

Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM

arXiv:2510.05544v12 citations · h-index: 9
Originality: Highly original
AI Analysis

This paper addresses efficiency problems faced by users deploying LLMs/VLMs, offering incremental improvements in compression methods.

The paper tackles the memory and compute challenges of deploying large language and vision-language models by proposing a novel low-rank compression framework that achieves better accuracy at the same compression levels, together with inference speedup.

Large language models (LLMs) and vision-language models (VLMs) have achieved state-of-the-art performance, but they impose significant memory and computing challenges in deployment. We present a novel low-rank compression framework to address this challenge. First, we upper-bound the change in network loss via layer-wise activation-based compression errors, filling a theoretical gap in the literature. We then formulate low-rank model compression as a bi-objective optimization and prove that a single uniform tolerance yields surrogate Pareto-optimal heterogeneous ranks. Based on our theoretical insights, we propose Pareto-Guided Singular Value Decomposition (PGSVD), a zero-shot pipeline that improves activation-aware compression via Pareto-guided rank selection and an alternating least-squares implementation. We apply PGSVD to both LLMs and VLMs, showing better accuracy at the same compression levels, together with inference speedup.
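The idea that one uniform tolerance can produce different ranks per layer can be illustrated with a small NumPy sketch. This is not the authors' PGSVD implementation: it uses a standard activation-weighted truncated SVD (via a Cholesky factor of the activation Gram matrix), omits the alternating least-squares step entirely, and the names `compress_layer`, `relative_activation_error`, `eps`, and `ridge` are hypothetical choices made for illustration.

```python
import numpy as np


def compress_layer(W, X, eps, ridge=1e-8):
    """Illustrative activation-weighted low-rank factorization (not PGSVD itself).

    W: (out, in) weight matrix; X: (in, n_samples) calibration activations.
    Returns A (out, r), B (r, in), and the smallest rank r whose relative
    activation error ||(W - A B) X||_F / ||W X||_F is at most eps
    (up to the small ridge term).
    """
    n_in = W.shape[1]
    gram = X @ X.T
    # Small ridge (an assumption) keeps the Cholesky factorization stable.
    gram = gram + ridge * np.trace(gram) / n_in * np.eye(n_in)
    L = np.linalg.cholesky(gram)  # gram = L @ L.T

    # ||(W - W_r) X||_F equals ||(W - W_r) L||_F, so truncate the SVD of W @ L.
    U, s, Vt = np.linalg.svd(W @ L, full_matrices=False)
    energy = np.cumsum(s ** 2)
    total = energy[-1]
    # rel_err[k - 1] is the relative activation error when keeping k components.
    rel_err = np.sqrt(np.maximum(total - energy, 0.0) / total)
    r = int(np.argmax(rel_err <= eps)) + 1  # first rank meeting the tolerance

    A = U[:, :r] * s[:r]
    B = np.linalg.solve(L.T, Vt[:r].T).T  # equals Vt[:r] @ inv(L)
    return A, B, r


def relative_activation_error(W, A, B, X):
    """Relative Frobenius error of the compressed layer's outputs on X."""
    return np.linalg.norm((W - A @ B) @ X) / np.linalg.norm(W @ X)


# Toy demo: two synthetic "layers" share one tolerance but get different ranks.
# In practice each layer would have its own calibration activations.
rng = np.random.default_rng(0)
X = rng.standard_normal((48, 256))
layers = {
    "low_rank_like": rng.standard_normal((32, 4)) @ rng.standard_normal((4, 48)),
    "dense": rng.standard_normal((32, 48)),
}
for name, W in layers.items():
    A, B, r = compress_layer(W, X, eps=0.1)
    print(name, "rank:", r, "error:", relative_activation_error(W, A, B, X))
```

In this toy setup the layer with a nearly low-rank weight matrix meets the tolerance with far fewer components than the dense one, which is the sense in which a single tolerance induces heterogeneous ranks. How the paper derives the tolerance and ties it to the loss bound is not reproduced here.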

