CL · LG · Oct 7, 2025

Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM

Ryan Solgi, Parsa Madinei, Jiayi Tian, Rupak Swaminathan, Jing Liu, Nathan Susanj, Zheng Zhang

arXiv:2510.05544v1 · 6.72 citations · h-index: 9

Originality: Highly original

AI Analysis

This work addresses efficiency problems faced by anyone deploying LLMs/VLMs, offering incremental improvements to compression methods.

The paper tackles the memory and compute challenges of deploying large language and vision-language models by proposing a novel low-rank compression framework that achieves higher accuracy at the same compression level and inference speedup.

Large language models (LLMs) and vision-language models (VLMs) have achieved state-of-the-art performance, but they impose significant memory and computing challenges in deployment. We present a novel low-rank compression framework to address this challenge. First, we upper bound the change of network loss via layer-wise activation-based compression errors, filling a theoretical gap in the literature. We then formulate low-rank model compression as a bi-objective optimization and prove that a single uniform tolerance yields surrogate Pareto-optimal heterogeneous ranks. Based on our theoretical insights, we propose Pareto-Guided Singular Value Decomposition (PGSVD), a zero-shot pipeline that improves activation-aware compression via Pareto-guided rank selection and an alternating least-squares implementation. We apply PGSVD to both LLMs and VLMs, showing better accuracy at the same compression levels and inference speedup.
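To make the abstract's ingredients concrete, here is a minimal NumPy sketch under our own assumptions. It is not PGSVD or the authors' code. It shows three things. First, an activation-based layer error ||(W - AB)X||_F measured on calibration activations `X`. Second, one shared tolerance `eps` that picks a different rank for each layer. Third, an alternating least-squares (ALS) loop that fits the same rank-r objective without an explicit SVD. The per-layer rule "smallest rank whose relative activation error is at most `eps`" is an illustrative assumption, not necessarily the paper's exact criterion. The Cholesky whitening step is a common trick: when L Lᵀ = X Xᵀ, it gives ||M X||_F = ||M L||_F. All function names, the random calibration data, and the two toy layers are hypothetical.

```python
import numpy as np


def whitening_factor(X, ridge=1e-6):
    """Lower-triangular L with L @ L.T ~= X @ X.T (X: d_in x n_samples).

    Because ||M @ X||_F == ||M @ L||_F whenever L @ L.T == X @ X.T,
    activation errors can be measured in the smaller d_in x d_in space.
    """
    gram = X @ X.T
    # Small ridge keeps the Cholesky factorization well defined.
    jitter = ridge * np.trace(gram) / gram.shape[0]
    return np.linalg.cholesky(gram + jitter * np.eye(gram.shape[0]))


def activation_error(W, A, B, X):
    """Relative activation error ||(W - A B) X||_F / ||W X||_F."""
    return np.linalg.norm((W - A @ B) @ X) / np.linalg.norm(W @ X)


def rank_for_tolerance(W, L, eps):
    """Smallest rank whose whitened truncation has relative error <= eps."""
    energy = np.linalg.svd(W @ L, compute_uv=False) ** 2
    # tail[r] = energy discarded when only the first r singular values are kept
    tail = np.append(np.cumsum(energy[::-1])[::-1], 0.0)
    rel_err = np.sqrt(tail / energy.sum())
    return int(np.argmax(rel_err <= eps))  # first rank meeting the tolerance


def whitened_svd(W, L, r):
    """Rank-r factors A (d_out x r), B (r x d_in) minimizing ||(W - A B) L||_F."""
    U, s, Vt = np.linalg.svd(W @ L, full_matrices=False)
    A = U[:, :r] * s[:r]
    # B = Vt_r @ inv(L), computed with a linear solve instead of an explicit inverse
    B = np.linalg.solve(L.T, Vt[:r].T).T
    return A, B


def als_low_rank(W, X, r, iters=50, seed=0):
    """Alternating least squares for min ||(W - A B) X||_F over rank-r A, B."""
    rng = np.random.default_rng(seed)
    C = X @ X.T  # activation Gram matrix
    A = rng.standard_normal((W.shape[0], r))
    for _ in range(iters):
        # A fixed: B = (A^T A)^-1 A^T W satisfies A^T (W - A B) C = 0
        B = np.linalg.lstsq(A, W, rcond=None)[0]
        # B fixed: A (B C B^T) = W C B^T; B C B^T is symmetric, so solve on transposes
        A = np.linalg.solve(B @ C @ B.T, (W @ C @ B.T).T).T
    return A, B


# Usage with hypothetical data: one tolerance, two layers with different spectra.
rng = np.random.default_rng(0)
d = 32
X = rng.standard_normal((d, 4 * d))  # stand-in calibration activations
L = whitening_factor(X)
Q1, _ = np.linalg.qr(rng.standard_normal((d, d)))
Q2, _ = np.linalg.qr(rng.standard_normal((d, d)))
layers = {
    "fast_decay": Q1 @ np.diag(0.5 ** np.arange(d)) @ Q2,  # hypothetical layer
    "flat": rng.standard_normal((d, d)),                    # hypothetical layer
}
eps = 0.1  # one tolerance shared by every layer
for name, W in layers.items():
    r = rank_for_tolerance(W, L, eps)
    A, B = whitened_svd(W, L, r)
    print(name, "rank", r, "activation error", round(activation_error(W, A, B, X), 4))
```

With a single shared `eps`, the fast-decaying layer should get a much smaller rank than the flat one. That is the kind of heterogeneous rank allocation the abstract describes. Started from a random `A`, the ALS loop should reach roughly the same activation error as the whitened SVD at the same rank.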

