CL LG | May 20, 2025

Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models

Ryan Solgi, Kai Zhen, Rupak Vignesh Swaminathan, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, Zheng Zhang

arXiv:2505.14871v2 | 6.72 citations | h-index: 9 | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the problem of deploying LLMs efficiently for users with limited resources. It represents an incremental improvement in compression methods.

The paper tackles the challenge of compressing pre-trained large language models for deployment on resource-constrained devices by proposing Sparse Augmented Tensor Networks (Saten), which enhance accuracy and compression efficiency, achieving state-of-the-art performance.

The efficient implementation of large language models (LLMs) is crucial for deployment on resource-constrained devices. Low-rank tensor compression techniques, such as tensor-train (TT) networks, have been widely studied for over-parameterized neural networks. However, applying them to compress pre-trained LLMs for downstream tasks (post-training) remains challenging due to the high-rank nature of pre-trained LLMs and the lack of access to pretraining data. In this study, we investigate low-rank tensorized LLMs during fine-tuning and propose sparse augmented tensor networks (Saten) to enhance their performance. The proposed Saten framework enables full model compression. Experimental results demonstrate that Saten enhances both accuracy and compression efficiency in tensorized language models, achieving state-of-the-art performance.
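To make the idea of a sparse-augmented low-rank tensor network concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the abstract does not say how the sparse component is constructed or trained, so this sketch assumes the simplest reading, in which a weight matrix is approximated by a tensor-train (TT) reconstruction plus a sparse correction that keeps the largest-magnitude entries of the residual. The function names (`tt_svd`, `tt_reconstruct`, `top_k_sparse`), the plain reshape of the matrix into a 4-way tensor, the rank cap of 2, and the sparsity budget `k` are all hypothetical choices made for illustration.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Split a tensor into tensor-train cores using sequential truncated SVDs."""
    dims = tensor.shape
    cores = []
    rank_prev = 1
    mat = tensor
    for k in range(len(dims) - 1):
        # Unfold so rows index (previous rank, current mode).
        mat = mat.reshape(rank_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        cores.append(u[:, :rank].reshape(rank_prev, dims[k], rank))
        # Carry the remaining factor (singular values times V^T) forward.
        mat = s[:rank, None] * vt[:rank]
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, dims[-1], 1))
    return cores


def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    # Drop the boundary rank-1 axes at both ends.
    return out.squeeze(axis=(0, -1))


def top_k_sparse(residual, k):
    """Keep only the k largest-magnitude entries of the residual (assumed rule)."""
    sparse = np.zeros_like(residual)
    idx = np.argpartition(np.abs(residual).ravel(), -k)[-k:]
    sparse.flat[idx] = residual.flat[idx]
    return sparse


# Toy stand-in for a pre-trained weight matrix (random, hence high-rank).
rng = np.random.default_rng(0)
weight = rng.standard_normal((16, 16))

# Low-rank part: TT decomposition of the matrix reshaped to a 4-way tensor.
cores = tt_svd(weight.reshape(4, 4, 4, 4), max_rank=2)
low_rank = tt_reconstruct(cores).reshape(weight.shape)

# Sparse augmentation: store the k largest residual entries explicitly.
k = 32
sparse = top_k_sparse(weight - low_rank, k)

err_tt = np.linalg.norm(weight - low_rank)
err_aug = np.linalg.norm(weight - (low_rank + sparse))
tt_params = sum(core.size for core in cores)

print(f"dense params: {weight.size}, TT params: {tt_params}, sparse values: {k}")
print(f"TT-only error: {err_tt:.3f}, TT + sparse error: {err_aug:.3f}")
```

Because the toy matrix is random and therefore high-rank, the TT part alone leaves a large residual, which mirrors the difficulty the abstract attributes to pre-trained LLM weights. Adding the sparse term can only lower the Frobenius error here, at the cost of storing `k` extra values and their positions.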

