LG AR CL May 24, 2024

Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications

Yang Li, Daniel Agyei Asante, Changsheng Zhao, Ernie Chang, Yangyang Shi, Vikas Chandra

arXiv:2405.15877v3 · 4.61 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the problem of deploying LLMs on resource-limited devices like personal computers and mobile/wearable devices, though it is incremental as it builds on existing low-rank compression methods.

The paper tackles the computational and energy demands of large language models (LLMs) by introducing a low-rank decomposition method to compress them for specific applications, achieving significant model size reduction while maintaining comparable accuracy to state-of-the-art compression techniques on Llama 2 models.

Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as personal computers and mobile/wearable devices, and results in substantial inference costs in resource-rich environments like cloud servers. To extend the use of LLMs, we introduce a low-rank decomposition approach to effectively compress these models, tailored to the requirements of specific applications. We observe that LLMs pretrained on general datasets contain many redundant components not needed for particular applications. Our method focuses on identifying and removing these redundant parts, retaining only the necessary elements for the target applications. Specifically, we represent the weight matrices of LLMs as a linear combination of base components. We then prune the irrelevant bases and enhance the model with new bases beneficial for specific applications. Deep compression results on the Llama 2-7B and -13B models, conducted on target applications including mathematical reasoning and code generation, show that our method significantly reduces model size while maintaining comparable accuracy to state-of-the-art low-rank compression techniques.
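The idea of writing a weight matrix as a linear combination of bases and pruning the less useful ones can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes the bases are the rank-one terms of an SVD and uses singular-value magnitude as a stand-in importance score, whereas the abstract describes selecting bases according to the target application. The function names (`decompose_into_bases`, `prune_bases`, `reconstruct`) and the toy matrix are hypothetical, and the step of adding new bases is not shown.

```python
import numpy as np


def decompose_into_bases(weight):
    """Write weight as sum_i s[i] * outer(u[:, i], vt[i, :]) using an SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def prune_bases(u, s, vt, importance, keep):
    """Keep only the `keep` bases with the largest importance scores."""
    idx = np.argsort(importance)[::-1][:keep]
    return u[:, idx], s[idx], vt[idx, :]


def reconstruct(u, s, vt):
    """Rebuild the (approximate) weight matrix from the retained bases."""
    return (u * s) @ vt


rng = np.random.default_rng(0)
# Toy stand-in for a pretrained weight: a rank-8 signal plus small noise.
weight = (rng.standard_normal((64, 8)) @ rng.standard_normal((8, 48))
          + 0.01 * rng.standard_normal((64, 48)))

u, s, vt = decompose_into_bases(weight)

# Assumption: singular values act as the importance score here. A
# task-aware score computed on target-application data would replace this.
importance = s
u_k, s_k, vt_k = prune_bases(u, s, vt, importance, keep=8)

approx = reconstruct(u_k, s_k, vt_k)
rel_error = np.linalg.norm(weight - approx) / np.linalg.norm(weight)

params_before = weight.size                       # 64 * 48
params_after = u_k.size + s_k.size + vt_k.size    # 64*8 + 8 + 8*48
print(f"relative error: {rel_error:.4f}")
print(f"parameters: {params_before} -> {params_after}")
```

Storing the retained factors instead of the dense matrix is where the size reduction comes from: the parameter count scales with the number of kept bases rather than with the full matrix dimensions.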
