CPSVD: Enhancing Large Language Model Compression via Column-Preserving Singular Value Decomposition
CPSVD addresses the critical bottleneck of LLM size for deployment, though it appears incremental: it refines an existing SVD approach rather than introducing a fundamentally new compression paradigm.
The paper tackles large language model compression by proposing CPSVD, a method that improves on standard SVD by directly preserving the columns with high decomposition error and applying compression only to the columns with low error. It reports lower perplexity and higher zero-shot task accuracy than existing SVD-based methods.
The rapid advancement of Large Language Models (LLMs) faces a critical bottleneck in their immense size, necessitating efficient compression techniques. While Singular Value Decomposition (SVD) is a promising approach, existing SVD-based methods treat the entire parameter matrix uniformly, overlooking that SVD approximation errors vary significantly across different parts of the matrix, which often leads to suboptimal compression. To address this, we propose \textbf{C}olumn-\textbf{P}reserving \textbf{S}ingular \textbf{V}alue \textbf{D}ecomposition (CPSVD), a novel method that refines SVD-based LLM compression by segmenting the parameter matrix by column. Unlike traditional SVD, CPSVD identifies and directly preserves matrix columns with high decomposition errors and applies SVD only to columns with low decomposition errors, determining the balance point between these two strategies that minimizes the overall error. Furthermore, because decomposition errors are heterogeneous across the different matrices within an LLM, CPSVD adaptively allocates non-uniform compression rates to the modules within each layer while adhering to a target layer-wise compression ratio, thereby further enhancing compression performance. Extensive experiments demonstrate that CPSVD consistently outperforms state-of-the-art SVD-based LLM compression methods, achieving lower perplexity and higher accuracy on zero-shot tasks.
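The column-preserving idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`truncated_svd`, `column_preserving_svd`), the parameter-count budget, the coarse grid search over the number of preserved columns, and the use of plain (not activation-aware) Frobenius error are all assumptions made for illustration. The sketch ranks columns by their error under an ordinary truncated SVD, keeps the `k` worst columns exactly, spends the remaining budget on a lower-rank SVD of the other columns, and picks the `k` with the smallest total error.

```python
import numpy as np


def truncated_svd(W, rank):
    """Best rank-`rank` approximation of W in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def column_preserving_svd(W, keep_ratio, candidates=8):
    """Hypothetical sketch: returns (error, k, approximation).

    keep_ratio is the fraction of the m*n parameters we may store.
    A preserved column costs m parameters; a rank-r factorization of the
    remaining m x (n - k) block costs r * (m + n - k) parameters.
    """
    m, n = W.shape
    budget = int(keep_ratio * m * n)

    # Baseline: plain truncated SVD of the whole matrix (k = 0).
    base_rank = max(1, budget // (m + n))
    base = truncated_svd(W, base_rank)
    col_err = np.linalg.norm(W - base, axis=0)   # per-column error
    order = np.argsort(-col_err)                 # worst columns first
    best = (np.linalg.norm(W - base), 0, base)

    # Coarse search over how many columns to preserve (an assumption;
    # the paper describes determining the balance point more precisely).
    max_k = budget // m
    for k in np.unique(np.linspace(0, max_k, candidates, dtype=int)):
        if k == 0:
            continue
        rank = (budget - m * k) // (m + n - k)
        if rank < 1:
            continue
        kept, rest = order[:k], order[k:]
        approx = np.empty_like(W)
        approx[:, kept] = W[:, kept]                        # stored exactly
        approx[:, rest] = truncated_svd(W[:, rest], rank)   # compressed
        err = np.linalg.norm(W - approx)
        if err < best[0]:
            best = (err, int(k), approx)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic matrix: mostly low-rank, plus a few hard-to-compress columns.
    W = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))
    W += 0.01 * rng.standard_normal((64, 64))
    W[:, :20] += 5.0 * rng.standard_normal((64, 20))
    err, k, _ = column_preserving_svd(W, keep_ratio=0.5)
    print(f"preserved columns: {k}, Frobenius error: {err:.3f}")
```

On a synthetic matrix like the one in the demo, where a handful of columns carry most of the SVD error, preserving those columns frees the low-rank factors to fit the rest of the matrix. Whether the same holds for a given LLM weight matrix depends on how unevenly its decomposition error is spread across columns.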