Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression
For practitioners deploying LLMs under resource constraints, this work clarifies when tensor compression is viable, revealing its limitations at large scale.
This paper systematically evaluates tensor decompositions for post-training LLM compression, identifying a fundamental mismatch between tensor methods and modern LLM representations, and delineating their practical limits.
Post-training compression is essential for deploying large language models (LLMs) under tight resource constraints. Tensor decompositions have emerged as a promising direction, offering compact parameterizations well suited to Transformer weight structures. However, existing studies evaluate these methods in narrow settings, leaving it unclear whether tensorization remains effective in large-scale deployment. We systematically evaluate tensor compression across dense and MoE architectures, establishing performance trade-offs grounded in both empirical and theoretical analysis. We identify a fundamental mismatch between the shared subspaces assumed by tensor decompositions and the heterogeneous representations learned by modern LLMs, thereby delineating their practical limits and clarifying their viable role in large-scale deployment. The code is available at https://github.com/brain-lab-research/TT-LLM.
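
To make "compact parameterization" concrete, the following is a minimal sketch of how parameter counts compare between a dense weight matrix and a tensor-train (TT) matrix factorization of the same shape. It is an illustration only and is not taken from the linked repository: the function names `tt_matrix_params` and `dense_params`, the 4096 x 4096 layer size, the factorization into four modes of size 8, and the uniform TT-rank of 16 are all assumptions chosen for the example.

```python
from math import prod


def tt_matrix_params(row_factors, col_factors, ranks):
    """Count parameters of a TT-matrix (hypothetical helper for illustration).

    A matrix of shape (prod(row_factors), prod(col_factors)) is represented by
    d cores, where core k has shape (ranks[k], row_factors[k], col_factors[k],
    ranks[k + 1]) and the boundary ranks are fixed to 1.
    """
    if len(row_factors) != len(col_factors):
        raise ValueError("row_factors and col_factors must have equal length")
    d = len(row_factors)
    if len(ranks) != d + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks must have length d + 1 with boundary ranks 1")
    return sum(
        ranks[k] * row_factors[k] * col_factors[k] * ranks[k + 1]
        for k in range(d)
    )


def dense_params(row_factors, col_factors):
    """Parameter count of the equivalent dense matrix."""
    return prod(row_factors) * prod(col_factors)


# Assumed example: a 4096 x 4096 projection, factored as 8*8*8*8 on each side,
# with an assumed uniform internal TT-rank of 16.
rows = (8, 8, 8, 8)
cols = (8, 8, 8, 8)
ranks = (1, 16, 16, 16, 1)

tt_count = tt_matrix_params(rows, cols, ranks)
dense_count = dense_params(rows, cols)
print(f"dense: {dense_count}, TT: {tt_count}, ratio: {dense_count / tt_count:.1f}x")
```

The count only measures storage. Whether a given rank preserves model quality depends on how well the weights fit the low-rank structure the format assumes, which is the question the abstract raises.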