Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing
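This work addresses the computational resource demands of LLMs, offering incremental improvements to SVD-based compression in two areas: layer-wise rank selection and removal of parameter redundancy in the low-rank factors.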
The paper tackles the problem of compressing Large Language Models (LLMs) via Singular Value Decomposition (SVD) by introducing two physics-inspired improvements: FermiGrad for globally optimal rank selection and PivGa for lossless compression of low-rank factors, resulting in enhanced compression efficiency.
Large Language Models (LLMs) are very demanding in terms of computational resources. Low-rank decomposition of LLM weights, e.g. via Singular Value Decomposition (SVD), is a promising approach to LLM compression, but it presents several practical hurdles, e.g. selecting appropriate layer-wise ranks and removing the parameter redundancy of the low-rank factors. In this work, we present two physics-inspired improvements to SVD LLM compression: (1) \textbf{FermiGrad}, a gradient-descent algorithm that determines globally optimal layer-wise ranks by relaxing the discrete singular-value truncation into a continuous optimization using the Fermi function; (2) \textbf{PivGa}, an additional \textit{lossless} compression of the low-rank factors that exploits the intrinsic gauge freedom in their parametrization.
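
The two ideas named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; every function name, the parameters `mu` and `temperature`, and the pivot-selection rule are assumptions made here for illustration. The first part replaces the hard cut-off "keep the first k singular values" with a Fermi-function mask 1 / (exp((i - mu) / T) + 1), which is differentiable in `mu` and could therefore be tuned by gradient descent. The second part shows the gauge freedom of a factorization W ≈ A B: for any invertible G, the pair (A G⁻¹, G B) gives the same product, so G can be chosen such that r rows of A become the identity and need not be stored.

```python
import numpy as np


def fermi_mask(num_singular_values, mu, temperature):
    """Soft truncation mask: close to 1 for index << mu, close to 0 for index >> mu."""
    idx = np.arange(num_singular_values)
    return 1.0 / (np.exp((idx - mu) / temperature) + 1.0)


def soft_truncate(W, mu, temperature):
    """Reweight the singular values of W with the Fermi mask (illustrative only)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    mask = fermi_mask(len(s), mu, temperature)
    return (U * (s * mask)) @ Vt


def pivot_rows(A):
    """Pick r rows of A (shape m x r) forming an invertible block.

    Assumption: plain Gaussian elimination with row pivoting; A must have
    full column rank. The paper's actual pivoting rule may differ.
    """
    M = A.astype(float).copy()
    m, r = M.shape
    available = np.ones(m, dtype=bool)
    rows = []
    for k in range(r):
        # Largest entry of column k among rows not yet chosen.
        scores = np.where(available, np.abs(M[:, k]), -1.0)
        p = int(np.argmax(scores))
        rows.append(p)
        available[p] = False
        # Eliminate column k from the rows that are still available.
        factors = M[:, k] / M[p, k]
        factors[~available] = 0.0
        M -= np.outer(factors, M[p, :])
    return np.array(rows)


def gauge_fix(A, B):
    """Return (A', B', rows) with A' @ B' == A @ B and A'[rows] == identity."""
    rows = pivot_rows(A)
    G = A[rows, :]                          # invertible r x r block
    A_fixed = np.linalg.solve(G.T, A.T).T   # equals A @ inv(G)
    B_fixed = G @ B
    return A_fixed, B_fixed, rows


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
B = rng.standard_normal((3, 6))
A_fixed, B_fixed, rows = gauge_fix(A, B)

# The product is unchanged, and r*r entries of A_fixed are now fixed to the identity.
params_before = A.size + B.size
params_after = params_before - A.shape[1] ** 2
```

In this sketch the sum of the mask plays the role of a continuous rank, and lowering `temperature` recovers a hard truncation. After gauge fixing, a rank-r factorization of an m × n matrix needs r(m + n) − r² stored values (plus the r pivot indices) instead of r(m + n), without changing the product up to floating-point error.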