Low-Rank Prehab: Preparing Neural Networks for SVD Compression
This work addresses the challenge of compressing large models, such as LLMs and other Transformer-based architectures, more effectively. The contribution is incremental, building on existing SVD-based compression methods.
The paper tackles neural network compression via low-rank approximation by introducing a pre-compression fine-tuning stage, Low-Rank Prehab, which reduces the immediate accuracy drop after compression, improves post-finetuning performance, and outperforms state-of-the-art SVD-based techniques across various compression ratios.
Low-rank approximation methods such as singular value decomposition (SVD) and its variants (e.g., Fisher-weighted SVD, Activation SVD) have recently emerged as effective tools for neural network compression. In this setting, decomposition acts as a "surgical" intervention, followed by fine-tuning that serves as "rehab" to recover accuracy. Inspired by prehabilitation in surgery, we introduce a pre-compression fine-tuning stage, Low-Rank Prehab, that explicitly encourages low-rank structure in weight matrices while preserving task performance. By conditioning the model before SVD, Prehab steers weights toward spectrally compact regions of the parameter space, enabling smoother low-rank approximation and improved recovery. Experiments on large language models (LLMs) and other Transformer-based architectures, including Vision Transformers (ViTs), show that Prehab substantially reduces the immediate accuracy drop after compression and consistently improves post-finetuning performance. Across a wide range of compression ratios, our method outperforms state-of-the-art SVD-based techniques such as SVD-LLM, highlighting the importance of preparing models for compression rather than only improving the compression and recovery stages. Source code is available at https://github.com/niqretnuh/PREHAB-SVD.
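The abstract does not spell out the Prehab training objective, so the following is only a toy sketch under stated assumptions, not the authors' method. It uses NumPy to show (1) plain truncated-SVD compression of a weight matrix W (m x n) into factors A (m x r) and B (r x n), with the rank chosen from a target fraction of retained parameters, and (2) why steering weights toward a "spectrally compact" region helps. For (2) it swaps in a hypothetical stand-in for Prehab: the exact minimizer of 0.5 * ||W - W0||_F^2 + lam * ||W||_* (a Frobenius "stay close to the original" term plus a nuclear-norm penalty), which is singular value soft-thresholding. The helper names `rank_for_keep_ratio`, `svd_compress`, `relative_error`, `toy_prehab` and the parameters `keep_ratio` and `lam` are illustrative assumptions, not from the source.

```python
import numpy as np


def rank_for_keep_ratio(m: int, n: int, keep_ratio: float) -> int:
    """Largest rank r whose factors (m*r + r*n params) fit in keep_ratio * m*n.

    Hypothetical helper; returns at least 1 even if the budget is tinier.
    """
    return max(1, int(keep_ratio * m * n // (m + n)))


def svd_compress(W: np.ndarray, rank: int):
    """Truncated SVD: W ~= A @ B with A = U_r diag(s_r) (m x r), B = Vt_r (r x n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # scale the kept columns of U by their singular values
    B = Vt[:rank, :]
    return A, B


def relative_error(W: np.ndarray, A: np.ndarray, B: np.ndarray) -> float:
    """Relative Frobenius-norm error of the low-rank reconstruction A @ B."""
    return float(np.linalg.norm(W - A @ B) / np.linalg.norm(W))


def toy_prehab(W0: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical stand-in for Prehab (NOT the paper's objective).

    Exactly minimizes 0.5 * ||W - W0||_F^2 + lam * ||W||_* by
    soft-thresholding the singular values of W0. The Frobenius term is a
    crude proxy for "preserve task performance"; the nuclear norm
    encourages low-rank structure.
    """
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)  # shrink every singular value by lam
    return (U * s_shrunk) @ Vt


rng = np.random.default_rng(0)
m, n = 64, 32
W0 = rng.standard_normal((m, n))  # random stand-in for a trained weight matrix
r = rank_for_keep_ratio(m, n, keep_ratio=0.5)

# "Surgery" without prehab: truncate the original weights directly.
err_before = relative_error(W0, *svd_compress(W0, r))

# "Prehab" first, then the same truncation. Note err_after is measured
# against the prehabbed weights, and drift measures how far they moved.
W_pre = toy_prehab(W0, lam=2.0)
err_after = relative_error(W_pre, *svd_compress(W_pre, r))
drift = float(np.linalg.norm(W_pre - W0) / np.linalg.norm(W0))

print(f"rank={r} err_before={err_before:.3f} err_after={err_after:.3f} drift={drift:.3f}")
```

Soft-thresholding shrinks small singular values proportionally more than large ones, so the energy left in the discarded tail shrinks and the truncation error drops, at the cost of some drift from the original weights. In the setting the abstract describes, that drift would presumably be controlled by the task loss during fine-tuning rather than by the Frobenius term used here.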