LG · May 15

IO-SVD: Input-Output Whitened SVD for Adaptive-Rank LLM Compression

Ali Abbasi, Chayne Thrash, Haoran Qin, Hamed Pirsiavash, Soheil Kolouri

arXiv:2605.15626 · 87.7 · Has Code

Predicted impact: top 11% in LG · last 90 days
Originality: Incremental advance

AI Analysis

For practitioners deploying large models under resource constraints, IO-SVD provides a more effective post-training compression method that preserves model quality better than existing SVD-based approaches.

IO-SVD introduces a KL-aware double-sided whitening space and heterogeneous rank allocation for SVD-based LLM compression, achieving minimal performance degradation with practical inference speedups across diverse LLM and VLM families.

Large language models deliver strong performance across language and reasoning tasks, but their storage and compute costs remain major barriers to deployment in resource-constrained and latency-sensitive settings. SVD-based post-training compression offers a hardware-agnostic way to reduce model size and improve inference efficiency through low-rank factorization. However, existing methods often rely on input-only whitening spaces, homogeneous rank allocation, or loss-agnostic allocation heuristics, which limits their ability to preserve model quality under aggressive compression. We propose Input-Output Whitened SVD (IO-SVD), a post-training compression method that forms a KL-aware double-sided whitening space for model weights. Using a second-order expansion of the KL loss over the top-K token probabilities, IO-SVD constructs an output-side metric that captures predictive sensitivity, while input whitening captures activation statistics. We further introduce an efficient heterogeneous rank-allocation strategy that scores whitened singular components using first-order calibration loss estimates and prunes the least sensitive components under a global budget. Inspired by prior work that combines SVD truncation with quantization, we improve hybrid SVD-quantization compression through loss-aware remapping, which selects low-rank factor rows for 8-bit quantization based on the predicted loss change incurred by quantizing them. Extensive experiments across diverse LLM and VLM families, together with inference-time analysis, show that IO-SVD compresses LLMs with minimal performance degradation while delivering practical inference speedups. Code is available at https://github.com/mint-vu/IO-SVD.git
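To make the two main ideas in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes the input-side statistics are given as an activation Gram matrix `C_in` and the output-side metric as a positive semi-definite matrix `M_out`. In the paper the latter comes from a second-order expansion of the KL loss, whereas here it is a random placeholder. The function names, the `eps` regularizer, and the per-component scores passed to `allocate_components` are all hypothetical. Under these assumptions, truncating the SVD of the whitened weight `L_out.T @ W @ L_in` minimizes the weighted error `tr(M_out ΔW C_in ΔW^T)` for a given rank, and a global budget is met by dropping the lowest-scoring components across layers.

```python
import numpy as np

def io_whitened_truncate(W, C_in, M_out, rank, eps=1e-6):
    """Rank-`rank` factors (B, A) with W ~= B @ A, chosen to minimize
    tr(M_out (W - B A) C_in (W - B A)^T) up to the eps regularizer."""
    d_out, d_in = W.shape
    # Cholesky factors: C_in ~= L_in L_in^T, M_out ~= L_out L_out^T.
    L_in = np.linalg.cholesky(C_in + eps * np.eye(d_in))
    L_out = np.linalg.cholesky(M_out + eps * np.eye(d_out))
    # Whitened weight: its Frobenius norm equals the weighted norm of W.
    W_white = L_out.T @ W @ L_in
    U, s, Vt = np.linalg.svd(W_white, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank]
    # Undo the whitening on each side of the truncated factorization.
    B = np.linalg.solve(L_out.T, U_r * s_r)    # L_out^{-T} U_r S_r
    A = np.linalg.solve(L_in.T, Vt_r.T).T      # V_r^T L_in^{-1}
    return B, A

def weighted_error(W, W_hat, C_in, M_out):
    """tr(M_out D C_in D^T) with D = W - W_hat."""
    D = W - W_hat
    return float(np.trace(M_out @ D @ C_in @ D.T))

def allocate_components(scores, costs, budget):
    """Drop the globally least sensitive components until the total
    parameter count fits `budget`. Returns a keep-mask per layer.
    `scores[name]` holds hypothetical per-component sensitivities and
    `costs[name]` the parameters stored per kept component."""
    keep = {n: np.ones(len(s), dtype=bool) for n, s in scores.items()}
    total = sum(len(s) * costs[n] for n, s in scores.items())
    order = sorted((float(v), n, i)
                   for n, s in scores.items() for i, v in enumerate(s))
    for _, name, idx in order:
        if total <= budget:
            break
        keep[name][idx] = False
        total -= costs[name]
    return keep

# Toy data: anisotropic inputs and a random placeholder output metric.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 16)) * np.linspace(0.1, 3.0, 16)
G = rng.standard_normal((64, 8))
C_in, M_out = X.T @ X / 256, G.T @ G / 64
W = rng.standard_normal((8, 16))

B, A = io_whitened_truncate(W, C_in, M_out, rank=4)
err_io = weighted_error(W, B @ A, C_in, M_out)

# Baseline: plain truncated SVD of W, ignoring both metrics.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
err_plain = weighted_error(W, (U[:, :4] * s[:4]) @ Vt[:4], C_in, M_out)

keep = allocate_components(
    scores={"a": [5.0, 1.0, 3.0], "b": [4.0, 0.5]},
    costs={"a": 10, "b": 20},
    budget=50,
)
```

On this toy problem the whitened truncation should never do worse than plain SVD in the weighted metric, since it is the Eckart-Young optimum in the whitened space. How the scores are actually computed from first-order calibration loss estimates, and how the output metric is built from the top-K token probabilities, are details the abstract does not spell out and this sketch does not attempt to reproduce.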

