LG · Apr 20

Predicting LLM Compression Degradation from Spectral Statistics

arXiv:2604.18085 · 58.6

Predicted impact: top 39% in LG · last 90 days
Originality: Synthesis-oriented

AI Analysis

For practitioners needing to compress LLMs, this provides a cheap predictor to avoid expensive trial-and-error, though it is incremental over known spectral statistics.

The paper shows that stable rank and information density predict LLM compression degradation, with the interaction term γ·ρ̄_s achieving Pearson correlations of 0.890 for attention layers and 0.839 for MLP layers across Qwen3 and Gemma3 families.

Matrix-level low-rank compression is a promising way to reduce the cost of large language models, but running compression and evaluating the resulting models on language tasks can be prohibitively expensive. Can compression-induced degradation be predicted before committing to this compute? We systematically analyze the Qwen3 and Gemma3 model families across four representative low-rank compression methods: vanilla SVD, two ASVD variants, and SVD-LLM. We find that stable rank and information density, measured in bits per parameter, dominate performance degradation. The interaction term $\gamma \cdot \bar{\rho}_s$, defined as compression ratio times stable rank, is a robust predictor of accuracy degradation, achieving leave-one-out cross-validation Pearson correlations of $0.890$ for attention layers and $0.839$ for MLP layers. We provide theoretical intuition for why this predictor succeeds by connecting it to standard SVD truncation bounds and error composition mechanisms in transformer layers. These findings enable a predict-then-compress workflow: compute $\gamma \cdot \bar{\rho}_s$ from weights, estimate degradation, and invest compute only in desirable configurations.
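
The first step of the predict-then-compress workflow, computing $\gamma \cdot \bar{\rho}_s$ from weights, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It uses the standard definition of stable rank, $\|W\|_F^2 / \|W\|_2^2$, and assumes that $\bar{\rho}_s$ is the mean stable rank over a group of weight matrices and that $\gamma$ is a scalar supplied by the caller; the abstract does not state either convention. The function names `stable_rank` and `interaction_score` are hypothetical, and the random matrices stand in for real attention or MLP weights.

```python
import numpy as np


def stable_rank(weight):
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    singular_values = np.linalg.svd(weight, compute_uv=False)
    largest = singular_values[0]  # NumPy returns singular values in descending order
    if largest == 0.0:
        raise ValueError("stable rank is undefined for an all-zero matrix")
    return float(np.sum(singular_values**2) / largest**2)


def interaction_score(weights, compression_ratio):
    """Compression ratio times mean stable rank (averaging is an assumption)."""
    mean_stable_rank = float(np.mean([stable_rank(w) for w in weights]))
    return compression_ratio * mean_stable_rank


# Placeholder weights; in practice these would be loaded from a checkpoint.
rng = np.random.default_rng(0)
layer_weights = [rng.standard_normal((64, 64)) for _ in range(4)]
score = interaction_score(layer_weights, compression_ratio=0.5)
print(f"gamma * mean stable rank = {score:.3f}")
```

Mapping this score to an expected accuracy drop would additionally require a regression fitted on measured compression results, which the sketch leaves out.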
