LGITOct 1, 2025

Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?

arXiv:2510.00537v15 citationsh-index: 9EMNLP
Originality Incremental advance
AI Analysis

This provides concrete guidance for inference-efficient LLM design by recasting width selection as a trade-off between tail and dominant-mode capacity, addressing a domain-specific bottleneck in model scaling.

The paper tackled the problem of how effectively feed-forward networks (FFNs) in large language models utilize their latent space as they scale, finding an asymmetric spectral scaling law where soft rank grows linearly with width but hard rank grows sublinearly, indicating that widening mostly adds low-energy directions while dominant subspaces saturate early.

As large language models (LLMs) scale, the question is not only how large they become, but how much of their capacity is effectively utilized. Existing scaling laws relate model size to loss, yet overlook how components exploit their latent space. We study feed-forward networks (FFNs) and recast width selection as a spectral utilization problem. Using a lightweight diagnostic suite -- Hard Rank (participation ratio), Soft Rank (Shannon rank), Spectral Concentration, and the composite Spectral Utilization Index (SUI) -- we quantify how many latent directions are meaningfully activated across LLaMA, GPT-2, and nGPT families. Our key finding is an asymmetric spectral scaling law: soft rank follows an almost perfect power law with FFN width, while hard rank grows only sublinearly and with high variance. This asymmetry suggests that widening FFNs mostly adds low-energy tail directions, while dominant-mode subspaces saturate early. Moreover, at larger widths, variance further collapses into a narrow subspace, leaving much of the latent space under-utilized. These results recast FFN width selection as a principled trade-off between tail capacity and dominant-mode capacity, offering concrete guidance for inference-efficient LLM design.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes