LG · May 7

Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks

arXiv:2605.05659 · 49.2
Predicted impact: top 51% in LG · last 90 days
Originality: Highly original
AI Analysis

For deep learning practitioners, this offers theoretical justification for parameter-efficient low-rank structures (such as LoRA): once a sparse diagonal component is added, they keep full expressive power, which addresses a key bottleneck in scaling models.

The paper shows that neural networks constrained strictly to low-rank manifolds collapse at function approximation, but that adding a minimal sparse diagonal component (DLoR) restores universal approximation. It proves that DLoR networks can exactly reconstruct any full-rank transformation, and that stacking them in depth (multiplicative decomposition) gives better parameter-to-expressivity scaling than widening them (additive decomposition).

The massive computational costs of scaling modern deep learning architectures have driven the widespread use of parameter-efficient low-rank structures, such as LoRA and low-rank factorization. However, theoretical guarantees for their expressive power remain less explored, and often rely on restrictive priors such as a pretrained base matrix, ReLU activations, or non-verifiable singularity conditions. We first investigate the limits of neural networks constrained strictly to low-rank manifolds without pretrained dense priors. We demonstrate a theoretical paradox: while purely rank-1 layers can exactly interpolate arbitrary scalar datasets, they collapse for function approximation. To overcome this bottleneck without surrendering parameter efficiency, we introduce a unified Structural Correspondence framework. We prove that augmenting low-rank layers with only a minimal sparse diagonal component, namely a Diagonal plus Low-Rank (DLoR) structure, is sufficient to reach Universal Approximation. We show that any full-rank transformation can be exactly reconstructed using these DLoR components by trading off network width (additive decomposition) or depth (multiplicative decomposition). By tracking asymptotic Taylor remainders, we prove that DLoR neural networks fully restore the Universal Approximation Theorem for general activation functions. Finally, we establish that multiplicative depth provides superior parameter-to-expressivity scaling compared to additive width. Our results show that dense matrices and specific activation functions are not topological prerequisites for universal expressivity.
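
As a rough illustration only (not taken from the paper), a DLoR layer can be read as a weight matrix of the form diag(d) + U V^T, where d is a length-n vector and U, V are n x r factors. The sketch below assumes that parameterization; the function names, the toy vectors, and the chosen sizes are hypothetical. It shows two things: the parameter count of such a layer next to a dense one, and a simple way a purely rank-1 map loses information that the diagonal term keeps, since a rank-1 map sends any two inputs with the same projection onto v to the same output.

```python
def dlor_matvec(d, U, V, x):
    """Compute (diag(d) + U V^T) x without forming the dense n x n matrix.

    d: list of n floats (the diagonal), U and V: n x r lists of lists,
    x: list of n floats. This layout is an assumption for illustration.
    """
    n, r = len(d), len(U[0])
    # t = V^T x, a vector of length r
    t = [sum(V[i][k] * x[i] for i in range(n)) for k in range(r)]
    # y = d * x (elementwise) + U t
    return [d[i] * x[i] + sum(U[i][k] * t[k] for k in range(r)) for i in range(n)]


def param_counts(n, r):
    """Weight counts for an n x n layer under each structure (biases ignored)."""
    return {"dense": n * n, "low_rank": 2 * n * r, "dlor": n + 2 * n * r}


# Toy rank-1 factors in dimension 3 (hypothetical values).
u = [[1.0], [2.0], [3.0]]
v = [[1.0], [0.0], [-1.0]]

# Two different inputs with the same projection onto v: v^T x1 = v^T x2 = 1.
x1 = [1.0, 0.0, 0.0]
x2 = [2.0, 5.0, 1.0]

zero_diag = [0.0, 0.0, 0.0]   # pure rank-1 layer
unit_diag = [1.0, 1.0, 1.0]   # rank-1 layer plus identity diagonal

# The pure rank-1 map cannot tell x1 and x2 apart.
low_rank_same = dlor_matvec(zero_diag, u, v, x1) == dlor_matvec(zero_diag, u, v, x2)
# With a diagonal term the two inputs map to different outputs.
dlor_differs = dlor_matvec(unit_diag, u, v, x1) != dlor_matvec(unit_diag, u, v, x2)

print(low_rank_same, dlor_differs)
print(param_counts(1024, 4))
```

Under these assumptions the diagonal adds only n weights on top of the 2nr low-rank weights, so for n = 1024 and r = 4 the DLoR layer still uses under 1% of the dense parameter count.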

