A Generalized Singular Value Theory for Neural Networks
This work provides a theoretical framework for understanding neural network representations, with potential applications to adversarial robustness, model bias, and invertibility. The results are primarily theoretical, with limited empirical validation.
The authors prove that most modern neural architectures admit a generalized SVD representation in which the network is left-invertible before a final linear layer, and in which the left-invertible nonlinear portion can be made norm preserving. They provide a data-driven algorithm to estimate this representation from trained models and give a proof of concept that it can be used to identify adversarial perturbations.
Building on the abstract Generalized Singular Value Decomposition (GSVD) theory of Brown et al. [2025], we prove that most modern neural architectures admit a generalized SVD representation in which they are left-invertible before a final linear layer, with no change in input-output behavior. Furthermore, the left-invertible nonlinear portion of the input-output behavior can be made \emph{norm preserving}, meaning that perturbations in the left-invertible ``embedding'' (the activations prior to the final linear layer in this representation) are proportional to changes in the input; that is, distance in feature space can be calibrated directly to distance in input space. We provide a data-driven algorithm for estimating this representation from trained models and propose a model architecture that naturally facilitates the decomposition. We then provide a proof of concept that the learned representation can be used to identify adversarial perturbations to model inputs, and develop the theory necessary for future applications to areas such as model bias and invertibility.
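For intuition only, the structure described above can be seen in the purely linear special case, where the ordinary SVD already splits a map into a norm-preserving, invertible part followed by a final linear layer. The sketch below is not the paper's algorithm and does not cover nonlinear networks; the toy matrix `W` and the helper names `embed`, `left_inverse`, and `final_layer` are assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "network": a single linear map f(x) = W x (3 inputs, 2 outputs).
W = rng.standard_normal((2, 3))

# Full SVD: W = U @ S @ Vt, where Vt is a 3x3 orthogonal matrix.
U, s, Vt = np.linalg.svd(W, full_matrices=True)
S = np.zeros((2, 3))
S[:2, :2] = np.diag(s)


def embed(x):
    """Norm-preserving, invertible part (here just an orthogonal map)."""
    return Vt @ x


def left_inverse(z):
    """Recover the input from its embedding, using Vt.T @ Vt = I."""
    return Vt.T @ z


# Final linear layer acting on the embedding.
final_layer = U @ S

x = rng.standard_normal(3)
dx = 1e-2 * rng.standard_normal(3)  # small input perturbation

# Input-output behavior is unchanged by the refactoring.
same_output = np.allclose(final_layer @ embed(x), W @ x)

# The embedding can be inverted back to the input.
recovered = np.allclose(left_inverse(embed(x)), x)

# Distance in the embedding equals distance in input space.
feature_dist = np.linalg.norm(embed(x + dx) - embed(x))
input_dist = np.linalg.norm(dx)
```

In this linear toy case the proportionality constant between embedding distance and input distance is exactly 1, because `Vt` is orthogonal. The abstract's claim is that an analogous factorization exists for most modern nonlinear architectures.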