A Unified Matrix-Spectral Framework for Stability and Interpretability in Deep Learning
We develop a unified matrix-spectral framework for analyzing stability and interpretability in deep neural networks. Representing networks as data-dependent products of linear operators reveals spectral quantities governing sensitivity to input perturbations, label noise, and training dynamics. We introduce a Global Matrix Stability Index that aggregates spectral information from Jacobians, parameter gradients, Neural Tangent Kernel operators, and loss Hessians into a single stability scale controlling forward sensitivity, attribution robustness, and optimization conditioning. We further show that spectral entropy refines classical operator-norm bounds by capturing typical, rather than purely worst-case, sensitivity. These quantities yield computable diagnostics and stability-oriented regularization principles. Synthetic experiments and controlled studies on MNIST, CIFAR-10, and CIFAR-100 confirm that modest spectral regularization substantially improves attribution stability even when global spectral summaries change little. The results establish a precise connection between spectral concentration and analytic stability, providing practical guidance for robustness-aware model design and training.
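The abstract does not define the Global Matrix Stability Index or the exact entropy-refined bound. What follows is therefore a minimal sketch under stated assumptions, not the authors' method. It uses a hypothetical toy: a small bias-free ReLU network with random weights. The network's input Jacobian is built as the data-dependent product of linear operators the abstract describes. The sketch then compares the worst-case gain (operator norm) with the typical gain of a random unit-norm perturbation. Spectral entropy is assumed here to be the Shannon entropy of the normalized squared singular values, which is one common convention. All function names are illustrative.

```python
import numpy as np


def relu_net_forward(weights, x):
    """Hypothetical toy network f(x) = W_L relu(W_{L-1} ... relu(W_1 x)); no biases."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h


def relu_net_jacobian(weights, x):
    """Input Jacobian as the data-dependent product W_L D_{L-1} W_{L-1} ... D_1 W_1.

    D_k = diag(1[z_k > 0]) records which hidden units are active at x, so near x
    the network acts on small perturbations as a fixed linear operator.
    """
    h = x
    J = np.eye(x.shape[0])
    for W in weights[:-1]:
        z = W @ h
        J = (z > 0).astype(float)[:, None] * (W @ J)  # left-multiply by D_k W_k
        h = np.maximum(z, 0.0)
    return weights[-1] @ J


def spectral_summary(J):
    """Worst-case (operator norm) and distributional (entropy) spectral summaries.

    Assumption: spectral entropy uses p_i = s_i^2 / sum_j s_j^2.
    """
    s = np.linalg.svd(J, compute_uv=False)        # singular values, descending
    energy = s**2 / np.sum(s**2)                   # normalized spectral energy p_i
    nz = energy[energy > 0]
    entropy = -np.sum(nz * np.log(nz))             # spectral entropy H
    return {
        "op_norm": s[0],                           # max gain over unit directions
        "rms_gain": np.sqrt(np.sum(s**2) / J.shape[1]),  # sqrt(E|J d|^2), d uniform on unit sphere
        "entropy": entropy,
        "eff_rank": np.exp(entropy),               # lies between 1 and rank(J)
    }


def monte_carlo_rms_gain(J, n=20000, seed=0):
    """Empirical RMS amplification of random unit-norm input perturbations."""
    rng = np.random.default_rng(seed)
    d = rng.standard_normal((n, J.shape[1]))
    d /= np.linalg.norm(d, axis=1, keepdims=True)  # project onto the unit sphere
    return np.sqrt(np.mean(np.sum((d @ J.T) ** 2, axis=1)))


rng = np.random.default_rng(42)
dims = [32, 64, 64, 10]  # hypothetical layer widths
weights = [rng.standard_normal((m, n)) / np.sqrt(n) for n, m in zip(dims[:-1], dims[1:])]
x = rng.standard_normal(dims[0])

J = relu_net_jacobian(weights, x)
summary = spectral_summary(J)
print({k: round(float(v), 3) for k, v in summary.items()})
print("Monte Carlo RMS gain:", round(float(monte_carlo_rms_gain(J)), 3))
```

With this convention the typical gain equals sqrt(||J||_F^2 / n), where n is the input dimension. The stable rank ||J||_F^2 / ||J||_2^2 never exceeds exp(H). As a result, the ratio of typical to worst-case gain is at most sqrt(exp(H) / n). When spectral energy concentrates in a few directions (low entropy), a pure operator-norm bound therefore overstates the sensitivity to most perturbations.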