Revealing the Structure of Deep Neural Networks via Convex Duality
This provides theoretical insights into neural network optimization, explaining phenomena like Neural Collapse, but is incremental as it builds on existing convex duality methods.
The paper tackled the problem of understanding the structure of optimal hidden layers in regularized deep neural networks by introducing a convex analytic framework, proving that optimal weights align via duality and that deep ReLU networks yield spline interpolation for one-dimensional datasets.
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks, we prove that each optimal weight matrix aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds. As a corollary, we also prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets which was previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when data is rank-one or whitened. The same analysis also applies to architectures with batch normalization even for arbitrary data. Therefore, we obtain a complete explanation for a recent empirical observation termed Neural Collapse where class means collapse to the vertices of a simplex equiangular tight frame.