ML | LG | OC | Feb 2, 2023

Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

arXiv:2302.01002v2 | 4 citations | h-index: 20
AI Analysis

This work addresses feature learning in over-parameterised neural networks and offers theoretical convergence guarantees for practitioners. It is incremental in that it builds on existing NTK frameworks.

The paper studies gradient-based optimisation of wide, shallow neural networks whose hidden nodes carry non-identical scaling parameters. It proves that, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features, unlike under the NTK parameterisation. Experiments illustrate these results.

We consider gradient-based optimisation of wide, shallow neural networks, where the output of each hidden node is scaled by a positive parameter. The scaling parameters are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that for large such neural networks, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features in some sense, unlike in the NTK parameterisation. We perform experiments illustrating our theoretical results and discuss the benefits of such scaling in terms of prunability and transfer learning.
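
The setup described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name in it is hypothetical. The abstract only says that each hidden node's output is scaled by a positive, non-identical parameter. The concrete choices below are assumptions made for illustration: scales that mix a uniform term with normalised power-law weights, a ReLU activation, fixed random output signs, a squared loss, and training of the first-layer weights only.

```python
import numpy as np

def make_scales(m, gamma=0.5, alpha=2.0):
    """Hypothetical asymmetric node scales: positive and summing to 1.

    gamma = 1 gives equal scales 1/m; gamma < 1 puts extra mass on the
    first few nodes via power-law weights (an assumption, not from the source).
    """
    power = np.arange(1, m + 1, dtype=float) ** (-alpha)
    power /= power.sum()
    return gamma / m + (1.0 - gamma) * power

def forward(W, a, lam, X):
    """f(x) = sum_j sqrt(lam_j) * a_j * relu(w_j . x / sqrt(d))."""
    d = X.shape[1]
    hidden = np.maximum(X @ W.T / np.sqrt(d), 0.0)  # shape (n, m)
    return hidden @ (np.sqrt(lam) * a)               # shape (n,)

def loss(W, a, lam, X, y):
    """Half mean squared error."""
    return 0.5 * np.mean((forward(W, a, lam, X) - y) ** 2)

def gd_step(W, a, lam, X, y, lr):
    """One gradient-descent step on the first-layer weights W only."""
    n, d = X.shape
    pre = X @ W.T / np.sqrt(d)                       # pre-activations (n, m)
    resid = np.maximum(pre, 0.0) @ (np.sqrt(lam) * a) - y
    active = (pre > 0).astype(float)                 # ReLU derivative
    coeff = resid[:, None] * active * (np.sqrt(lam) * a)
    grad = coeff.T @ X / (n * np.sqrt(d))            # shape (m, d)
    return W - lr * grad

def train(m=64, n=20, d=5, gamma=0.5, steps=200, lr=0.1, seed=0):
    """Train on synthetic data; return the scales and the loss trajectory."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)              # fixed output signs
    lam = make_scales(m, gamma)
    losses = [loss(W, a, lam, X, y)]
    for _ in range(steps):
        W = gd_step(W, a, lam, X, y, lr)
        losses.append(loss(W, a, lam, X, y))
    return lam, losses
```

Calling `train(gamma=1.0)` uses equal scales 1/m, so each node is multiplied by 1/sqrt(m) as in the symmetric NTK-style scaling. Calling `train(gamma=0.5)` gives a few nodes much larger scales than the rest, which is the kind of asymmetry the abstract contrasts with the NTK parameterisation.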

Code Implementations: 1 repo