ML LG OC | Feb 2, 2023

Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

Francois Caron, Fadhel Ayed, Paul Jung, Hoil Lee, Juho Lee, Hongseok Yang

arXiv:2302.01002v2 | h-index: 20 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses feature learning in over-parameterized neural networks and offers global convergence guarantees for gradient-based training. It is an incremental advance, as it builds on existing NTK frameworks.

The paper studies gradient-based optimization of wide, shallow neural networks whose hidden-node outputs are scaled by non-identical positive parameters. It proves that, for large networks and with high probability, gradient flow and gradient descent converge to a global minimum and can learn features, unlike under the NTK parameterization. Experiments illustrate these results.
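The exact model is not spelled out on this page. A natural way to write such a network, assumed here, is f(x) = Σ_j sqrt(λ_j) v_j σ(w_j · x) with positive scalings λ_j summing to one, so that λ_j = 1/m recovers the usual NTK-style 1/sqrt(m) output scaling. The sketch below trains both layers with full-batch gradient descent on a hypothetical four-point toy dataset, using tanh for σ and λ_j ∝ j^(-2) as a hypothetical asymmetric choice (not necessarily the one used in the paper). As a crude proxy for feature learning, it reports the largest distance any first-layer weight vector w_j moves from its initialisation.

```python
import math
import random


def make_scalings(m, kind):
    """Positive node scalings lam_1..lam_m that sum to one.

    "ntk":  lam_j = 1/m, i.e. the usual 1/sqrt(m) output scaling.
    "asym": lam_j proportional to j**-2 (hypothetical choice, for illustration only).
    """
    raw = [1.0] * m if kind == "ntk" else [1.0 / (j + 1) ** 2 for j in range(m)]
    total = sum(raw)
    return [r / total for r in raw]


def train(lams, xs, ys, lr=0.05, steps=1000, seed=0):
    """Full-batch gradient descent on L = (1/2n) * sum_i (f(x_i) - y_i)^2 for the
    assumed model f(x) = sum_j sqrt(lam_j) * v_j * tanh(w_j . x), training w_j and v_j."""
    rng = random.Random(seed)
    m, n, d = len(lams), len(xs), len(xs[0])
    s = [math.sqrt(lam) for lam in lams]
    ws = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    vs = [rng.gauss(0.0, 1.0) for _ in range(m)]
    w0 = [list(w) for w in ws]

    def residuals():
        # t[j][i] = tanh(w_j . x_i); r[i] = f(x_i) - y_i
        t = [[math.tanh(sum(a * b for a, b in zip(w, x))) for x in xs] for w in ws]
        r = [sum(s[j] * vs[j] * t[j][i] for j in range(m)) - ys[i] for i in range(n)]
        return t, r

    _, r = residuals()
    initial_loss = sum(ri * ri for ri in r) / (2 * n)
    for _ in range(steps):
        t, r = residuals()
        for j in range(m):
            # dL/dv_j = (1/n) sum_i r_i s_j tanh(w_j . x_i)
            gv = sum(r[i] * s[j] * t[j][i] for i in range(n)) / n
            # dL/dw_j = (1/n) sum_i r_i s_j v_j (1 - tanh^2(w_j . x_i)) x_i
            c = [r[i] * s[j] * vs[j] * (1.0 - t[j][i] ** 2) / n for i in range(n)]
            gw = [sum(c[i] * xs[i][k] for i in range(n)) for k in range(d)]
            vs[j] -= lr * gv
            ws[j] = [a - lr * g for a, g in zip(ws[j], gw)]
    _, r = residuals()
    final_loss = sum(ri * ri for ri in r) / (2 * n)
    # Largest distance any first-layer weight vector moved from its initialisation.
    max_move = max(math.dist(w, w_init) for w, w_init in zip(ws, w0))
    return initial_loss, final_loss, max_move


# Toy data (hypothetical): four points in R^2 with arbitrary targets.
xs = [[1.0, 0.0], [0.0, 1.0], [-0.7, 0.7], [0.6, -0.8]]
ys = [0.5, -0.3, 0.8, -0.6]
results = {kind: train(make_scalings(50, kind), xs, ys) for kind in ("ntk", "asym")}
for kind, (l0, l1, move) in results.items():
    print(f"{kind}: loss {l0:.4f} -> {l1:.4f}, max ||w_j - w_j(0)|| = {move:.3f}")
```

Because the gradient with respect to w_j is proportional to sqrt(λ_j), one would expect the heavily weighted units in the asymmetric run to move much further than any unit in the NTK run, where every sqrt(λ_j) is 1/sqrt(m). The printed numbers depend on the seed and toy data and are not results from the paper.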

We consider gradient-based optimisation of wide, shallow neural networks, where the output of each hidden node is scaled by a positive parameter. The scaling parameters are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that for large such neural networks, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features in some sense, unlike in the NTK parameterisation. We perform experiments illustrating our theoretical results and discuss the benefits of such scaling in terms of prunability and transfer learning.
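The abstract mentions prunability. One intuition, stated here as an assumption rather than as the paper's argument: if most of the scaling mass sits on a few nodes, removing the nodes with the smallest λ_j should change the output far less than removing the same number of nodes under uniform NTK scaling. The sketch below checks this only at random initialisation, a simplification that does not reproduce the paper's experiments. It reuses the hypothetical model f(x) = Σ_j sqrt(λ_j) v_j tanh(w_j · x) and the hypothetical choice λ_j ∝ j^(-2).

```python
import math
import random


def make_scalings(m, kind):
    # Hypothetical scalings summing to one: uniform (NTK-style) or proportional to j**-2.
    raw = [1.0] * m if kind == "ntk" else [1.0 / (j + 1) ** 2 for j in range(m)]
    total = sum(raw)
    return [r / total for r in raw]


def net(x, ws, vs, lams, nodes):
    # Sum over the given hidden nodes of sqrt(lam_j) * v_j * tanh(w_j . x).
    return sum(math.sqrt(lams[j]) * vs[j] * math.tanh(sum(a * b for a, b in zip(ws[j], x)))
               for j in nodes)


def prune_and_compare(kind, m=64, frac=0.5, n_inputs=200, seed=0):
    """Remove the `frac` fraction of hidden nodes with the smallest scalings and
    return (scaling mass removed, mean squared change in the output)."""
    rng = random.Random(seed)
    lams = make_scalings(m, kind)
    ws = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(m)]
    vs = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # Stable sort by decreasing scaling: under NTK scaling all lam_j tie, so index order decides.
    order = sorted(range(m), key=lambda j: -lams[j])
    keep = order[: int(m * (1 - frac))]
    dropped = order[len(keep):]
    removed_mass = sum(lams[j] for j in dropped)
    xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_inputs)]
    # f_full(x) - f_pruned(x) is exactly the contribution of the dropped nodes.
    mse = sum(net(x, ws, vs, lams, dropped) ** 2 for x in xs) / n_inputs
    return removed_mass, mse


for kind in ("ntk", "asym"):
    mass, mse = prune_and_compare(kind)
    print(f"{kind}: removed scaling mass {mass:.4f}, mean squared output change {mse:.4f}")
```

With m = 64 and half the nodes pruned, the NTK network loses exactly half of its scaling mass, while the asymmetric one loses under one percent, so a much smaller output change is expected in the second case. This says nothing about accuracy after training; it only illustrates why uneven scalings can make pruning cheap.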
