LG | AI | Oct 19, 2023

Provable Guarantees for Neural Networks via Gradient Feature Learning

arXiv:2310.12408v1 | 14 citations | h-index: 21
Originality: Highly original
AI Analysis

This work addresses a foundational gap in machine learning theory by providing a unified framework that explains feature learning in neural networks. The contribution is incremental but broadens theoretical insight.

The authors address the limited theoretical understanding of why neural networks succeed. They propose a unified analysis framework for two-layer networks trained by gradient descent and demonstrate it on prototypical problems such as mixtures of Gaussians and parity functions.

Neural networks have achieved remarkable empirical performance, while the current theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent Kernel approach fails to capture their key feature learning ability, while recent analyses on feature learning are typically problem-specific. This work proposes a unified analysis framework for two-layer networks trained by gradient descent. The framework is centered around the principle of feature learning from gradients, and its effectiveness is demonstrated by applications in several prototypical problems, such as mixtures of Gaussians and parity functions. The framework also sheds light on interesting network learning phenomena such as feature learning beyond kernels and the lottery ticket hypothesis.
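The idea of "feature learning from gradients" can be illustrated with a small numerical sketch. It is not the paper's algorithm or its exact definitions. It is a toy setup under stated assumptions: a two-component Gaussian mixture x = y*mu + noise with labels y in {-1, +1}, a single ReLU neuron with random first-layer weight w and bias b, a logistic loss, and a network output of 0 at initialization. Under those assumptions the negative gradient for the neuron's first-layer weight is proportional to the average of y*x over the samples on which the neuron is active. The sketch checks that this vector aligns with the class-mean direction mu. All names (`sample_mixture`, `neuron_gradient_direction`, `cosine`) and all constants are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_mixture(n, mu, sigma, rng):
    """Draw n points x = y*mu + sigma*noise with labels y in {-1, +1}."""
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = [y * m + sigma * rng.gauss(0.0, 1.0) for m in mu]
        data.append((x, y))
    return data


def neuron_gradient_direction(data, w, b):
    """Average of y*x over samples where the ReLU neuron (w, b) is active.

    Assumption: logistic loss and network output 0 at initialization, so the
    negative gradient for this neuron's first-layer weight equals this vector
    times a/2, where a is the neuron's second-layer weight.
    """
    d = len(w)
    g = [0.0] * d
    for x, y in data:
        pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        if pre_activation > 0:  # ReLU derivative is 1 only when active
            for k in range(d):
                g[k] += y * x[k]
    return [gk / len(data) for gk in g]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)           # fixed seed so the run is reproducible
d = 10                           # input dimension (illustrative)
mu = [1.0] + [0.0] * (d - 1)     # class-mean direction (illustrative)
data = sample_mixture(2000, mu, 0.5, rng)

w = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random initial weight
g = neuron_gradient_direction(data, w, 0.0)
alignment = cosine(g, mu)
print("cosine(gradient direction, mu) =", round(alignment, 3))
```

In this toy model the random weight w carries no information about mu, yet the gradient computed from the data points along mu. That is the sense in which a gradient step can produce a useful feature, whereas a fixed-feature (kernel) view keeps the initial w.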
