ML | LG | Jun 11, 2021

On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

arXiv:2106.06251v2 | 16 citations
Originality: Highly original
AI Analysis

This paper provides a theoretical foundation for deep learning training dynamics, a core problem for researchers in machine learning theory, though it is incremental in that it builds on existing teacher-student frameworks.

The paper tackles the theoretical understanding of training two-layer ReLU neural networks in a teacher-student regression model, showing that with specific regularization and over-parameterization, gradient descent can identify the teacher's parameters with high probability despite non-convexity.

Deep learning empirically achieves high performance in many applications, but its training dynamics are not yet fully understood theoretically. In this paper, we present a theoretical analysis of training two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network from its outputs. We show that, with a specific regularization and sufficient over-parameterization, the student network can identify the parameters of the teacher network with high probability via gradient descent with a norm-dependent step size, even though the objective function is highly non-convex. The key theoretical tools are the measure representation of neural networks and a novel application of a dual certificate argument for sparse estimation on a measure space. We analyze the global minima and the global convergence property in the measure space.
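To make the teacher-student setting concrete, the following is a minimal NumPy sketch, not the paper's algorithm. It assumes a Gaussian input distribution, a squared loss, plain weight decay as a stand-in for the paper's specific regularization, and a fixed step size rather than the norm-dependent one; none of these details are given in the abstract. The function names (`make_teacher_data`, `train_student`) and all hyperparameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def two_layer(X, W, a):
    """Two-layer ReLU network: f(x) = sum_j a_j * relu(<w_j, x>) for each row x of X."""
    return relu(X @ W.T) @ a


def make_teacher_data(n=200, d=5, teacher_width=3, seed=0):
    """Sample inputs and label them with a small, fixed teacher network (assumed setup)."""
    rng = np.random.default_rng(seed)
    W_t = rng.normal(size=(teacher_width, d))
    W_t /= np.linalg.norm(W_t, axis=1, keepdims=True)  # unit-norm teacher directions
    a_t = np.ones(teacher_width)
    X = rng.normal(size=(n, d))  # Gaussian inputs are an assumption
    y = two_layer(X, W_t, a_t)
    return X, y, W_t, a_t


def train_student(X, y, width=20, lam=1e-3, lr=0.02, steps=300, seed=0):
    """Full-batch gradient descent on an over-parameterized student.

    Objective: (1/2n) * sum_i (f(x_i) - y_i)^2 + (lam/2) * (||a||^2 + ||W||_F^2).
    Weight decay and the fixed step size `lr` are illustrative stand-ins.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(width, d)) / np.sqrt(d)
    a = rng.normal(size=width) / width
    losses = []
    for _ in range(steps):
        H = X @ W.T              # pre-activations, shape (n, width)
        act = relu(H)
        resid = act @ a - y      # residuals, shape (n,)
        loss = 0.5 * np.mean(resid ** 2) + 0.5 * lam * (np.sum(a ** 2) + np.sum(W ** 2))
        losses.append(loss)
        # Gradients of the objective with respect to outer and inner weights.
        grad_a = act.T @ resid / n + lam * a
        grad_W = ((resid[:, None] * (H > 0)) * a).T @ X / n + lam * W
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a, losses
```

A typical use is `X, y, W_t, a_t = make_teacher_data()` followed by `W, a, losses = train_student(X, y)`; comparing the student rows of `W` that carry non-negligible `|a_j|` against the teacher rows `W_t` gives an informal view of what parameter identification would mean in this setting.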

