LG | Jul 7, 2022

Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data

arXiv:2207.03615v2 | 10 citations | h-index: 99
Originality: Incremental advance
AI Analysis

This work targets machine learning theory researchers interested in neural network generalization under non-standard input distributions. The contribution is incremental: it extends existing analyses from standard Gaussian inputs to Gaussian mixture inputs.

The paper tackles the problem of training a one-hidden-layer neural network with non-Gaussian input data, specifically Gaussian mixture models, and proves linear convergence to a critical point with guaranteed generalization error, characterizing the impact of input distributions on sample complexity and learning rate.

This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow a Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated from a teacher model with an unknown ground-truth weight, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. With a finite number of training samples, referred to as the sample complexity, the iterations are proved to converge linearly to a critical point with guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.
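
The teacher-student setup described above can be illustrated with a minimal sketch. The abstract does not specify the activation, the loss, the label model, or the initialization, so everything concrete here is an assumption made for illustration: sigmoid neurons with fixed averaging output weights, a squared-error empirical risk, noiseless teacher labels, isotropic mixture components, and plain gradient descent. The function names (`sample_gmm`, `forward`, `empirical_risk`, `gradient`, `train`) are hypothetical and not taken from the paper.

```python
import numpy as np

def sample_gmm(n, means, stds, weights, rng):
    """Draw n inputs from a mixture of isotropic Gaussians (assumed form)."""
    comps = rng.choice(len(weights), size=n, p=weights)  # component index per sample
    d = means.shape[1]
    noise = rng.standard_normal((n, d))
    return means[comps] + stds[comps][:, None] * noise

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, X):
    """One-hidden-layer network: average of K sigmoid neurons (assumed architecture)."""
    return sigmoid(X @ W.T).mean(axis=1)

def empirical_risk(W, X, y):
    """Squared-error risk over the training set (assumed loss)."""
    return 0.5 * np.mean((forward(W, X) - y) ** 2)

def gradient(W, X, y):
    """Gradient of empirical_risk with respect to the hidden-layer weights W."""
    H = sigmoid(X @ W.T)                       # (n, K) neuron activations
    resid = H.mean(axis=1) - y                 # (n,) prediction error
    G = resid[:, None] * H * (1.0 - H) / W.shape[0]
    return G.T @ X / X.shape[0]                # (K, d)

def train(W0, X, y, lr=1.0, steps=100):
    """Plain gradient descent on the student; returns weights and risk history."""
    W = W0.copy()
    risks = [empirical_risk(W, X, y)]
    for _ in range(steps):
        W -= lr * gradient(W, X, y)
        risks.append(empirical_risk(W, X, y))
    return W, risks

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K, n = 5, 3, 2000
    means = np.stack([np.ones(d), -np.ones(d)])   # two mixture components
    stds = np.array([1.0, 0.5])
    weights = np.array([0.5, 0.5])
    X = sample_gmm(n, means, stds, weights, rng)
    W_star = rng.standard_normal((K, d))          # unknown teacher weights
    y = forward(W_star, X)                        # noiseless teacher labels
    W0 = W_star + 0.1 * rng.standard_normal((K, d))  # assumed init near the teacher
    W, risks = train(W0, X, y)
    print("initial risk:", risks[0], "final risk:", risks[-1])
```

Varying `means`, `stds`, and `weights` while holding `n` and `lr` fixed is one way to probe empirically how the input distribution affects training, which is the dependence the paper characterizes theoretically. The sketch makes no claim about reproducing the paper's rates.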

