Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data
This work addresses neural network generalization under non-standard data distributions, a topic of interest to researchers in machine learning theory. It is incremental in that it extends existing analyses to more complex input models.
The paper tackles the problem of training a one-hidden-layer neural network with non-Gaussian input data, specifically Gaussian mixture models, and proves linear convergence to a critical point with guaranteed generalization error, characterizing the impact of input distributions on sample complexity and learning rate.
This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow a Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated from a teacher model with unknown ground-truth weights, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. Given a finite number of training samples, referred to as the sample complexity, the iterations are proved to converge linearly to a critical point with guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.
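
The teacher-student setup described above can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm. The sigmoid activation, the averaged output layer, the squared loss, plain gradient descent, the initialization near the teacher, and every name and numeric value (`sample_gmm`, `W_star`, the mixture parameters, the step size) are assumptions made for illustration only.

```python
import numpy as np

def sample_gmm(n, weights, means, sigmas, rng):
    """Draw n inputs from a Gaussian mixture with isotropic components."""
    comps = rng.choice(len(weights), size=n, p=weights)   # component index per sample
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comps] + sigmas[comps, None] * noise

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, X):
    """One-hidden-layer network: average of K sigmoid neurons (assumed form)."""
    return sigmoid(X @ W.T).mean(axis=1)

def empirical_risk(W, X, y):
    """Squared-loss empirical risk (assumed loss), non-convex in W."""
    return 0.5 * np.mean((forward(W, X) - y) ** 2)

def risk_gradient(W, X, y):
    """Gradient of the empirical risk with respect to the (K, d) weight matrix."""
    H = sigmoid(X @ W.T)                                  # hidden activations, (n, K)
    resid = H.mean(axis=1) - y                            # prediction error, (n,)
    G = resid[:, None] * H * (1.0 - H) / W.shape[0]       # chain rule per neuron, (n, K)
    return G.T @ X / X.shape[0]                           # (K, d)

def train(W0, X, y, lr, steps):
    """Plain gradient descent on the student weights, recording the risk."""
    W = W0.copy()
    history = [empirical_risk(W, X, y)]
    for _ in range(steps):
        W -= lr * risk_gradient(W, X, y)
        history.append(empirical_risk(W, X, y))
    return W, history

rng = np.random.default_rng(0)
d, K, n = 5, 3, 2000                                      # illustrative sizes
weights = np.array([0.4, 0.6])                            # mixture weights
means = np.stack([np.full(d, 1.0), np.full(d, -1.0)])     # component means
sigmas = np.array([1.0, 0.5])                             # component std deviations

X = sample_gmm(n, weights, means, sigmas, rng)
W_star = rng.standard_normal((K, d))                      # hypothetical teacher weights
y = forward(W_star, X)                                    # noiseless teacher labels
W0 = W_star + 0.1 * rng.standard_normal((K, d))           # student starts near the teacher
W_hat, history = train(W0, X, y, lr=0.5, steps=200)
print(history[0], history[-1])
```

Changing `weights`, `means`, or `sigmas` and rerunning gives an informal feel for how the input distribution affects the speed at which the risk decreases, which is the dependence the paper characterizes formally.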