ML LG · Feb 2

Learning Beyond the Gaussian Data: Learning Dynamics of Neural Networks on an Expressive and Cumulant-Controllable Data Model

arXiv:2602.02153v1 · 1.7 · h-index: 3

Originality: Incremental advance

AI Analysis

For machine learning researchers, this work addresses the gap between simplified data assumptions and real-world data complexity by offering a framework to study distributional effects, though it is an incremental extension of existing data-modeling approaches.

The authors study how high-order data statistics affect neural network learning using a cumulant-controllable non-Gaussian data model. They find that networks learn low-order statistics first and progressively capture high-order cumulants, and experiments using samples from a generator pretrained on Fashion-MNIST confirm these findings.
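The "low-order first, high-order later" finding is phrased in terms of cumulants: the mean and variance are the first two, and the third and fourth, once standardized, give skewness and excess kurtosis. As a reminder of what those quantities are, here is a minimal stdlib-only sketch (an illustration, not the authors' code; the function names are hypothetical) that estimates the first four cumulants of a 1-D sample with plug-in moment estimators.

```python
import random
from statistics import fmean


def empirical_cumulants(xs):
    """Plug-in (biased) estimates of the first four cumulants of a 1-D sample."""
    mu = fmean(xs)
    m2 = fmean((x - mu) ** 2 for x in xs)  # central moments
    m3 = fmean((x - mu) ** 3 for x in xs)
    m4 = fmean((x - mu) ** 4 for x in xs)
    # kappa_1 = mean, kappa_2 = variance, kappa_3 = m3, kappa_4 = m4 - 3 * m2^2
    return mu, m2, m3, m4 - 3.0 * m2 ** 2


def skewness_kurtosis(xs):
    """Standardized 3rd and 4th cumulants (excess kurtosis); both are 0 for a Gaussian."""
    _, k2, k3, k4 = empirical_cumulants(xs)
    return k3 / k2 ** 1.5, k4 / k2 ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    gauss = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    expo = [rng.expovariate(1.0) for _ in range(100_000)]
    print("Gaussian:", skewness_kurtosis(gauss))     # population values: (0, 0)
    print("Exponential:", skewness_kurtosis(expo))   # population values: (2, 6)
```

A Gaussian has every cumulant beyond the second equal to zero, which is why skewness and excess kurtosis are natural probes of how far a data distribution departs from the Gaussian assumption.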

We study the effect of high-order statistics of data on the learning dynamics of neural networks (NNs) by using a moment-controllable non-Gaussian data model. Considering the expressivity of two-layer neural networks, we first construct the data model as a generative two-layer NN where the activation function is expanded by using Hermite polynomials. This allows us to achieve interpretable control over high-order cumulants such as skewness and kurtosis through the Hermite coefficients while keeping the data model realistic. Using samples generated from the data model, we perform controlled online learning experiments with a two-layer NN. Our results reveal a moment-wise progression in training: networks first capture low-order statistics such as mean and covariance, and progressively learn high-order cumulants. Finally, we pretrain the generative model on the Fashion-MNIST dataset and leverage the generated samples for further experiments. The results of these additional experiments confirm our conclusions and show the utility of the data model in a real-world scenario. Overall, our proposed approach bridges simplified data assumptions and practical data complexity, which offers a principled framework for investigating distributional effects in machine learning and signal processing.
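The abstract describes the data model only at a high level: a generative two-layer network whose activation is expanded in Hermite polynomials, with the Hermite coefficients controlling cumulants such as skewness and kurtosis. The sketch below is one hypothetical reading of that idea, not the authors' implementation. It assumes a standard Gaussian latent z, a first-layer matrix W with unit-norm rows (so every pre-activation is exactly N(0, 1)), a second-layer matrix V with N(0, 1/H) entries, and an activation σ(u) = Σ_n c_n He_n(u) built from probabilists' Hermite polynomials. With only c_1 non-zero the output is a linear map of a Gaussian and is therefore Gaussian; a non-zero c_2 introduces skewness; and if only odd-order coefficients are non-zero the output stays symmetric, so a term such as c_3 acts on the kurtosis instead. The names `hermite`, `activation`, `make_generator`, and `skew_kurt`, and all scaling choices, are assumptions.

```python
import math
import random


def hermite(n, u):
    """Probabilists' Hermite polynomial He_n(u), via He_{k+1} = u He_k - k He_{k-1}."""
    h_prev, h = 1.0, u
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, u * h - k * h_prev
    return h


def activation(u, coeffs):
    """Hypothetical Hermite-expanded activation sigma(u) = sum_n coeffs[n] * He_n(u)."""
    return sum(c * hermite(n, u) for n, c in enumerate(coeffs))


def make_generator(latent_dim, hidden_dim, out_dim, coeffs, rng):
    """Return a sampler for x = V sigma(W z), z ~ N(0, I) (assumed architecture)."""
    W = []
    for _ in range(hidden_dim):
        row = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        norm = math.sqrt(sum(a * a for a in row))
        W.append([a / norm for a in row])  # unit-norm rows: each w_j . z ~ N(0, 1)
    scale = 1.0 / math.sqrt(hidden_dim)
    V = [[rng.gauss(0.0, scale) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def sample():
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        h = [activation(sum(w * zi for w, zi in zip(row, z)), coeffs) for row in W]
        return [sum(v * hj for v, hj in zip(vrow, h)) for vrow in V]

    return sample


def skew_kurt(xs):
    """Sample skewness and excess kurtosis (plug-in estimators)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    # Linear (Gaussian), skewed (c_2 != 0), and symmetric heavy-tailed (c_3 != 0) cases
    for coeffs in ([0.0, 1.0], [0.0, 1.0, 0.5], [0.0, 1.0, 0.0, 0.2]):
        rng = random.Random(0)
        gen = make_generator(latent_dim=8, hidden_dim=1, out_dim=1, coeffs=coeffs, rng=rng)
        xs = [gen()[0] for _ in range(20_000)]
        print(coeffs, skew_kurt(xs))
```

Normalizing the rows of W is what makes the coefficients interpretable in this sketch: because E[He_m(u) He_n(u)] = n! δ_mn for u ~ N(0, 1), each hidden unit has mean c_0 and variance Σ_{n≥1} n! c_n².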
