A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights

Fabiola Ricci, Claudia Merger, Sebastian Goldt

arXiv:2605.16913

AI Analysis

For researchers studying learning dynamics and generalization in neural networks, this work provides a theoretical and experimental framework linking Fourier properties of data to sample complexity and training speed.

This work studies the simplicity bias in neural networks from a Fourier perspective, showing that networks learn amplitude information (pairwise pixel correlations) before phase information (edges and higher-order correlations). The authors prove that for isotropic, high-dimensional inputs, online SGD cannot learn phase information within $n \ll N^3$ steps and needs at least $n \gg N^3 \log^2 N$ steps. Power-law spectra, however, can dramatically accelerate this learning, which offers mechanistic insight into how networks learn natural images efficiently.

Neural networks trained with gradient-based methods exhibit a strong simplicity bias: they learn simpler statistical features of their data before moving to more complex features. Previous analyses of this phenomenon have largely focused on settings with (quasi-)isotropic inputs. In this work, we study the simplicity bias from a Fourier perspective, which allows us to include two key features of natural images in the analysis: approximate translation invariance and power-law spectra. We first show experimentally that simple neural networks trained on image classification tasks first rely on amplitude information -- related to pairwise correlations between pixels -- before exploiting phase information, which encodes edges and higher-order correlations. In view of this, we introduce a synthetic data model for translation-invariant inputs that allows precise control over amplitudes and phases while remaining tractable. We rigorously establish that for isotropic and high-dimensional inputs, classification based on phase information alone is a genuinely hard task: online stochastic gradient descent (SGD) cannot distinguish the structured inputs from noise within $n \ll N^3$ steps, and instead needs at least $n \gg N^3 \log^2{N}$ steps. In contrast, we show both experimentally and theoretically that power-law spectra can dramatically accelerate the speed of learning phase information, even if the spectra do not help with classification. Simulations with two-layer networks trained on textures and with deep convolutional networks on ImageNet and CIFAR100 confirm this non-trivial interaction between amplitudes and phases, providing mechanistic insights into how deep neural networks can learn natural image distributions efficiently.
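
To make the amplitude/phase distinction concrete, the following is a minimal NumPy sketch, not the authors' data model or code. It splits a toy image into Fourier amplitude and phase, generates Gaussian noise with a power-law amplitude spectrum, and builds a hybrid that keeps the image's phases but takes its amplitudes from the noise. The function names (`split_amplitude_phase`, `combine`, `power_law_noise`), the toy image, and the exponent `alpha=1.0` are all assumptions chosen for illustration.

```python
import numpy as np

def split_amplitude_phase(x):
    """Return the Fourier amplitude and phase of a real 2-D array."""
    f = np.fft.fft2(x)
    return np.abs(f), np.angle(f)

def combine(amplitude, phase):
    """Rebuild a real image from an amplitude array and a phase array."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))

def power_law_noise(size, alpha, rng):
    """Gaussian noise whose amplitude spectrum is scaled by |k|^(-alpha)."""
    white = rng.standard_normal((size, size))
    freqs = np.fft.fftfreq(size)
    k = np.sqrt(freqs[:, None] ** 2 + freqs[None, :] ** 2)
    k[0, 0] = 1.0  # avoid dividing by zero at the zero frequency
    shaped = np.fft.fft2(white) * k ** (-alpha)
    return np.real(np.fft.ifft2(shaped))

rng = np.random.default_rng(0)

# Toy image with sharp edges (hypothetical stand-in for a natural image).
image = np.zeros((32, 32))
image[8:19, 10:21] = 1.0

# alpha = 1.0 is an arbitrary illustrative exponent.
noise = power_law_noise(32, alpha=1.0, rng=rng)

amp_img, phase_img = split_amplitude_phase(image)
amp_noise, phase_noise = split_amplitude_phase(noise)

# Recombining an image's own amplitude and phase recovers the image.
reconstructed = combine(amp_img, phase_img)

# Hybrid: pairwise statistics (amplitudes) from the noise, phases from the image.
hybrid = combine(amp_noise, phase_img)
```

In this sketch the hybrid shares its amplitude spectrum, and hence its pairwise pixel correlations, with the noise, so any remaining structure such as the edges of the square is carried by the phases alone.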
