A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights

arXiv:2605.16913
AI Analysis

For researchers studying learning dynamics and generalization in neural networks, this work provides a theoretical and experimental framework linking Fourier properties of data to sample complexity and training speed.

This work studies the simplicity bias of neural networks from a Fourier perspective, showing that networks learn amplitude information (pair-wise pixel correlations) before phase information (edges and higher-order correlations). The authors prove that for isotropic, high-dimensional inputs, online SGD cannot distinguish structured inputs from noise within n ≪ N^3 steps and needs at least n ≫ N^3 log^2 N steps when only phase information is available. Power-law spectra can dramatically accelerate this learning even though the spectra themselves do not help with classification, which gives mechanistic insight into how networks learn natural images efficiently.
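
The link between Fourier amplitudes and pair-wise correlations can be made explicit with a standard identity. The block below is a sketch under assumptions that are not taken from the paper: a one-dimensional, zero-mean input $x \in \mathbb{R}^N$ with periodic boundary conditions, an unnormalized discrete Fourier transform, and a hypothetical exponent $\alpha$ for the power law.

```latex
% Assumed setting (illustrative): zero-mean, translation-invariant x in R^N,
% so the covariance depends only on the offset between two pixels.
\[
  \mathbb{E}[x_j x_m] = c\big((j-m) \bmod N\big),
  \qquad
  \hat{x}_k = \sum_{j=0}^{N-1} x_j \, e^{-2\pi i jk/N}.
\]
% A circulant covariance is diagonalized by the DFT:
\[
  \mathbb{E}\big[\hat{x}_k \overline{\hat{x}_l}\big]
  = N \, \hat{c}_k \, \delta_{kl},
  \qquad
  \hat{c}_k = \sum_{d=0}^{N-1} c(d) \, e^{-2\pi i dk/N}.
\]
% Hence the expected squared amplitudes carry exactly the second-order
% (pair-wise) statistics; anything beyond them lives in the phases.
% A power-law spectrum, with a hypothetical exponent alpha, reads
\[
  \mathbb{E}\,|\hat{x}_k|^2 \propto |k|^{-\alpha}, \qquad k \neq 0 .
\]
```

In this setting an isotropic input corresponds to a flat spectrum, i.e. $\hat{c}_k$ constant in $k$.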

Neural networks trained with gradient-based methods exhibit a strong simplicity bias: they learn simpler statistical features of their data before moving to more complex features. Previous analyses of this phenomenon have largely focused on settings with (quasi-)isotropic inputs. In this work, we study the simplicity bias from a Fourier perspective, which allows us to include two key features of natural images in the analysis: approximate translation-invariance and power-law spectra. We first show experimentally that simple neural networks trained on image classification tasks first rely on amplitude information -- related to pair-wise correlations between pixels -- before exploiting phase information, which encodes edges and higher-order correlations. In view of this, we introduce a synthetic data model for translation-invariant inputs that allows precise control over amplitudes and phases while remaining tractable. We rigorously establish that for isotropic and high-dimensional inputs, classification based on phase information alone is a genuinely hard task: online stochastic gradient descent (SGD) cannot distinguish the structured inputs from noise within $n \ll N^3$ steps, but needs at least $n \gg N^3 \log^2{N}$ steps. In contrast, we show both experimentally and theoretically that power-law spectra can dramatically accelerate the speed of learning phase information, even if the spectra do not help with classification. Simulations with two-layer networks trained on textures and with deep convolutional networks on ImageNet and CIFAR100 confirm this non-trivial interaction between amplitudes and phases, providing mechanistic insights into how deep neural networks can learn natural image distributions efficiently.
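
To make the amplitude/phase distinction concrete, here is a minimal NumPy sketch. It is an illustration only, not the synthetic data model from the paper. The function names (`amplitude_and_phase`, `phase_randomized`, `power_law_field`) and the exponent `alpha` are hypothetical. The sketch splits a 2-D array into Fourier amplitudes and phases, builds a surrogate that keeps the amplitudes but replaces the phases with those of white noise, and samples a Gaussian field whose power spectrum decays as a power law.

```python
import numpy as np


def amplitude_and_phase(x):
    """Split a real 2-D array into its Fourier amplitudes and phases."""
    f = np.fft.fft2(x)
    return np.abs(f), np.angle(f)


def phase_randomized(x, rng):
    """Surrogate with the amplitude spectrum of x and random phases.

    The phases are taken from the FFT of real white noise, so they have
    the conjugate symmetry required for the inverse transform to be real.
    """
    amp, _ = amplitude_and_phase(x)
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(x.shape)))
    surrogate = np.fft.ifft2(amp * np.exp(1j * noise_phase))
    return surrogate.real  # imaginary part is floating-point noise


def power_law_field(n, alpha, rng):
    """Real Gaussian field on an n x n grid with power spectrum ~ |k|^(-alpha).

    `alpha` is an illustrative parameter, not a value from the paper.
    """
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k[0, 0] = np.inf  # sets the zero-frequency amplitude to 0 (zero mean)
    filt = k ** (-alpha / 2.0)  # amplitude filter; power goes as filt**2
    white = np.fft.fft2(rng.standard_normal((n, n)))
    return np.fft.ifft2(filt * white).real


rng = np.random.default_rng(0)
field = power_law_field(32, alpha=2.0, rng=rng)
surrogate = phase_randomized(field, rng)
```

The surrogate shares the pair-wise (second-order) statistics encoded in the amplitude spectrum of `field`, while its phases no longer carry any structure from the original. A classifier that tells an image from such a surrogate therefore has to use phase information.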
