ML, LG | Feb 19, 2025

The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent

Yatin Dandi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

arXiv:2502.13961v415.56 citations | h-index: 6 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a foundational problem in machine learning theory, giving researchers analytical insight into the computational advantages of depth. It is rated incremental because it builds on existing theoretical frameworks.

The paper tackles the theoretical challenge of understanding why deep neural networks outperform shallow ones by introducing a class of hierarchical target functions and analyzing learning dynamics with gradient descent. The main result shows that deep networks can learn these functions with significantly fewer samples than shallow networks, due to a feature learning process that reduces effective dimensionality.

Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single and multi-index Gaussian hierarchical targets) that incorporate a hierarchy of latent subspace dimensionalities. This framework enables us to analytically study the learning dynamics and generalization performance of deep networks compared to shallow ones in the high-dimensional limit. Specifically, our main theorem shows that feature learning with GD successively reduces the effective dimensionality, transforming a high-dimensional problem into a sequence of lower-dimensional ones. This enables learning the target function with drastically fewer samples than with shallow networks. While the results are proven in a controlled training setting, we also discuss more common training procedures and argue that they learn through the same mechanisms.
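
To make the idea of a "hierarchy of latent subspace dimensionalities" concrete, here is a minimal NumPy sketch of a toy single-index hierarchical target on Gaussian inputs. It is an illustration under stated assumptions, not the paper's exact definition: the function name `make_hierarchical_target`, the dimensions `d` and `d1`, the quadratic Hermite feature, and the outer `tanh` are all hypothetical choices made for this example. The input lives in d dimensions, the first level depends only on a d1-dimensional subspace, and the second level collapses those features to a single scalar index.

```python
import numpy as np

def make_hierarchical_target(d, d1, seed=0):
    """Toy hierarchical target (illustrative assumption, not the paper's definition)."""
    rng = np.random.default_rng(seed)
    # Orthonormal rows spanning a random d1-dimensional latent subspace of R^d.
    q, _ = np.linalg.qr(rng.standard_normal((d, d1)))
    W = q.T                                # shape (d1, d)
    a = rng.standard_normal(d1)
    a /= np.linalg.norm(a)                 # unit-norm second-level direction

    def target(X):
        Z = X @ W.T                        # level 1: project d -> d1
        H = Z**2 - 1.0                     # elementwise second Hermite polynomial
        s = H @ a                          # level 2: collapse d1 -> 1 scalar index
        return np.tanh(s)                  # outer nonlinearity (arbitrary choice)

    return target, W, a

# Draw standard Gaussian inputs and evaluate the toy target.
d, d1 = 200, 10
target, W, a = make_hierarchical_target(d, d1)
X = np.random.default_rng(1).standard_normal((5, d))
y = target(X)

# The target ignores any input component orthogonal to the latent subspace.
v = np.random.default_rng(2).standard_normal(d)
v_perp = v - W.T @ (W @ v)
y_shifted = target(X + v_perp)
```

In this sketch, a learner that first recovers the subspace spanned by `W` only has to solve a d1-dimensional problem afterwards, which is the kind of successive dimensionality reduction the abstract describes.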
