Pointwise Generalization in Deep Neural Networks

arXiv:2605.18598


AI Analysis

This work provides a new theoretical framework for understanding generalization in deep networks, offering tighter bounds and insights into feature compression and implicit bias, which is significant for the machine learning theory community.

The authors propose a pointwise generalization theory for deep neural networks that resolves barriers to characterizing the feature-learning regime. Their bounds, based on a pointwise Riemannian Dimension, are orders of magnitude tighter than existing approaches in both theory and experiments.

We address the fundamental question of why deep neural networks generalize by establishing a pointwise generalization theory for fully connected networks. This framework resolves long-standing barriers to characterizing the rich nonlinear feature-learning regime and builds a new statistical foundation for representation learning. For each trained model, we characterize the hypothesis via a pointwise Riemannian Dimension, derived from the eigenvalues of the learned feature representations across layers. This establishes a principled framework for deriving hypothesis-dependent, representation-aware generalization bounds. These bounds offer a systematic upgrade over approaches based on model size, products of norms, and infinite-width linearizations, yielding guarantees that are orders of magnitude tighter in both theory and experiment. Analytically, we identify the structural properties and mathematical principles that explain the tractability of deep networks. Empirically, the pointwise Riemannian Dimension exhibits substantial feature compression, decreases with increased over-parameterization, and captures the implicit bias of optimizers. Taken together, our results indicate that deep networks are mathematically tractable in practical regimes and that their generalization is sharply explained by pointwise, feature-spectrum-aware complexity.
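To make "feature-spectrum-aware complexity" concrete, the sketch below computes the eigenvalues of the empirical feature second-moment matrix at each layer of a small fully connected ReLU network and summarizes each spectrum with a generic effective dimension, sum_i λ_i / (λ_i + ε). This is only an illustration under stated assumptions: the abstract does not give the formula for the pointwise Riemannian Dimension, so the effective-dimension summary, the threshold `eps`, the random untrained weights, and all function names here are hypothetical stand-ins rather than the authors' definition.

```python
import numpy as np


def feature_spectrum(features):
    """Eigenvalues of the empirical second-moment matrix of an (n x d) feature array."""
    n = features.shape[0]
    second_moment = features.T @ features / n
    eigvals = np.linalg.eigvalsh(second_moment)
    # Clip tiny negative values caused by floating-point round-off.
    return np.clip(eigvals, 0.0, None)


def effective_dimension(eigvals, eps):
    """Generic spectral effective dimension (a stand-in, NOT the paper's quantity).

    Eigenvalues far above eps each contribute about 1; those far below contribute about 0.
    """
    return float(np.sum(eigvals / (eigvals + eps)))


def layer_features(x, weights):
    """Hidden features of a bias-free fully connected ReLU network, one array per layer."""
    feats = []
    h = x
    for w in weights:
        h = np.maximum(h @ w, 0.0)
        feats.append(h)
    return feats


# Hypothetical toy setup: random inputs and random (untrained) weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 10))
weights = [
    rng.normal(size=(10, 32)) / np.sqrt(10),
    rng.normal(size=(32, 32)) / np.sqrt(32),
]

dims = [
    effective_dimension(feature_spectrum(f), eps=1e-2)
    for f in layer_features(x, weights)
]
for layer, d in enumerate(dims, start=1):
    print(f"layer {layer}: effective dimension {d:.2f} of width 32")
```

Under this stand-in, a layer whose features concentrate on a few directions receives an effective dimension far below its width, which is the kind of compression the abstract reports for trained networks.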

