Understanding Generalization from Embedding Dimension and Distributional Convergence
This paper offers a novel representation-centric perspective on generalization, a fundamental challenge in machine learning theory that matters to both researchers and practitioners.
The paper studies generalization in deep neural networks through the geometry of learned embeddings. It shows that population risk can be bounded in terms of the intrinsic dimension of the embeddings and the sensitivity of the downstream mapping, and it validates the theory experimentally.
Deep neural networks often generalize well despite heavy over-parameterization, challenging classical parameter-based analyses. We study generalization from a representation-centric perspective and analyze how the geometry of learned embeddings controls predictive performance for a fixed trained model. We show that population risk can be bounded by two factors: (i) the intrinsic dimension of the embedding distribution, which determines the rate at which the empirical embedding distribution converges to the population distribution in Wasserstein distance, and (ii) the sensitivity of the downstream mapping from embeddings to predictions, characterized by Lipschitz constants. Together, these yield an embedding-dependent error bound that does not rely on parameter counts or hypothesis class complexity. At the final embedding layer, architectural sensitivity vanishes and the bound is dominated by embedding dimension, explaining its strong empirical correlation with generalization performance. Experiments across architectures and datasets validate the theory and demonstrate the utility of embedding-based diagnostics.
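To make the role of intrinsic dimension concrete, the following is a minimal sketch of an embedding-based diagnostic. It is not the paper's procedure. It assumes the TwoNN estimator (the ratio of second to first nearest-neighbor distances) as one possible way to estimate intrinsic dimension, and it assumes the classical order n^(-1/max(d, 2)) for Wasserstein-1 convergence of an empirical distribution, ignoring constants and logarithmic factors. The names `twonn_intrinsic_dimension`, `wasserstein_rate_proxy`, and `sample_flat_torus` are hypothetical, and the toy data stands in for embeddings extracted from a trained model.

```python
import math
import random


def twonn_intrinsic_dimension(points):
    """TwoNN maximum-likelihood estimate of intrinsic dimension.

    Assumption: this estimator is chosen for illustration only; the
    source does not say which intrinsic-dimension estimator it uses.
    Brute-force O(n^2) neighbor search, suitable for small samples.
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # nearest and second-nearest distances
        for j, q in enumerate(points):
            if i == j:
                continue
            dist = math.dist(p, q)
            if dist < r1:
                r1, r2 = dist, r1
            elif dist < r2:
                r2 = dist
        # Skip exact duplicates, where the ratio r2 / r1 is undefined.
        if r1 > 0 and math.isfinite(r2):
            log_ratios.append(math.log(r2 / r1))
    total = sum(log_ratios)
    if total <= 0:
        raise ValueError("not enough distinct points to estimate dimension")
    return len(log_ratios) / total


def wasserstein_rate_proxy(n, d):
    """Order of the empirical-to-population Wasserstein-1 gap.

    Assumption: classical order n^(-1/max(d, 2)), with constants and
    log factors dropped. This is a proxy, not the paper's exact bound.
    """
    return n ** (-1.0 / max(d, 2.0))


def sample_flat_torus(n, seed=0):
    """Toy embeddings: a 2-dimensional surface sitting in 4 ambient dimensions."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        u = rng.uniform(0.0, 2.0 * math.pi)
        v = rng.uniform(0.0, 2.0 * math.pi)
        points.append((math.cos(u), math.sin(u), math.cos(v), math.sin(v)))
    return points


if __name__ == "__main__":
    embeddings = sample_flat_torus(400, seed=0)
    d_hat = twonn_intrinsic_dimension(embeddings)
    print(f"ambient dimension: 4, estimated intrinsic dimension: {d_hat:.2f}")
    for n in (10**3, 10**4, 10**5):
        print(f"n={n}: rate proxy {wasserstein_rate_proxy(n, d_hat):.4f}")
```

Under these assumptions, a lower estimated dimension gives a faster-shrinking proxy as the sample size grows, which is the qualitative behavior the bound attributes to the embedding distribution. Multiplying the proxy by a Lipschitz constant of the downstream mapping would give a rough analogue of the two-factor bound, but the exact form and constants are those stated in the paper, not this sketch.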