LG · IT · Jun 28, 2023

On information captured by neural networks: connections with memorization and generalization

arXiv:2306.15918v1 · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the fundamental question of neural network generalization, which matters to researchers and practitioners alike. It appears incremental: it builds on existing information-theoretic perspectives without introducing a major new paradigm.

The paper studies when, how, and why neural networks generalize by analyzing the information captured during training. It derives a learning algorithm that limits label-noise information in the weights, defines the unique information that an individual sample contributes to training, and relates this informativeness to generalization through nonvacuous generalization gap bounds.

Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study information captured by neural networks during training. Specifically, we start with viewing learning in the presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits label noise information in weights. We then define a notion of unique information that an individual sample provides to the training of a deep network, shedding some light on the behavior of neural networks on examples that are atypical, ambiguous, or belong to underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous generalization gap bounds. Finally, by studying knowledge distillation, we highlight the important role of data and label complexity in generalization. Overall, our findings contribute to a deeper understanding of the mechanisms underlying neural network generalization.
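
For orientation, the general shape of an information-theoretic generalization bound can be sketched as follows. This is the standard mutual-information bound of Xu and Raginsky (2017), shown only as an illustration of the genre; it is not claimed to be the bound derived in this paper, and the symbols S (training set of n i.i.d. samples), W (learned weights), and sigma (sub-Gaussian parameter of the loss) are assumptions introduced here.

```latex
% Assumption: the loss \ell(w, Z) is \sigma-sub-Gaussian under the data
% distribution for every fixed w, and S = (Z_1, \dots, Z_n) is drawn i.i.d.
% L_\mu(W) is the population risk, L_S(W) the empirical risk on S.
\[
  \Bigl| \mathbb{E}\bigl[ L_\mu(W) - L_S(W) \bigr] \Bigr|
  \;\le\;
  \sqrt{\frac{2\sigma^2}{n}\, I(W; S)}
\]
% The less information the weights W retain about the training set S
% (smaller mutual information I(W; S)), the smaller the expected
% generalization gap.
```

Read this way, limiting the information that weights carry about noisy labels, as the abstract describes, is one route to tightening a bound of this kind.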

Code Implementations: 1 repo
