LG · IT · Sep 30, 2025

A Generalized Information Bottleneck Theory of Deep Learning

arXiv:2509.26327v2 · 5 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses theoretical ambiguities and estimation challenges in the Information Bottleneck framework for neural networks and offers a more practical and interpretable approach, though it appears to be an incremental advance because it builds on existing IB theory.

The authors address the limitations of the Information Bottleneck (IB) principle in deep learning by introducing a Generalized Information Bottleneck (GIB) framework based on synergy. They show that synergistic functions generalize better than non-synergistic ones, and that GIB consistently exhibits compression phases across various architectures where the standard IB fails.

The Information Bottleneck (IB) principle offers a compelling theoretical framework for understanding how neural networks (NNs) learn. However, its practical utility has been constrained by unresolved theoretical ambiguities and significant challenges in accurate estimation. In this paper, we present a Generalized Information Bottleneck (GIB) framework that reformulates the original IB principle through the lens of synergy, i.e., the information obtainable only through joint processing of features. We provide theoretical and empirical evidence demonstrating that synergistic functions achieve superior generalization compared to their non-synergistic counterparts. Building on these foundations, we reformulate the IB using a computable definition of synergy based on the average interaction information (II) of each feature with the remaining features. We demonstrate that the original IB objective is upper bounded by our GIB in the case of perfect estimation, ensuring compatibility with existing IB theory while addressing its limitations. Our experimental results demonstrate that GIB consistently exhibits compression phases across a wide range of architectures (including those with ReLU activations, where the standard IB fails), while yielding interpretable dynamics in both CNNs and Transformers and aligning more closely with our understanding of adversarial robustness.
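The abstract defines synergy through the average interaction information (II) between each feature and the remaining ones. The sketch below illustrates that idea on small discrete datasets using plug-in entropy estimates. It assumes one common sign convention, II_i = I(X;Y) - I(X_i;Y) - I(X_-i;Y), where positive values indicate synergy; the helper names are hypothetical, and the paper's exact definition, normalization, and estimator for continuous network activations may differ.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(a, b):
    """Plug-in estimate of I(A;B) = H(A) + H(B) - H(A,B), in bits."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def interaction_information(xs, y, i):
    """Hypothetical helper, assumed convention:
    II_i = I(X;Y) - I(X_i;Y) - I(X_-i;Y); positive means synergy
    between feature i and the remaining features about target y."""
    x_i = [row[i] for row in xs]
    x_rest = [tuple(v for j, v in enumerate(row) if j != i) for row in xs]
    x_all = [tuple(row) for row in xs]
    return (mutual_information(x_all, y)
            - mutual_information(x_i, y)
            - mutual_information(x_rest, y))


def average_synergy(xs, y):
    """Hypothetical helper: mean II over all features."""
    d = len(xs[0])
    return sum(interaction_information(xs, y, i) for i in range(d)) / d


if __name__ == "__main__":
    # All four binary input pairs, each observed once (uniform distribution).
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    xor_target = [a ^ b for a, b in inputs]   # needs both features jointly
    copy_target = [a for a, _ in inputs]      # readable from one feature alone
    print(average_synergy(inputs, xor_target))   # 1.0
    print(average_synergy(inputs, copy_target))  # 0.0
```

Under these assumptions, an XOR target yields 1 bit of average synergy because neither feature alone carries any information about it, while a target that simply copies one feature yields 0 bits.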
