LG IT ML · Apr 18, 2018

Understanding Convolutional Neural Networks with Information Theory: An Initial Exploration

Shujian Yu, Kristoffer Wickstrøm, Robert Jenssen, Jose C. Principe

arXiv:1804.06537v5 17.886 citations

Originality: Incremental advance

AI Analysis

This work offers a theoretical framework for understanding CNN representations that could benefit researchers in deep learning and information theory, though it appears incremental because it builds on existing entropy estimators.

The authors measure information flow in convolutional neural networks (CNNs) without approximation, using matrix-based Rényi's α-entropy estimators. They also introduce partial information decomposition to analyze synergy and redundancy in layer representations. Their results validate data processing inequalities and reveal fundamental properties of CNN training.

The matrix-based Rényi's α-entropy functional and its multivariate extension were recently developed in terms of the normalized eigenspectrum of a Hermitian matrix of the projected data in a reproducing kernel Hilbert space (RKHS). However, the utility and possible applications of these estimators remain largely unexplored and mostly unknown to practitioners. In this paper, we first show that our estimators enable straightforward measurement of information flow in realistic convolutional neural networks (CNNs) without any approximation. Then, we introduce the partial information decomposition (PID) framework and develop three quantities to analyze the synergy and redundancy in convolutional layer representations. Our results validate two fundamental data processing inequalities and reveal some fundamental properties concerning the training of CNNs.
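As a rough illustration of the estimator the abstract refers to, the sketch below computes a matrix-based Rényi's α-entropy from the eigenvalues of a trace-normalized Gram matrix, and a mutual information from the Hadamard product of two such matrices. It follows the commonly cited definition of the functional, not the authors' code. The function names, the Gaussian kernel, the width `sigma`, and the default `alpha=1.01` are all assumptions made for this example.

```python
import numpy as np


def gram_matrix(x, sigma=1.0):
    """Gaussian-kernel Gram matrix, normalized to unit trace (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)  # one row per sample
    sq = np.sum(x ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped against rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    return k / np.trace(k)


def renyi_entropy(a, alpha=1.01):
    """S_alpha(A) = log2(sum_i lambda_i(A)^alpha) / (1 - alpha), in bits."""
    eig = np.linalg.eigvalsh(a)      # A is symmetric, so eigvalsh applies
    eig = np.clip(eig, 0.0, None)    # drop tiny negative eigenvalues
    return np.log2(np.sum(eig ** alpha)) / (1.0 - alpha)


def joint_entropy(a, b, alpha=1.01):
    """Joint entropy from the trace-normalized Hadamard product of A and B."""
    h = a * b
    return renyi_entropy(h / np.trace(h), alpha)


def mutual_information(a, b, alpha=1.01):
    """I(A; B) = S(A) + S(B) - S(A, B)."""
    return (renyi_entropy(a, alpha) + renyi_entropy(b, alpha)
            - joint_entropy(a, b, alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(64, 10))    # stand-in for a mini-batch of inputs
    t = np.tanh(x[:, :4])            # stand-in for a layer representation
    a, b = gram_matrix(x, sigma=3.0), gram_matrix(t, sigma=1.0)
    print("H(X) =", renyi_entropy(a))
    print("H(T) =", renyi_entropy(b))
    print("I(X;T) =", mutual_information(a, b))
```

In this sketch each Gram matrix is built over the samples of one mini-batch, so the entropy is bounded by log2 of the batch size. The kernel width strongly affects the estimate and would need to be chosen per layer in any real analysis.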
