LG, May 27, 2021

Learning Structures for Deep Neural Networks

Jinhui Yuan, Fei Pan, Chunting Zhou, Tao Qin, Tie-Yan Liu

arXiv:2105.13905v1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of automatically designing neural network architectures without labeled data, which could reduce manual effort in model development, though it is incremental as it builds on existing principles like sparse coding.

The paper tackles the problem of unsupervised structure learning for deep neural networks by applying the efficient coding principle to maximize output entropy, which is linked to better classification accuracy. Experiments on an image classification dataset show that the algorithm learns structures achieving accuracy comparable to expert-designed convolutional neural networks.

In this paper, we focus on the unsupervised setting for structure learning of deep neural networks and propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience, to guide the procedure of structure learning without label information. This principle suggests that a good network structure should maximize the mutual information between inputs and outputs, or equivalently maximize the entropy of outputs under mild assumptions. We further establish connections between this principle and the theory of Bayesian optimal classification, and empirically verify that larger entropy of the outputs of a deep neural network indeed corresponds to a better classification accuracy. Then, as an implementation of the principle, we show that sparse coding can effectively maximize the entropy of the output signals, and accordingly design an algorithm based on global group sparse coding to automatically learn the inter-layer connection and determine the depth of a neural network. Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure (i.e., convolutional neural networks (CNNs)). In addition, our proposed algorithm successfully discovers the local connectivity (corresponding to local receptive fields in CNNs) and invariance structure (corresponding to pooling in CNNs), and achieves a good tradeoff between marginal performance gain and network depth.
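The equivalence mentioned in the abstract follows from the identity I(X;Y) = H(Y) - H(Y|X): when H(Y|X) does not depend on the network structure, maximizing mutual information reduces to maximizing the output entropy H(Y). The toy sketch below illustrates why reducing redundancy between outputs raises entropy. It assumes two jointly Gaussian output units, which is an assumption made here for illustration and not the paper's model; the function name `gaussian_entropy_2d` is hypothetical.

```python
import math


def gaussian_entropy_2d(var_a, var_b, rho):
    """Differential entropy (in nats) of a 2-D Gaussian.

    var_a, var_b: variances of the two output units (illustrative inputs).
    rho: correlation coefficient between the two units.
    Uses H = 0.5 * ln((2*pi*e)^2 * det(Sigma)),
    where det(Sigma) = var_a * var_b * (1 - rho^2).
    """
    det = var_a * var_b * (1.0 - rho ** 2)
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det)


# Same per-unit variance, different amounts of redundancy between units.
h_decorrelated = gaussian_entropy_2d(1.0, 1.0, 0.0)
h_redundant = gaussian_entropy_2d(1.0, 1.0, 0.9)

print(f"decorrelated outputs: {h_decorrelated:.4f} nats")
print(f"correlated outputs:   {h_redundant:.4f} nats")
```

With the per-unit variances held fixed, the decorrelated pair has strictly higher entropy than the correlated pair, which is the intuition behind preferring codes with less redundant outputs. How the paper actually estimates or maximizes entropy is not specified in this summary.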
