LGDIS-NNHEP-THMLJun 13, 2024

Opening the Black Box: predicting the trainability of deep neural networks with reconstruction entropy

arXiv:2406.12916v3
Originality Incremental advance
AI Analysis

This addresses the challenge of efficiently identifying trainable initial conditions for deep neural networks, which is incremental as it builds on existing methods for analyzing network behavior.

The paper tackles the problem of predicting trainability of deep neural networks by using reconstruction entropy from shallow auxiliary networks, achieving a significant reduction in training time with a single epoch of training across multiple datasets.

An important challenge in machine learning is to predict the initial conditions under which a given neural network will be trainable. We present a method for predicting the trainable regime in parameter space for deep feedforward neural networks (DNNs) based on reconstructing the input from subsequent activation layers via a cascade of single-layer auxiliary networks. We show that a single epoch of training of the shallow cascade networks is sufficient to predict the trainability of the deep feedforward network on a range of datasets (MNIST, CIFAR10, FashionMNIST, and white noise), thereby providing a significant reduction in overall training time. We achieve this by computing the relative entropy between reconstructed images and the original inputs, and show that this probe of information loss is sensitive to the phase behaviour of the network. We further demonstrate that this method generalizes to residual neural networks (ResNets) and convolutional neural networks (CNNs). Moreover, our method illustrates the network's decision making process by displaying the changes performed on the input data at each layer, which we demonstrate for both a DNN trained on MNIST and the vgg16 CNN trained on the ImageNet dataset. Our results provide a technique for significantly accelerating the training of large neural networks.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes