Mario Krell

2papers

2 Papers

NEOct 4, 2018
A Practical Approach to Sizing Neural Networks

Gerald Friedland, Alfredo Metere, Mario Krell

Memorization is worst-case generalization. Based on MacKay's information theoretic model of supervised machine learning, this article discusses how to practically estimate the maximum size of a neural network given a training data set. First, we present four easily applicable rules to analytically determine the capacity of neural network architectures. This allows the comparison of the efficiency of different network architectures independently of a task. Second, we introduce and experimentally validate a heuristic method to estimate the neural network capacity requirement for a given dataset and labeling. This allows an estimate of the required size of a neural network for a given problem. We conclude the article with a discussion on the consequences of sizing the network wrongly, which includes both increased computation effort for training as well as reduced generalization capability.

NEAug 20, 2017
A Capacity Scaling Law for Artificial Neural Networks

Gerald Friedland, Mario Krell

We derive the calculation of two critical numbers predicting the behavior of perceptron networks. First, we derive the calculation of what we call the lossless memory (LM) dimension. The LM dimension is a generalization of the Vapnik--Chervonenkis (VC) dimension that avoids structured data and therefore provides an upper bound for perfectly fitting almost any training data. Second, we derive what we call the MacKay (MK) dimension. This limit indicates a 50% chance of not being able to train a given function. Our derivations are performed by embedding a neural network into Shannon's communication model which allows to interpret the two points as capacities measured in bits. We present a proof and practical experiments that validate our upper bounds with repeatable experiments using different network configurations, diverse implementations, varying activation functions, and several learning algorithms. The bottom line is that the two capacity points scale strictly linear with the number of weights. Among other practical applications, our result allows to compare and benchmark different neural network implementations independent of a concrete learning task. Our results provide insight into the capabilities and limits of neural networks and generate valuable know how for experimental design decisions.