LG NE OC | Mar 7, 2019

Limiting Network Size within Finite Bounds for Optimization

arXiv:1903.02809v1 | 1.02 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient network sizing in shallow neural networks to prevent overfitting and minimize training effort, though it is incremental as it builds on existing VC Dimension theory.

The paper tackles the problem of determining the optimal network size for binary classification tasks to avoid overfitting and reduce computational complexity, by providing theoretical bounds on the hidden layer width in single-layered feed-forward networks and validating these findings experimentally on three datasets.

The largest theoretical contribution to neural networks comes from the VC dimension. It characterizes the sample complexity of a classification model from a probabilistic viewpoint and is widely used to study generalization error. So far, the literature has used the VC dimension only to approximate generalization error bounds for different neural network architectures. It has not yet been used, implicitly or explicitly, to fix the network size. This matters because a wrong configuration can lead to high computational effort in training and to overfitting. There is therefore a need to bound the number of units so that a task can be computed with only a sufficient number of parameters. Shallow networks are used for binary classification tasks because they have the universal approximation property, and for such networks it is enough to size the hidden layer width. The paper gives a theoretical justification for the required attribute size and the corresponding hidden layer dimension for a given sample set. This configuration yields optimal binary classification results with minimum training complexity in a single-layered feed-forward network framework. The paper also proves that bounds on the width of the hidden layer, and a range for it, exist under certain conditions. The findings are analyzed experimentally on three different datasets using MATLAB R2018b.
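To make the idea of "bounding the hidden layer width with the VC dimension" concrete, here is a minimal Python sketch. It does not reproduce the paper's own bounds, which are not stated on this page. Instead it swaps in two classical, generic results. The first is an upper bound on the VC dimension of a network of linear threshold units of the form 2·W·log2(e·N) (Baum and Haussler), where W counts weights and biases and N counts computation units. The second is Vapnik's confidence term for the generalization gap. The names `d` (number of attributes), `h` (hidden width), `n` (sample size), `max_gap`, `delta`, and every function below are hypothetical, chosen for illustration only.

```python
import math


def num_weights(d: int, h: int) -> int:
    """Weights plus biases of a d-h-1 single-hidden-layer network (hypothetical helper)."""
    return h * (d + 1) + (h + 1)  # hidden layer, then the single output unit


def vc_upper_bound(d: int, h: int) -> float:
    """Classical threshold-network bound 2*W*log2(e*N), with N = h + 1 computation units."""
    w = num_weights(d, h)
    return 2 * w * math.log2(math.e * (h + 1))


def vapnik_confidence(vc: float, n: int, delta: float = 0.05) -> float:
    """Vapnik's term sqrt((vc*(ln(2n/vc) + 1) + ln(4/delta)) / n), valid with prob. >= 1 - delta."""
    return math.sqrt((vc * (math.log(2 * n / vc) + 1) + math.log(4 / delta)) / n)


def max_hidden_width(d: int, n: int, max_gap: float, delta: float = 0.05) -> int:
    """Largest h whose confidence term stays <= max_gap; 0 if even h = 1 fails.

    The confidence term grows with vc while vc < n, and vc grows with h,
    so the search can stop at the first width that violates either limit.
    """
    h = 0
    while True:
        vc = vc_upper_bound(d, h + 1)
        if vc >= n or vapnik_confidence(vc, n, delta) > max_gap:
            return h
        h += 1


if __name__ == "__main__":
    for n in (10**4, 10**5, 10**6):
        print(n, max_hidden_width(d=10, n=n, max_gap=0.2))
```

With these assumptions, the admissible width grows with the sample size and shrinks as the number of attributes grows. This matches the qualitative link between attribute size, sample set, and hidden layer dimension that the abstract describes. The worst-case constants in such bounds are loose, so the numbers are only an illustration of the trade-off and not a sizing rule.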
