LG NE ML · Sep 17, 2018

Self Configuration in Machine Learning

arXiv:1809.06463v1 · 1 citation

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient and automated deep learning model training, offering a novel approach that could reduce computational costs and manual tuning, though it appears incremental in the context of existing layer-wise methods.

The paper tackles the problem of training deep neural networks by introducing a layer-wise training algorithm that isolates each layer's optimization, resulting in a very fast training process. It also enables automatic network configuration, including determining the number of outputs per layer, allowing for the construction of fully trained networks from data alone.

In this paper we first present a class of algorithms for training multi-level neural networks with a quadratic cost function one layer at a time, starting from the input layer. The algorithm is based on the fact that, for any layer to be trained, the effect of a direct connection to an optimized linear output layer can be computed without the connection being made. Thus, starting from the input layer, we can train each layer in succession in isolation from the other layers. Once a layer is trained, its weights are kept fixed and its outputs serve as the inputs to the next layer to be trained. The result is a very fast algorithm. The simplicity of this training arrangement allows the activation function and the step size in weight adjustment to be adaptive and self-adjusting. Furthermore, the stability of the training process allows relatively large steps to be taken, thereby achieving even greater speed. Finally, in our context, configuring the network means determining the number of outputs for each layer. By decomposing the overall cost function into separate components related to approximation and estimation, we obtain an optimization formula for determining the number of outputs for each layer. With the ability to self-configure and set parameters, we now have not just a fast training algorithm but also the ability to automatically build a fully trained deep neural network starting with nothing more than data.
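The layer-at-a-time idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: it assumes a tanh activation, scores each layer by the quadratic cost of a least-squares linear readout fitted on that layer's outputs, and uses a simple accept/reject rule to adapt the step size. The function names (`readout_loss`, `train_layer`, `train_network`) and all hyperparameters are hypothetical, and the layer widths are supplied by hand rather than derived from the paper's approximation/estimation formula, which the abstract does not state.

```python
import numpy as np

def readout_loss(H, Y):
    """Quadratic cost of the best linear output layer fitted on features H."""
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])   # append a bias column
    V, *_ = np.linalg.lstsq(Hb, Y, rcond=None)      # optimal linear readout
    R = Hb @ V - Y                                  # residuals
    return 0.5 * np.mean(np.sum(R ** 2, axis=1)), V, R

def train_layer(X, Y, n_out, steps=100, lr=0.5, seed=0):
    """Train one tanh layer in isolation against an optimized linear readout."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / np.sqrt(X.shape[1]), size=(X.shape[1], n_out))
    H = np.tanh(X @ W)
    loss, V, R = readout_loss(H, Y)
    history = [loss]
    for _ in range(steps):
        # Gradient w.r.t. W with the readout held at its optimum
        # (the readout's own gradient is zero there).
        G_H = (R @ V[:-1].T) / X.shape[0]
        G_W = X.T @ (G_H * (1.0 - H ** 2))
        W_new = W - lr * G_W
        H_new = np.tanh(X @ W_new)
        loss_new, V_new, R_new = readout_loss(H_new, Y)
        if loss_new < loss:
            # Accept the step and try a larger one next time.
            W, H, loss, V, R = W_new, H_new, loss_new, V_new, R_new
            lr *= 1.1
        else:
            # Reject the step and shrink the step size.
            lr *= 0.5
        history.append(loss)
    return W, H, history

def train_network(X, Y, layer_sizes, **kwargs):
    """Train layers in succession; each frozen layer feeds the next one."""
    weights, losses, inputs = [], [], X
    for n_out in layer_sizes:
        W, H, history = train_layer(inputs, Y, n_out, **kwargs)
        weights.append(W)            # weights stay fixed from here on
        losses.append(history[-1])
        inputs = H                   # outputs become the next layer's inputs
    _, V, _ = readout_loss(inputs, Y)  # final linear output layer
    return weights, V, losses

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    Y = np.sin(X[:, :1]) * X[:, 1:2]   # toy regression target (assumption)
    weights, V, losses = train_network(X, Y, [8, 8], steps=50)
    print("readout cost after each layer:", losses)
```

Because a step is only accepted when it lowers the readout cost, the per-layer cost in this sketch never increases during training; that is a property of this accept/reject rule, not a claim about the paper's method.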
