LG, May 19, 2021

Multi-layer Perceptron Trainability Explained via Variability

arXiv:2105.08911v35.55 citations

Originality: Incremental advance

AI Analysis

This work addresses a fundamental issue in deep learning, but the advance is incremental: it extends existing trainability studies with a new metric.

The study seeks to explain trainability in multi-layer perceptrons (MLPs) by introducing a new notion called variability. Variability is positively correlated with the number of activations and predicts trainability on a small stylized model problem. The authors also show that the absolute value function can offer better variability than ReLU.

Despite the tremendous successes of deep neural networks (DNNs) in various applications, many fundamental aspects of deep learning remain incompletely understood, including DNN trainability. In a trainability study, one aims to discern what makes one DNN model easier to train than another under comparable conditions. In particular, our study focuses on multi-layer perceptron (MLP) models equipped with the same number of parameters. We introduce a new notion called variability to help explain the benefits of deep learning and the difficulties in training very deep MLPs. Simply put, variability of a neural network represents the richness of landscape patterns in the data space with respect to well-scaled random weights. We empirically show that variability is positively correlated to the number of activations and negatively correlated to a phenomenon called "Collapse to Constant", which is related but not identical to the well-known vanishing gradient phenomenon. Experiments on a small stylized model problem confirm that variability can indeed accurately predict MLP trainability. In addition, we demonstrate that, as an activation function in MLP models, the absolute value function can offer better variability than the popular ReLU function can.
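To make the "Collapse to Constant" idea concrete, the following is a minimal sketch, not the paper's definition of variability. It builds a random MLP and measures how much its scalar output changes across inputs on the unit circle. The function names (`random_mlp`, `forward`, `output_spread`), the spread metric (standard deviation divided by root-mean-square), the layer width, and the weight scaling (variance `gain / fan_in`, with gain 2 for ReLU and gain 1 for the absolute value) are all assumptions chosen for illustration.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def random_mlp(depth, width, in_dim, rng, gain):
    """Hypothetical helper: `depth` hidden layers plus a scalar linear readout.

    Weights are zero-mean Gaussian with variance gain / fan_in (an assumed
    reading of "well-scaled random weights"). Biases are omitted.
    """
    layers = []
    fan_in = in_dim
    for _ in range(depth):
        std = math.sqrt(gain / fan_in)
        layers.append([[rng.gauss(0.0, std) for _ in range(fan_in)]
                       for _ in range(width)])
        fan_in = width
    std = math.sqrt(1.0 / fan_in)
    layers.append([[rng.gauss(0.0, std) for _ in range(fan_in)]])
    return layers


def forward(layers, x, act):
    """Apply every hidden layer with activation `act`, then the linear readout."""
    h = list(x)
    for weights in layers[:-1]:
        h = [act(sum(w * v for w, v in zip(row, h))) for row in weights]
    return sum(w * v for w, v in zip(layers[-1][0], h))


def output_spread(depth, act, gain, width=32, n_points=64, seed=0):
    """Std / RMS of the network output over points on the unit circle.

    The ratio lies in [0, 1]. A value near 0 means the output is almost the
    same for every input, i.e. the network has collapsed to a constant.
    """
    rng = random.Random(seed)
    layers = random_mlp(depth, width, 2, rng, gain)
    xs = [(math.cos(2 * math.pi * k / n_points),
           math.sin(2 * math.pi * k / n_points)) for k in range(n_points)]
    ys = [forward(layers, x, act) for x in xs]
    mean = sum(ys) / len(ys)
    std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
    rms = math.sqrt(sum(y * y for y in ys) / len(ys))
    return std / rms if rms > 0 else 0.0
```

One way to use this is to call `output_spread(d, relu, 2.0)` and `output_spread(d, abs, 1.0)` for increasing depths `d` and several seeds, then compare how quickly each ratio shrinks. Results from this toy proxy depend on the width, the seed, and the scaling, so they should not be read as reproducing the paper's experiments.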
