LG ML

Feb 12, 2019

Improving learnability of neural networks: adding supplementary axes to disentangle data representation

arXiv:1902.04205v1

6.68 citations

Originality: Incremental advance

AI Analysis

This paper addresses the computational cost and overfitting risks of enlarging neural networks, particularly for medical images, but it is an incremental advance because it builds on prior work on adding informative nodes.

The paper tackles the problem of improving neural network learnability without increasing node count by adding supplementary axes to disentangle data representation, showing that models with concatenation achieve more robust and accurate training results compared to those without.

Over-parameterized deep neural networks have proven to be able to learn an arbitrary dataset with 100% training accuracy. Because of the risk of overfitting and computational cost issues, we cannot afford to increase the number of network nodes if we want to achieve better training results for medical images. Previous deep learning research shows that the training ability of a neural network improves dramatically (for the same number of training epochs) when a few nodes with supplementary information are added to the network. These few informative nodes allow the network to learn features that are otherwise difficult to learn by generating a disentangled data representation. This paper analyzes how concatenation of additional information as supplementary axes affects the training of neural networks. This analysis was conducted for a simple multilayer perceptron (MLP) classification model with a rectified linear unit (ReLU) on two-dimensional training data. We compared the networks with and without concatenation of supplementary information to support our analysis. The model with concatenation showed more robust and accurate training results compared to the model without concatenation. We also confirmed that our findings are valid for deeper convolutional neural networks (CNN) using ultrasound images and for a conditional generative adversarial network (cGAN) using the MNIST data.
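
The idea of concatenating supplementary information as an extra axis can be illustrated with a small sketch. The example below is not the paper's experiment: the two-ring toy dataset, the helper names `make_rings` and `add_supplementary_axis`, and the choice of squared radius as the supplementary information are all assumptions made for illustration. It only shows how two-dimensional inputs that are entangled in the original plane can become separable along the added axis once the extra column is concatenated.

```python
import numpy as np


def make_rings(n_per_class=200, seed=0):
    """Build a toy 2D dataset (hypothetical, not from the paper).

    Class 0 fills an inner disc, class 1 fills an outer ring, so no
    straight line in the 2D plane separates the two classes.
    """
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=2 * n_per_class)
    radii = np.concatenate([
        rng.uniform(0.0, 1.0, size=n_per_class),  # class 0: radius in [0, 1)
        rng.uniform(2.0, 3.0, size=n_per_class),  # class 1: radius in [2, 3)
    ])
    x = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
    y = np.concatenate([
        np.zeros(n_per_class, dtype=int),
        np.ones(n_per_class, dtype=int),
    ])
    return x, y


def add_supplementary_axis(x, info):
    """Concatenate one column of supplementary information to the inputs."""
    return np.concatenate([x, info.reshape(-1, 1)], axis=1)


x, y = make_rings()

# Assumed supplementary information: the squared radius of each point.
info = np.sum(x ** 2, axis=1)
x_aug = add_supplementary_axis(x, info)

# Along the added third axis, a single threshold separates the classes:
# class 0 has squared radius below 1, class 1 has squared radius of at least 4.
pred = (x_aug[:, 2] > 2.25).astype(int)
accuracy = float(np.mean(pred == y))

print(x.shape, x_aug.shape, accuracy)
```

In a real model, `x_aug` rather than `x` would be fed to the MLP, which adds one input dimension without adding hidden nodes.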
