LG · ML · Mar 20, 2019

Topology-based Representative Datasets to Reduce Neural Network Training Resources

Rocio Gonzalez-Diaz, Miguel A. Gutiérrez-Naranjo, Eduardo Paluzo-Hidalgo

arXiv:1903.08519v35.49 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the resource-intensive training of neural networks, a practical burden for practitioners. The contribution is incremental because the theoretical results cover only one architecture (the perceptron) and one loss function (mean squared error).

The paper tackles long neural network training times by proposing a method to build smaller representative datasets using persistence diagrams. It proves that, for perceptrons trained with mean squared error loss, training on such a dataset yields accuracy similar to training on the original one.
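The summary mentions persistence diagrams without saying how they are computed. Here is a minimal illustration in Python. It assumes a Vietoris-Rips filtration on Euclidean distances and restricts to dimension 0 (connected components), which this listing does not specify. The function name `h0_persistence` is hypothetical. Libraries such as GUDHI or Ripser are the usual choice for higher homology dimensions.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Hypothetical helper: 0-dimensional persistence diagram of the
    Vietoris-Rips filtration of a finite point cloud (assumption: Euclidean
    edge lengths are the filtration values).

    Every point is born at 0. Edges are added in order of length (Kruskal's
    algorithm); each edge that joins two components kills one of them at
    that length. The last component never dies (death = math.inf).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, length))  # one component dies at this scale
    if n:
        diagram.append((0.0, math.inf))  # the component that never dies
    return diagram

# Three points: the two closest merge at scale 1, the third joins at scale 3.
print(h0_persistence([(0, 0), (1, 0), (0, 3)]))
# [(0.0, 1.0), (0.0, 3.0), (0.0, inf)]
```

The finite death times are the edge lengths of a Euclidean minimum spanning tree, which is why Kruskal's algorithm is enough here. Comparing two diagrams, for example with the bottleneck distance, is not shown.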

One of the main drawbacks of the practical use of neural networks is the long time required in the training process. Such a training process consists of an iterative change of parameters trying to minimize a loss function. These changes are driven by a dataset, which can be seen as a set of labelled points in an n-dimensional space. In this paper, we explore the concept of a representative dataset, which is a dataset smaller than the original one, satisfying a nearness condition independent of isometric transformations. Representativeness is measured using persistence diagrams (a computational topology tool) due to their computational efficiency. We prove that the accuracy of the learning process of a neural network on a representative dataset is "similar" to the accuracy on the original dataset when the neural network architecture is a perceptron and the loss function is the mean squared error. These theoretical results, together with experiments, open the door to reducing the size of the dataset to save time in the training process of any neural network.
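The following sketch makes the nearness condition and the perceptron setting concrete. It rests on several assumptions:

- A subset counts as ε-representative if every labelled point of the original has a same-label point of the subset within Euclidean distance ε. This is checked in fixed coordinates, so the isometry invariance mentioned above is ignored.
- The subset is built by a hypothetical greedy rule, not by the paper's persistence-based method.
- The perceptron is a single sigmoid neuron trained by gradient descent on the mean squared error.

All function names are illustrative.

```python
import math
import random

# Assumption (simplified reading of the nearness condition): `subset` is
# eps-representative of `original` if every labelled point (x, c) in
# `original` has a point (y, c) with the SAME label in `subset` and
# ||x - y|| <= eps, measured in fixed coordinates (no isometries applied).
def representativeness_eps(original, subset):
    """Smallest eps for which `subset` is eps-representative of `original`
    (math.inf if some label of `original` is absent from `subset`)."""
    eps = 0.0
    for x, c in original:
        nearest = min(
            (math.dist(x, y) for y, d in subset if d == c), default=math.inf
        )
        eps = max(eps, nearest)
    return eps

def greedy_representative(data, eps):
    """Hypothetical greedy construction (not the paper's algorithm): keep a
    point only if no already-kept point with the same label lies within eps,
    so every discarded point has a same-label neighbour within eps."""
    kept = []
    for x, c in data:
        if all(d != c or math.dist(x, y) > eps for y, d in kept):
            kept.append((x, c))
    return kept

def train_perceptron(data, epochs=500, lr=0.5):
    """Assumed perceptron: one neuron with a logistic sigmoid output, trained
    by full-batch gradient descent on the mean squared error; labels 0 or 1."""
    dim = len(data[0][0])
    w, b, n = [0.0] * dim, 0.0, len(data)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, c in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1.0 / (1.0 + math.exp(-z))
            # Chain rule: d(y - c)^2 / dz = 2 (y - c) * y (1 - y), averaged over n.
            g = 2.0 * (y - c) * y * (1.0 - y) / n
            gw = [gwi + g * xi for gwi, xi in zip(gw, x)]
            gb += g
        w = [wi - lr * gwi for wi, gwi in zip(w, gw)]
        b -= lr * gb
    return w, b

def accuracy(model, data):
    """Fraction of points whose thresholded output (y > 0.5, i.e. z > 0)
    matches the label."""
    w, b = model
    hits = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (c == 1) for x, c in data
    )
    return hits / len(data)

# Usage on synthetic data: two Gaussian blobs in the plane.
random.seed(0)
data = [((random.gauss(-1.5, 0.6), random.gauss(0.0, 0.6)), 0) for _ in range(200)]
data += [((random.gauss(1.5, 0.6), random.gauss(0.0, 0.6)), 1) for _ in range(200)]
subset = greedy_representative(data, eps=0.5)
print(len(data), len(subset), representativeness_eps(data, subset))
full_model = train_perceptron(data)
small_model = train_perceptron(subset)
# Both models are evaluated on the full dataset.
print(accuracy(full_model, data), accuracy(small_model, data))
```

The greedy rule guarantees that the subset is at most ε away from the original in this simplified sense. Any accuracy numbers it prints are illustrative only and say nothing about the paper's experiments.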

