CNN training with graph-based sample preselection: application to handwritten character recognition
This work addresses training efficiency for researchers and practitioners in computer vision, but it is incremental: it applies an existing graph method to a specific domain.
The paper uses a graph-based preselection method to reduce the size of CNN training sets, and reports no loss of accuracy on handwritten character recognition tasks.
In this paper, we present a study on sample preselection in large training data sets for CNN-based classification. To do so, we structure the input data set as a graph, namely the Relative Neighbourhood Graph, and then extract some vectors of interest from it. The proposed preselection method is evaluated in the context of handwritten character recognition, using two data sets of up to several hundred thousand images. We show that graph-based preselection can reduce the training data set without degrading the recognition accuracy of a shallow CNN model that is not pretrained.
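As an illustration of the preselection idea, the following minimal sketch builds a Relative Neighbourhood Graph by brute force and keeps only the samples that have a graph neighbour of a different class. The abstract does not define the "vectors of interest", so this boundary criterion is an assumption made for the example. The function names `relative_neighbourhood_graph` and `preselect`, the Euclidean distance, and the toy data are also hypothetical. The cubic-time construction is only practical for small sets and is not meant to reflect how the graph is built at the scale reported here.

```python
from itertools import combinations
from math import dist


def relative_neighbourhood_graph(points):
    """Return the RNG edges as a set of index pairs (i, j) with i < j.

    Brute-force O(n^3) construction: i and j are linked unless some third
    point r is strictly closer to both of them than they are to each other.
    """
    n = len(points)
    edges = set()
    for i, j in combinations(range(n), 2):
        d_ij = dist(points[i], points[j])
        blocked = any(
            max(dist(points[i], points[r]), dist(points[j], points[r])) < d_ij
            for r in range(n)
            if r != i and r != j
        )
        if not blocked:
            edges.add((i, j))
    return edges


def preselect(points, labels):
    """Return sorted indices of samples with an RNG neighbour of another class.

    Assumption: this boundary criterion stands in for the "vectors of
    interest", which the abstract does not define.
    """
    keep = set()
    for i, j in relative_neighbourhood_graph(points):
        if labels[i] != labels[j]:
            keep.update((i, j))
    return sorted(keep)


# Toy data (hypothetical): two well-separated classes on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0), (6.0, 0.0), (7.0, 0.0)]
labels = [0, 0, 0, 1, 1, 1]
selected = preselect(points, labels)
print(selected)  # → [2, 3]
```

On this toy set the graph is a chain of consecutive points, and only the two samples facing each other across the class boundary are retained. A CNN would then be trained on the images indexed by `selected` instead of the full set.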