CV LG · Jan 8, 2020

The Effect of Data Ordering in Image Classification

arXiv:2001.05857v12.31 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses a subtle but potentially important factor for machine learning practitioners seeking to optimize model performance, though the contribution appears incremental because it focuses on one specific aspect of training data.

The paper investigates how data ordering affects image classification performance on ImageNet, finding that certain orderings significantly improve accuracy regardless of model architecture, learning rate, or batch size, with results measured using NDCG, accuracy @1, and accuracy @5.

The success stories of deep learning models grow every day, spanning tasks from image classification to natural language understanding. With the increasing popularity of these models, scientists spend more and more time finding the optimal parameters and best model architectures for their tasks. In this paper, we focus on the ingredient that feeds these machines: the data. We hypothesize that data ordering affects how well a model performs. To that end, we conduct experiments on an image classification task using the ImageNet dataset and show that some data orderings are better than others in terms of obtaining higher classification accuracies. Experimental results show that, independent of model architecture, learning rate, and batch size, the ordering of the data significantly affects the outcome. We show these findings using three metrics: NDCG, accuracy@1, and accuracy@5. Our goal is to show that not only parameters and model architectures but also the data ordering plays a role in obtaining better results.
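
The three metrics named in the abstract can be illustrated with a small sketch. This listing does not say how the paper computes NDCG, so the version below assumes binary relevance: only the true class counts as relevant, which makes NDCG equal to 1 / log2(1 + rank of the true class). The function names `top_k_accuracy` and `mean_ndcg` and the toy scores are hypothetical, chosen only for illustration.

```python
import math


def _rank_classes(row):
    """Return class indices sorted from highest to lowest score."""
    return sorted(range(len(row)), key=lambda c: row[c], reverse=True)


def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k top-scoring classes."""
    hits = sum(
        1 for row, label in zip(scores, labels) if label in _rank_classes(row)[:k]
    )
    return hits / len(labels)


def mean_ndcg(scores, labels):
    """Mean NDCG under the ASSUMPTION of one relevant class per example.

    With a single relevant item the ideal DCG is 1, so the per-example
    NDCG reduces to 1 / log2(1 + rank), where rank is 1-based.
    """
    total = 0.0
    for row, label in zip(scores, labels):
        rank = _rank_classes(row).index(label) + 1
        total += 1.0 / math.log2(1 + rank)
    return total / len(labels)


# Toy predictions for three examples over three classes (made-up values).
scores = [
    [0.7, 0.2, 0.1],  # true class 0 is ranked 1st
    [0.1, 0.3, 0.6],  # true class 1 is ranked 2nd
    [0.5, 0.4, 0.1],  # true class 2 is ranked 3rd
]
labels = [0, 1, 2]

acc_at_1 = top_k_accuracy(scores, labels, k=1)
acc_at_2 = top_k_accuracy(scores, labels, k=2)
ndcg = mean_ndcg(scores, labels)
print(acc_at_1, acc_at_2, ndcg)
```

Unlike accuracy@k, which gives no credit once the true class falls outside the top k, this NDCG variant decays smoothly with rank, so it can separate two data orderings that yield the same accuracy@1 but rank the true class differently on the misclassified images.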
