LG CV ML Nov 30, 2018

Are All Training Examples Created Equal? An Empirical Study

Kailas Vodrahalli, Ke Li, Jitendra Malik

arXiv:1811.12569v1 20.365 citations

Originality: Incremental advance

AI Analysis

The paper addresses the problem of reducing training data needs for researchers and practitioners, but it is an incremental advance because it builds on existing active learning and dataset analysis concepts.

The paper investigates whether a small, carefully selected subset of training data is sufficient for training computer vision models, finding that in some cases it works while in others importance differences are negligible.

Modern computer vision algorithms often rely on very large training datasets. However, it is conceivable that a carefully selected subsample of the dataset is sufficient for training. In this paper, we propose a gradient-based importance measure that we use to empirically analyze relative importance of training images in four datasets of varying complexity. We find that in some cases, a small subsample is indeed sufficient for training. For other datasets, however, the relative differences in importance are negligible. These results have important implications for active learning on deep networks. Additionally, our analysis method can be used as a general tool to better understand diversity of training examples in datasets.
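The abstract does not spell out the importance measure, so the following is only an illustrative sketch of one gradient-based score, not the authors' definition. It assumes a softmax classifier trained with cross-entropy and scores each example by the L2 norm of the loss gradient with respect to the logits, which equals softmax(logits) minus the one-hot label. The function names (`gradient_importance`, `select_top_k`) and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_importance(logits, label):
    """L2 norm of d(cross-entropy)/d(logits) for one example.

    For softmax cross-entropy this gradient is softmax(logits) - onehot(label).
    Assumption: this norm stands in for "importance"; the paper's exact
    measure may differ.
    """
    probs = softmax(logits)
    grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
    return math.sqrt(sum(g * g for g in grad))


def select_top_k(all_logits, labels, k):
    """Return indices of the k highest-scoring examples, plus all scores."""
    scores = [gradient_importance(z, y) for z, y in zip(all_logits, labels)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores


# Hypothetical model outputs for four examples of a 3-class problem.
example_logits = [
    [4.0, 0.0, 0.0],  # confident and correct: small gradient
    [0.0, 0.0, 0.0],  # maximally uncertain
    [0.0, 3.0, 0.0],  # confident and wrong: large gradient
    [2.0, 1.9, 0.0],  # borderline between two classes
]
example_labels = [0, 0, 0, 1]

subset, scores = select_top_k(example_logits, example_labels, k=2)
print(subset, [round(s, 3) for s in scores])
```

Under this score, examples the model already classifies confidently and correctly rank lowest. If the scores across a dataset are nearly uniform, ranking gives little basis for choosing a subset, which matches the abstract's observation that importance differences are negligible for some datasets.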
