CV · Aug 10, 2020

Measures of Complexity for Large Scale Image Datasets

Ameet Annasaheb Rahane, Anbumani Subramanian

arXiv:2008.04431v1 · 31 citations

AI Analysis

This work addresses the need for researchers and practitioners in machine learning to better understand dataset complexity, though it is incremental as it applies existing entropy metrics to new datasets.

The authors tackled the problem of quantitatively comparing the complexity of large-scale image datasets for deep learning. They developed computationally simple methods and visualizations, and used entropy-based metrics to rank four autonomous driving datasets by complexity.

Large-scale image datasets are a growing trend in the field of machine learning. However, it is hard to quantitatively understand or specify how various datasets compare to each other, i.e., whether one dataset is more complex or harder to "learn" for a deep-learning-based network. In this work, we build a series of computationally simple methods to measure the complexity of a dataset. Furthermore, we present an approach for visualizing high-dimensional data, in order to assist with visual comparison of datasets. We present our analysis using four datasets from the autonomous driving research community: Cityscapes, IDD, BDD and Vistas. Using entropy-based metrics, we present a rank-order complexity of these datasets, which we compare with an established rank-order with respect to deep learning.
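
To make the idea of an entropy-based ranking concrete, here is a minimal sketch. It assumes one simple choice of metric, the Shannon entropy of each image's 8-bit intensity histogram, averaged over a dataset. The abstract does not say which entropy measures the paper uses, so the function names (`shannon_entropy`, `rank_by_mean_entropy`) and the metric itself are illustrative assumptions, not the authors' method.

```python
import numpy as np


def shannon_entropy(image, bins=256):
    """Shannon entropy (in bits) of the intensity histogram of an 8-bit image.

    Assumption: pixel values lie in [0, 255]. This is an illustrative metric,
    not necessarily one of the measures used in the paper.
    """
    counts, _ = np.histogram(image, bins=bins, range=(0, 256))
    # Drop empty bins so that log2 is never evaluated at zero.
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())


def rank_by_mean_entropy(datasets):
    """Rank datasets from highest to lowest mean per-image entropy.

    `datasets` maps a (hypothetical) dataset name to an iterable of images.
    Returns a list of (name, mean_entropy) pairs.
    """
    means = {
        name: float(np.mean([shannon_entropy(img) for img in images]))
        for name, images in datasets.items()
    }
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```

Passing a dictionary such as `{"toy_a": images_a, "toy_b": images_b}` returns the names ordered by mean entropy. Under this assumed metric, a dataset whose images spread their intensities more evenly ranks as more complex.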
