CV · LG · ML · May 22, 2020

From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

arXiv:2005.11295v1 · 148 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work highlights dataset biases that affect model evaluation, a topic that matters to machine learning researchers and practitioners, although its contribution is incremental because it analyzes already-known issues.

The study investigated how crowd-sourced data collection for ImageNet introduces biases that state-of-the-art models exploit, leading to a misalignment between the benchmark and the real-world task it stands in for. The authors also released refined annotations to aid further research.

Building rich machine learning datasets in a scalable manner often necessitates a crowd-sourced data collection pipeline. In this work, we use human studies to investigate the consequences of employing such a pipeline, focusing on the popular ImageNet dataset. We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset, including the introduction of biases that state-of-the-art models exploit. Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for. Finally, our findings emphasize the need to augment our current model training and evaluation toolkit to take such misalignments into account. To facilitate further research, we release our refined ImageNet annotations at https://github.com/MadryLab/ImageNetMultiLabel.
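
As an illustration only, and not the authors' released code or the annotation format used in their repository: one simple way an evaluation could account for images that have more than one valid label is to count a prediction as correct whenever it matches any label in a per-image set, instead of only the single original label. The function names, the dictionary layout (image id mapped to a predicted class id or to a set of valid class ids), and the toy data below are all hypothetical.

```python
from typing import Dict, Set


def standard_accuracy(preds: Dict[str, int], original_labels: Dict[str, int]) -> float:
    """Fraction of images whose prediction equals the single original label."""
    if not preds:
        raise ValueError("no predictions to score")
    hits = sum(preds[img] == original_labels[img] for img in preds)
    return hits / len(preds)


def multilabel_accuracy(preds: Dict[str, int], valid_labels: Dict[str, Set[int]]) -> float:
    """Fraction of images whose prediction is any of the labels deemed valid.

    Hypothetical scheme: valid_labels[img] is assumed to hold every class
    judged correct for that image (e.g. by refined human annotation).
    """
    if not preds:
        raise ValueError("no predictions to score")
    hits = sum(preds[img] in valid_labels[img] for img in preds)
    return hits / len(preds)


# Toy, made-up example: three images with arbitrary integer class ids.
preds = {"img_a": 7, "img_b": 3, "img_c": 5}
original_labels = {"img_a": 7, "img_b": 4, "img_c": 2}
valid_labels = {"img_a": {7}, "img_b": {3, 4}, "img_c": {2}}

print(standard_accuracy(preds, original_labels))  # 1 of 3 correct
print(multilabel_accuracy(preds, valid_labels))   # 2 of 3 correct
```

In this toy case the prediction for `img_b` is penalized under the single-label metric but accepted once the second valid class is taken into account, which is the kind of gap between benchmark score and task fidelity the abstract describes.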

Code Implementations: 1 repo