CV | Nov 5, 2023

Benchmarking a Benchmark: How Reliable is MS-COCO?

Eric Zimmermann, Justin Szeto, Jerome Pasquero, Frederic Ratle

arXiv:2311.02709v1 | 6.87 citations | h-index: 6

Originality: Synthesis-oriented

AI Analysis

This work addresses the reliability of widely used benchmarks for researchers and practitioners in computer vision, though it is incremental as it focuses on re-annotation rather than a new method.

The paper tackles potential biases in benchmark datasets by re-annotating MS-COCO to create Sama-COCO and using a shape analysis pipeline to surface those biases. Results show that annotation style significantly affects model evaluation and that annotation pipelines should align with the task of interest.

Benchmark datasets are used to profile and compare algorithms across a variety of tasks, ranging from image classification to segmentation, and also play a large role in image pretraining algorithms. Emphasis is placed on results with little regard to the actual content within the dataset. It is important to question what kind of information is being learned from these datasets and what nuances and biases they contain. In the following work, Sama-COCO, a re-annotation of MS-COCO, is used to discover potential biases by leveraging a shape analysis pipeline. A model is trained and evaluated on both datasets to examine the impact of different annotation conditions. Results demonstrate that annotation styles are important and that annotation pipelines should closely consider the task of interest. The dataset is made publicly available at https://www.sama.com/sama-coco-dataset/.
