CV · May 14, 2024

Ambiguous Annotations: When is a Pedestrian not a Pedestrian?

arXiv:2405.08794v1 · 6 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses data quality issues relevant to researchers and practitioners in autonomous driving. It is an incremental contribution because it builds on existing work on label quality.

The paper examines ambiguous annotations in autonomous driving datasets. It shows that excluding highly ambiguous data from training improves the performance of a state-of-the-art pedestrian detector, as measured by log-average miss rate (LAMR), precision, and F1 score, while also saving training time and annotation costs.
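The summary does not say how ambiguity is measured. One plausible way to make "excluding highly ambiguous data" concrete, assuming each instance was labelled by several annotators, is to score disagreement as one minus the share of the majority label and drop instances above a threshold. The `Instance` schema, the `ambiguity` score, and the threshold are hypothetical illustrations, not the paper's method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Instance:
    """One annotated object with labels from several annotators (hypothetical schema)."""
    instance_id: str
    votes: list[str]  # e.g. ["pedestrian", "pedestrian", "not_pedestrian"]


def ambiguity(votes: list[str]) -> float:
    """Hypothetical disagreement score: 1 minus the share of the most common label.

    0.0 means all annotators agree; the score grows as votes spread across labels.
    """
    if not votes:
        raise ValueError("instance has no votes")
    _, top_count = Counter(votes).most_common(1)[0]
    return 1.0 - top_count / len(votes)


def filter_ambiguous(instances: list[Instance], max_ambiguity: float) -> list[Instance]:
    """Keep only instances whose disagreement score does not exceed the threshold."""
    return [inst for inst in instances if ambiguity(inst.votes) <= max_ambiguity]


if __name__ == "__main__":
    data = [
        Instance("a", ["pedestrian"] * 5),                           # score 0.0
        Instance("b", ["pedestrian"] * 4 + ["not_pedestrian"]),      # score 0.2
        Instance("c", ["pedestrian"] * 3 + ["not_pedestrian"] * 2),  # score 0.4
    ]
    kept = filter_ambiguous(data, max_ambiguity=0.25)
    print([inst.instance_id for inst in kept])  # ['a', 'b']
```

Because the abstract stresses that removal must keep the training data representative, a real pipeline would also check which kinds of instances a given threshold removes before discarding them, rather than applying a single global cut blindly.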

Datasets labelled by human annotators are widely used in the training and testing of machine learning models. In recent years, researchers have paid increasing attention to label quality. However, it is not always possible to objectively determine whether an assigned label is correct. The present work investigates this ambiguity in the annotation of autonomous driving datasets as an important dimension of data quality. Our experiments show that excluding highly ambiguous data from training improves the performance of a state-of-the-art pedestrian detector in terms of LAMR, precision and F1 score, thereby saving training time and annotation costs. Furthermore, we demonstrate that, in order to safely remove ambiguous instances and ensure that the training data remains representative, an understanding of the properties of the dataset and class under investigation is crucial.
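For readers unfamiliar with the metrics, the sketch below shows one common way to compute them. It assumes the widely used Caltech-pedestrian-style LAMR (lower is better): the miss rate is sampled at 9 false-positives-per-image (FPPI) points spaced evenly in log space between 10^-2 and 10^0 and averaged geometrically. The paper may use a variant. Matching detections to ground truth (e.g. by IoU) is assumed to happen upstream, benchmark details such as ignore regions are omitted, and the `Detection` type and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    score: float              # detector confidence
    is_true_positive: bool    # assumed: already matched to a ground-truth box upstream


def log_average_miss_rate(dets: list[Detection], num_gt: int, num_images: int) -> float:
    """Geometric mean of the miss rate at 9 log-spaced FPPI points in [1e-2, 1e0]."""
    if num_gt <= 0 or num_images <= 0:
        raise ValueError("num_gt and num_images must be positive")
    # Sweep the score threshold from high to low, tracking FPPI and recall.
    fppi, recall = [], []
    tp = fp = 0
    for d in sorted(dets, key=lambda d: d.score, reverse=True):
        if d.is_true_positive:
            tp += 1
        else:
            fp += 1
        fppi.append(fp / num_images)
        recall.append(tp / num_gt)

    ref_points = [10 ** (-2 + 0.25 * i) for i in range(9)]
    log_sum = 0.0
    for ref in ref_points:
        best = 0.0  # recall is 0 if no operating point stays within this FPPI budget
        for f, r in zip(fppi, recall):
            if f > ref:
                break  # FPPI never decreases along the sweep
            best = r   # recall never decreases, so the last qualifying point is best
        log_sum += math.log(max(1.0 - best, 1e-10))  # floor avoids log(0)
    return math.exp(log_sum / len(ref_points))


def precision_recall_f1(dets: list[Detection], num_gt: int, score_threshold: float):
    """Precision, recall and F1 for detections at or above a score threshold."""
    kept = [d for d in dets if d.score >= score_threshold]
    tp = sum(d.is_true_positive for d in kept)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / num_gt if num_gt else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    dets = [Detection(0.9, True), Detection(0.8, True), Detection(0.7, False),
            Detection(0.6, True), Detection(0.5, False)]
    print(log_average_miss_rate(dets, num_gt=4, num_images=2))
    print(precision_recall_f1(dets, num_gt=4, score_threshold=0.65))
```

LAMR summarizes the whole score sweep, while precision and F1 depend on the chosen operating threshold, which is one reason detection papers often report both kinds of metric.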

