CVMay 23

Calibrating Probabilistic Object Detectors with Annotator Disagreement

arXiv:2605.2472239.0
Predicted impact top 79% in CV · last 90 daysOriginality Incremental advance
AI Analysis

This work provides a practical solution for object detection in ambiguous settings (e.g., medical imaging) where ground truth is unavailable, enabling calibrated uncertainty estimates.

The paper addresses the challenge of learning object detectors without ground truth annotations due to annotator disagreement, and proposes a framework to calibrate probabilistic detectors to align with annotator distributions. The framework achieves superior calibration performance on medical and natural image datasets with YOLO and two-stage detectors.

High degrees of disagreement among annotators can exist for ambiguous objects, e.g. in medical images, underscoring the challenges of establishing ground truth annotations in object detection tasks. Despite this, all existing object detectors implicitly require access to ground truth annotations for either training or evaluation. The fundamental questions we target are: How can we learn an object detector with multiple annotators' annotations but without objective ground truth annotations due to object ambiguity, and how can we enable the learned detector to express meaningful model predictive uncertainties in detecting ambiguous objects? To answer these questions, we present an interpretable approach to calibrate probabilistic object detectors, where the calibration goal is to align the class confidence and bounding box variance estimates to the annotators' annotation distribution. We introduce an efficient yet effective framework to calibrate probabilistic object detectors by designing four evaluation metrics to measure calibration errors regarding classification and localization, and proposing a train-time calibration and post-hoc calibrator, all without the need to access any ground truth. This framework is generalizable to many existing probabilistic object detectors, such as the YOLO families and two-stage detectors. Empirical results with real-world and synthetic datasets of medical and natural images demonstrate the superior performance of the proposed framework with three popular object detectors.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes