CV | Sep 16, 2025

Cumulative Consensus Score: Label-Free and Model-Agnostic Evaluation of Object Detectors in Deployment

Avinaash Manoharan, Xiangyu Yin, Domenik Helm, Chih-Hong Cheng

arXiv:2509.12871v1 | 3.6

Originality: Incremental advance

AI Analysis

The proposed metric offers a practical way to monitor object detectors, DevOps-style, in real-world settings where annotations are unavailable.

The paper tackles the problem of evaluating object detection models in deployment without ground-truth annotations. It introduces the Cumulative Consensus Score (CCS), a label-free metric that measures spatial consistency of predictions under test-time data augmentation. In experiments on Open Images and KITTI, CCS achieved over 90% congruence with established metrics such as F1-score.

Evaluating object detection models in deployment is challenging because ground-truth annotations are rarely available. We introduce the Cumulative Consensus Score (CCS), a label-free metric that enables continuous monitoring and comparison of detectors in real-world settings. CCS applies test-time data augmentation to each image, collects predicted bounding boxes across augmented views, and computes overlaps using Intersection over Union. Maximum overlaps are normalized and averaged across augmentation pairs, yielding a measure of spatial consistency that serves as a proxy for reliability without annotations. In controlled experiments on Open Images and KITTI, CCS achieved over 90% congruence with F1-score, Probabilistic Detection Quality, and Optimal Correction Cost. The method is model-agnostic, working across single-stage and two-stage detectors, and operates at the case level to highlight under-performing scenarios. Altogether, CCS provides a robust foundation for DevOps-style monitoring of object detectors.
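The computation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the exact normalization, so the sketch assumes that each box's best IoU in the other view is summed and divided by the larger box count of the pair, that two empty views count as full agreement, and that all boxes have already been mapped back to the original image coordinates. The function names `iou_matrix`, `pair_consensus`, and `cumulative_consensus_score` are hypothetical.

```python
from itertools import combinations

import numpy as np


def iou_matrix(a, b):
    """Pairwise IoU between boxes a (n, 4) and b (m, 4) in [x1, y1, x2, y2] format."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    # Corners of the intersection rectangle for every (a_i, b_j) pair.
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    # Guard against zero-area unions to avoid division warnings.
    safe_union = np.where(union > 0, union, 1.0)
    return np.where(union > 0, inter / safe_union, 0.0)


def pair_consensus(a, b):
    """Consensus between the boxes predicted on two augmented views."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    if len(a) == 0 and len(b) == 0:
        return 1.0  # assumption: two empty views agree perfectly
    if len(a) == 0 or len(b) == 0:
        return 0.0  # one view found objects, the other found none
    ious = iou_matrix(a, b)
    # Assumed normalization: sum of best overlaps over the larger box count,
    # so unmatched extra boxes in either view lower the score.
    return float(ious.max(axis=1).sum() / max(len(a), len(b)))


def cumulative_consensus_score(views):
    """Average pairwise consensus over all pairs of augmented views of one image."""
    pairs = list(combinations(views, 2))
    if not pairs:
        raise ValueError("need at least two augmented views")
    return float(np.mean([pair_consensus(a, b) for a, b in pairs]))


# Three hypothetical views of one image: two agree exactly, one box is half as tall.
views = [
    [[0, 0, 10, 10]],
    [[0, 0, 10, 10]],
    [[0, 0, 10, 5]],
]
score = cumulative_consensus_score(views)
```

In this toy input, one pair of views agrees exactly and the other two pairs overlap with IoU 0.5, so the score is the mean of 1.0, 0.5, and 0.5. A per-image score like this could be tracked over time or compared across detectors, which is the case-level use the abstract describes.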
