Cumulative Consensus Score: Label-Free and Model-Agnostic Evaluation of Object Detectors in Deployment
CCS provides a practical solution for DevOps-style monitoring of object detectors in real-world settings where annotations are unavailable.
The paper tackles the problem of evaluating object detection models in deployment without ground-truth annotations by introducing the Cumulative Consensus Score (CCS), a label-free metric that measures spatial consistency through test-time data augmentation. In experiments on Open Images and KITTI, CCS achieved over 90% congruence with established metrics like F1-score.
Evaluating object detection models in deployment is challenging because ground-truth annotations are rarely available. We introduce the Cumulative Consensus Score (CCS), a label-free metric that enables continuous monitoring and comparison of detectors in real-world settings. CCS applies test-time data augmentation to each image, collects predicted bounding boxes across augmented views, and computes overlaps using Intersection over Union. Maximum overlaps are normalized and averaged across augmentation pairs, yielding a measure of spatial consistency that serves as a proxy for reliability without annotations. In controlled experiments on Open Images and KITTI, CCS achieved over 90% congruence with F1-score, Probabilistic Detection Quality, and Optimal Correction Cost. The method is model-agnostic, working across single-stage and two-stage detectors, and operates at the case level to highlight under-performing scenarios. Altogether, CCS provides a robust foundation for DevOps-style monitoring of object detectors.
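The computation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the augmentations, the exact normalization, or how empty predictions are handled, so the choices below are assumptions. In particular, the sketch assumes that the boxes from each augmented view have already been mapped back to the original image coordinates in `[x1, y1, x2, y2]` format, that maximum overlaps are taken in both directions and divided by the total number of boxes in the pair, and that two empty views count as full agreement. The function names `iou_matrix`, `pair_consensus`, and `consensus_score` are hypothetical.

```python
from itertools import combinations

import numpy as np


def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), format [x1, y1, x2, y2]."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    # Intersection rectangle corners, broadcast to shape (N, M, 2).
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    # Guard against division by zero for degenerate boxes.
    safe_union = np.where(union > 0, union, 1.0)
    return np.where(union > 0, inter / safe_union, 0.0)


def pair_consensus(a, b):
    """Consensus between the predictions of two augmented views, in [0, 1]."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    if len(a) == 0 and len(b) == 0:
        return 1.0  # assumption: two views that both predict nothing agree
    if len(a) == 0 or len(b) == 0:
        return 0.0  # one view found objects, the other found none
    iou = iou_matrix(a, b)
    # Assumed normalization: best match for every box in either view,
    # summed and divided by the total number of boxes in the pair.
    best_a = iou.max(axis=1)
    best_b = iou.max(axis=0)
    return float((best_a.sum() + best_b.sum()) / (len(a) + len(b)))


def consensus_score(views):
    """Average pairwise consensus over all pairs of augmented views of one image."""
    pairs = list(combinations(views, 2))
    if not pairs:
        raise ValueError("need at least two augmented views")
    return float(np.mean([pair_consensus(a, b) for a, b in pairs]))


# Three views of one image: two agree exactly, the third predicts a box half as tall.
views = [
    [[0, 0, 10, 10]],
    [[0, 0, 10, 10]],
    [[0, 0, 10, 5]],
]
score = consensus_score(views)
```

Because the score is computed per image, low-scoring images can be flagged individually, which matches the case-level use described in the abstract. Averaging such per-image scores over a stream of deployment data would be one way to compare detectors, although the aggregation used in the paper is not stated here.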