SortedAP: Rethinking evaluation metrics for instance segmentation
This work addresses the need for more reliable evaluation metrics in instance segmentation. The contribution is incremental, as it builds on prior metric designs.
The paper tackles the limited resolution and misleading sensitivity of existing instance segmentation metrics by proposing sortedAP, a new metric that strictly decreases with imperfections and penalizes them without interruption. An evaluation toolkit and code are provided.
Designing metrics for evaluating instance segmentation revolves around comprehensively considering object detection and segmentation accuracy. However, other important properties, such as sensitivity, continuity, and equality, are overlooked in current studies. In this paper, we reveal that most existing metrics have a limited resolution of segmentation quality. They are only conditionally sensitive to changes in masks or to false predictions. For certain metrics, the score can change drastically in a narrow range, which could provide a misleading indication of the quality gap between results. Therefore, we propose a new metric called sortedAP, which strictly decreases with both object- and pixel-level imperfections and has an uninterrupted penalization scale over the entire domain. We provide the evaluation toolkit and experiment code at https://www.github.com/looooongChen/sortedAP.
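
The conditional sensitivity described above can be illustrated with a small sketch. It is not the toolkit's implementation: the function names `ap_at_threshold` and `area_under_ap_curve` are hypothetical, the score TP / (TP + FP + FN) is one common definition of AP at a single IoU threshold and is assumed here, and predictions are assumed to be already matched one-to-one with ground-truth objects. The sketch contrasts a score at one fixed threshold with the area under the score-versus-threshold curve, whose breakpoints are the sorted matching IoUs.

```python
def ap_at_threshold(match_ious, n_gt, n_pred, t):
    """Assumed definition: TP / (TP + FP + FN) at IoU threshold t.

    match_ious holds the IoU of each one-to-one matched
    (prediction, ground truth) pair; unmatched objects are
    accounted for through n_gt and n_pred.
    """
    tp = sum(1 for iou in match_ious if iou > t)
    fp = n_pred - tp
    fn = n_gt - tp
    denom = tp + fp + fn
    # Assumed convention: an image with no objects and no predictions scores 1.
    return tp / denom if denom else 1.0


def area_under_ap_curve(match_ious, n_gt, n_pred):
    """Integrate the step-shaped AP-vs-threshold curve over [0, 1].

    The curve only changes at the matching IoUs, so sorting them
    gives every breakpoint and the integral is an exact sum.
    """
    breakpoints = sorted({iou for iou in match_ious if 0.0 < iou <= 1.0})
    area, prev = 0.0, 0.0
    for b in breakpoints:
        # On [prev, b) the set of true positives is constant.
        area += (b - prev) * ap_at_threshold(match_ious, n_gt, n_pred, prev)
        prev = b
    # Above the largest IoU there are no true positives, so nothing is added.
    return area


# Two results on the same image (2 objects, 2 predictions);
# the second has one visibly worse mask (IoU 0.6 instead of 0.8).
good = [0.9, 0.8]
worse = [0.9, 0.6]

fixed_good = ap_at_threshold(good, 2, 2, 0.5)
fixed_worse = ap_at_threshold(worse, 2, 2, 0.5)
area_good = area_under_ap_curve(good, 2, 2)
area_worse = area_under_ap_curve(worse, 2, 2)

print(fixed_good, fixed_worse)  # identical at the fixed threshold
print(area_good, area_worse)    # the area drops for the worse mask
```

In this toy case the fixed-threshold score cannot distinguish the two results because both masks stay above 0.5, while the integrated score decreases as soon as any matching IoU decreases.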