RankSEG: A Consistent Ranking-based Framework for Segmentation
This work provides a theoretical foundation for segmentation tasks in computer vision and NLP, addressing an inconsistency between the widely used thresholding-based framework and the Dice/IoU metrics used to evaluate it.
The paper shows that existing thresholding-based segmentation methods are not consistent with respect to the Dice/IoU metrics and may therefore lead to suboptimal solutions. It proposes a ranking-based framework (RankDice/RankIoU) that is Dice-/IoU-calibrated, and demonstrates its numerical effectiveness on simulated examples and on the Fine-annotated CityScapes, Pascal VOC, and Kvasir-SEG datasets with state-of-the-art deep learning architectures.
Segmentation, which assigns a label to every pixel/feature to extract regions of interest from an image/text, has emerged as a fundamental field of computer vision and natural language processing. To evaluate the performance of segmentation, the Dice and IoU metrics are used to measure the degree of overlap between the ground truth and the predicted segmentation. In this paper, we establish a theoretical foundation of segmentation with respect to the Dice/IoU metrics, including the Bayes rule and Dice-/IoU-calibration, analogous to classification-calibration or Fisher consistency in classification. We prove that the existing thresholding-based framework with most operating losses is not consistent with respect to the Dice/IoU metrics, and thus may lead to a suboptimal solution. To address this pitfall, we propose a novel consistent ranking-based framework, namely RankDice/RankIoU, inspired by plug-in rules of the Bayes segmentation rule. Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation. We also study the statistical properties of the proposed framework: we show that it is Dice-/IoU-calibrated, and we provide its excess risk bounds and rate of convergence. The numerical effectiveness of RankDice/mRankDice is demonstrated in various simulated examples and on the Fine-annotated CityScapes, Pascal VOC, and Kvasir-SEG datasets with state-of-the-art deep learning architectures.
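The gap between thresholding and a ranking-based rule can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes independent pixels with known foreground probabilities, computes the expected Dice by brute-force enumeration (feasible only for a handful of pixels), and treats the Dice of two empty masks as 1. All function names are hypothetical and chosen for illustration. It compares thresholding at 0.5 with a rule that ranks pixels by probability and keeps the top k, where k is chosen to maximize expected Dice.

```python
from itertools import product


def dice(truth, pred):
    """Dice overlap of two binary masks; two empty masks score 1 (assumed convention)."""
    intersection = sum(t * p for t, p in zip(truth, pred))
    denom = sum(truth) + sum(pred)
    return 1.0 if denom == 0 else 2.0 * intersection / denom


def expected_dice(probs, pred):
    """Exact expected Dice of `pred` when pixel i is foreground with
    probability probs[i], independently of the others (toy assumption)."""
    total = 0.0
    for truth in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for t, p in zip(truth, probs):
            weight *= p if t else 1.0 - p
        total += weight * dice(truth, pred)
    return total


def threshold_rule(probs, t=0.5):
    """Conventional rule: predict foreground where the probability is >= t."""
    return tuple(int(p >= t) for p in probs)


def top_k_rule(probs, k):
    """Ranking rule: predict foreground on the k most probable pixels."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    chosen = set(order[:k])
    return tuple(int(i in chosen) for i in range(len(probs)))


def best_top_k(probs):
    """Pick the k in 0..d whose top-k mask has the highest expected Dice."""
    k = max(range(len(probs) + 1),
            key=lambda k: expected_dice(probs, top_k_rule(probs, k)))
    return k, top_k_rule(probs, k)


if __name__ == "__main__":
    probs = [0.4, 0.4, 0.4, 0.4]
    thr = threshold_rule(probs)
    k, ranked = best_top_k(probs)
    print("threshold mask:", thr, "expected Dice:", expected_dice(probs, thr))
    print("top-k mask (k=%d):" % k, ranked, "expected Dice:", expected_dice(probs, ranked))
```

With every probability at 0.4, thresholding at 0.5 predicts an empty mask, whose expected Dice is only the chance that the truth is also empty (0.6^4, about 0.13). Keeping all four pixels gives an expected Dice of about 0.53 in this toy setting. The enumeration is exponential in the number of pixels, so it serves only to illustrate the idea, not as a substitute for the GPU algorithms mentioned in the abstract.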