IV · CV · LG · Feb 10, 2022

Towards a Guideline for Evaluation Metrics in Medical Image Segmentation

arXiv:2202.05273v1 · 547 citations
Originality: Synthesis-oriented
AI Analysis

This work tackles the problem of inconsistent and biased evaluation in medical image segmentation research. Its contribution is incremental: it synthesizes existing metrics into a standardized framework rather than introducing new ones.

The paper addresses the lack of reliable performance assessment in medical image segmentation by providing an overview and interpretation guide for key metrics, proposing a guideline to improve evaluation quality, reproducibility, and comparability.

In the last decade, research on artificial intelligence has seen rapid growth with deep learning models, especially in the field of medical image segmentation. Various studies demonstrated that these models have powerful prediction capabilities and achieved results similar to those of clinicians. However, recent studies revealed that the evaluation in image segmentation studies lacks reliable model performance assessment and shows statistical bias caused by incorrect metric implementation or usage. Thus, this work provides an overview and interpretation guide on the following metrics for medical image segmentation evaluation in binary as well as multi-class problems: Dice similarity coefficient, Jaccard, Sensitivity, Specificity, Rand index, ROC curves, Cohen's Kappa, and Hausdorff distance. In summary, we propose a guideline for standardized medical image segmentation evaluation to improve evaluation quality, reproducibility, and comparability in the research field.
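Four of the metrics named in the abstract (Dice similarity coefficient, Jaccard, Sensitivity, Specificity) can all be derived from the same confusion-matrix counts, which is one reason implementation slips spread across results so easily. The sketch below is illustrative only and is not taken from the paper or its repository: the function names `confusion_counts` and `overlap_metrics` are hypothetical, masks are assumed to be flattened sequences of 0/1 values for a single binary class, and returning NaN for an undefined ratio is an assumed convention.

```python
from typing import Dict, Sequence


def confusion_counts(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for two flattened binary masks (1 = foreground)."""
    if len(truth) != len(pred):
        raise ValueError("masks must contain the same number of pixels")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(truth, pred):
        if t and p:
            counts["tp"] += 1
        elif p:
            counts["fp"] += 1
        elif t:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    # Assumed convention: an undefined ratio (0/0) is reported as NaN
    # instead of being silently replaced by 0 or 1.
    return numerator / denominator if denominator else float("nan")


def overlap_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Dice, Jaccard, sensitivity, and specificity for one binary class."""
    c = confusion_counts(truth, pred)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": _ratio(tp, tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Toy example: 10 pixels, 4 foreground in the ground truth, 3 predicted.
truth = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
pred = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
metrics = overlap_metrics(truth, pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For the toy masks the counts are TP = 2, FP = 1, FN = 2, TN = 5, giving Dice = 4/7, Jaccard = 2/5, sensitivity = 1/2, and specificity = 5/6. Specificity is the highest of the four because it is driven by the background pixels, which illustrates why metrics that count true negatives can look favorable when the region of interest is small.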

Code Implementations: 1 repo
