CV, QM, AP · Oct 26, 2022

How precise are performance estimates for typical medical image segmentation tasks?

arXiv:2210.14677v3 · 7 citations · h-index: 54
Originality Synthesis-oriented
AI Analysis

This paper addresses the problem of unreliable performance reporting in medical image processing research, though its contribution is incremental in that it applies existing statistical methods to this domain.

The paper investigates the precision of performance estimates in medical image segmentation, finding that small test sets result in wide confidence intervals, such as approximately 8 points of Dice for 20 samples with a standard deviation of 10.

An important issue in medical image processing is to estimate not only the performance of algorithms but also the precision of that performance estimate. Reporting precision typically amounts to reporting the standard error of the mean (SEM) or, equivalently, confidence intervals. However, this is rarely done in medical image segmentation studies. In this paper, we aim to estimate the typical confidence that can be expected in such studies. To that end, we first perform experiments for Dice metric estimation using a standard deep learning model (U-Net) and a classical task from the Medical Segmentation Decathlon. We extensively study precision estimation using both a Gaussian assumption and bootstrapping (which does not require any assumption on the distribution). We then perform simulations for other test set sizes and performance spreads. Overall, our work shows that small test sets lead to wide confidence intervals (e.g. $\sim$8 points of Dice for 20 samples with $\sigma \simeq 10$).
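
The two precision estimates named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the percentile form of the bootstrap, the 1.96 multiplier for a 95% interval, and the synthetic Dice scores are all assumptions made here for demonstration.

```python
import numpy as np


def gaussian_ci(scores, z=1.96):
    """Mean and approximate 95% CI based on the standard error of the mean."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(scores.size)  # SEM = s / sqrt(n)
    return mean, mean - z * sem, mean + z * sem


def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Mean and percentile-bootstrap CI of the mean (no Gaussian assumption)."""
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample the test set with replacement, n_resamples times.
    idx = rng.integers(0, scores.size, size=(n_resamples, scores.size))
    boot_means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), lo, hi


def gaussian_ci_width(sigma, n, z=1.96):
    """Full width of the Gaussian CI for a given spread and test set size."""
    return 2 * z * sigma / np.sqrt(n)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical per-case Dice scores (in points), NOT data from the paper.
    dice = np.clip(rng.normal(80, 10, size=20), 0, 100)
    print("Gaussian CI: ", gaussian_ci(dice))
    print("Bootstrap CI:", bootstrap_ci(dice))
    print("Width for sigma=10, n=20:", gaussian_ci_width(10, 20))
```

Under these assumptions, `gaussian_ci_width(10, 20)` evaluates to about 8.8 Dice points, the same order as the roughly 8 points quoted in the abstract. The paper's exact convention for the interval may differ from the one used here.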

