CV · Oct 21, 2024

Label Filling via Mixed Supervision for Medical Image Segmentation from Noisy Annotations

arXiv:2410.16057v1 · 3.71 citations · h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses high inter-rater variability in medical image labeling, a key obstacle to accurate segmentation in clinical applications. The contribution is incremental, as it builds on existing methods for learning from noisy annotations.

The paper tackles medical image segmentation from noisy annotations by proposing a Label Filling framework (LF-Net) that supervises the model with a subset of trustworthy pixels and fills in the remaining pixels through mixed supervision. It reports improved segmentation accuracy on five datasets, with up to a 7% improvement in Dice Similarity Coefficient (DSC) for MS lesion segmentation.

Successful medical image segmentation usually requires a large number of high-quality labels. However, because the labeling process is affected by raters' varying skill levels and characteristics, the masks provided by different raters often suffer from high inter-rater variability. In this paper, we propose a simple yet effective Label Filling framework, termed LF-Net, which predicts the ground-truth segmentation label given only noisy annotations during training. The fundamental idea of label filling is to supervise the segmentation model with a subset of pixels that have trustworthy labels, while filling in the labels of the other pixels through mixed supervision. More concretely, we propose a qualified majority voting strategy: a threshold voting scheme models agreement among raters, and the majority-voted labels of the selected subset of pixels are used as supervision. To fill in the labels of the other pixels, two types of mixed auxiliary supervision are proposed: a soft label learned from the intrinsic structure of the noisy annotations, and raters' characteristics labels, which propagate information about each individual rater's characteristics. LF-Net has two main advantages. 1) Training on trustworthy pixels provides confident supervision that guides learning toward the ground-truth label. 2) The two types of mixed supervision prevent over-fitting when the network is supervised by only a subset of pixels, and guarantee high fidelity to the true label. Results on five datasets of diverse imaging modalities show that LF-Net improves segmentation accuracy on all datasets compared with state-of-the-art methods, including a 7% improvement in DSC for MS lesion segmentation.
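
The qualified majority voting step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `qualified_majority_vote` and `masked_bce`, the binary-mask setting, the default threshold of 0.75, and the tie-breaking rule are all assumptions made for illustration. The sketch marks a pixel as trustworthy when the fraction of raters agreeing on its majority label reaches the threshold, and computes a loss only over those pixels. The mixed auxiliary supervision for the remaining pixels is not shown.

```python
import numpy as np


def qualified_majority_vote(annotations, threshold=0.75):
    """Hypothetical helper: threshold voting over binary rater masks.

    annotations: array of shape (R, H, W) with values in {0, 1}, one mask per rater.
    threshold:   assumed minimum fraction of raters that must agree on a pixel.
    Returns (labels, trusted): the majority-voted label per pixel and a
    boolean mask of pixels whose agreement reaches the threshold.
    """
    annotations = np.asarray(annotations, dtype=np.float64)
    fg_fraction = annotations.mean(axis=0)               # share of raters voting foreground
    agreement = np.maximum(fg_fraction, 1.0 - fg_fraction)  # share agreeing with the majority
    labels = (fg_fraction >= 0.5).astype(np.uint8)       # ties go to foreground (assumption)
    trusted = agreement >= threshold
    return labels, trusted


def masked_bce(probs, labels, trusted, eps=1e-7):
    """Binary cross-entropy averaged over trusted pixels only (illustrative)."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    loss = -(labels * np.log(probs) + (1 - labels) * np.log(1.0 - probs))
    n_trusted = trusted.sum()
    if n_trusted == 0:
        return 0.0
    return float((loss * trusted).sum() / n_trusted)
```

With four raters and a threshold of 0.75, a pixel on which the raters split two against two is excluded from the trusted set, so it contributes nothing to `masked_bce`; in the framework described above, such pixels would instead be handled by the soft label and the raters' characteristics labels.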
