Automatic quality control framework for more reliable integration of machine learning-based image segmentation into medical workflows
This work addresses the challenge of integrating AI into clinical workflows by improving segmentation reliability. Its contribution is incremental, as it builds on existing quality control approaches.
The paper tackled the problem of unreliable machine learning-based medical image segmentation by analyzing automatic quality control methods that detect segmentation failures. On white matter hyperintensity segmentation, uncertainty aggregation and Dice prediction each improved mean Dice from 0.82 to 0.84.
Machine learning algorithms underpin modern diagnostic-aiding software, which has proved valuable in clinical practice, particularly in radiology. However, inaccuracies, mainly due to the limited availability of clinical samples for training these algorithms, hamper their wider applicability, acceptance, and recognition amongst clinicians. We present an analysis of state-of-the-art automatic quality control (QC) approaches that can be implemented within these algorithms to estimate the certainty of their outputs. We validated the most promising approaches on a brain image segmentation task identifying white matter hyperintensities (WMH) in magnetic resonance imaging data. WMH are a correlate of small vessel disease common in mid-to-late adulthood and are particularly challenging to segment due to their varied size and distributional patterns. Our results show that uncertainty aggregation and Dice prediction were the most effective methods for failure detection in this task. Both methods independently improved mean Dice from 0.82 to 0.84. Our work reveals how QC methods can help to detect failed segmentation cases and therefore make automatic segmentation more reliable and suitable for clinical practice.
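For readers unfamiliar with the metric, the Dice coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it and illustrates one plausible way a QC score could raise mean Dice: by setting aside cases flagged as likely failures. This is an assumption for illustration only; the abstract does not specify the mechanism, and the function names, the threshold, and the toy values are hypothetical rather than taken from the paper.

```python
import numpy as np


def dice(pred, truth):
    """Dice overlap between two binary masks; defined as 1.0 when both are empty."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def mean_dice_after_qc(dice_scores, qc_scores, threshold):
    """Mean Dice over the cases whose QC score is at or above the threshold.

    qc_scores stands in for any per-case quality estimate (for example a
    predicted Dice or an aggregated uncertainty mapped so that higher is
    better). Both the scores and the threshold are hypothetical here.
    Returns the mean over retained cases and the boolean mask of retained cases.
    """
    dice_scores = np.asarray(dice_scores, dtype=float)
    qc_scores = np.asarray(qc_scores, dtype=float)
    kept = qc_scores >= threshold
    if not kept.any():
        return float("nan"), kept
    return float(dice_scores[kept].mean()), kept


# Toy values, not data from the paper.
true_dice = [0.9, 0.8, 0.4]
qc_estimate = [0.85, 0.8, 0.3]
mean_kept, kept_mask = mean_dice_after_qc(true_dice, qc_estimate, threshold=0.5)
```

In this toy setup the third case falls below the threshold and is excluded, so the mean is taken over the first two cases only. In practice, flagged cases would presumably be routed to a human reader rather than discarded.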