CVMar 22, 2024
Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image AnnotationsPranav Kulkarni, Adway Kanhere, Dharmam Savani et al.
Curating annotations for medical image segmentation is a labor-intensive and time-consuming task that requires domain expertise, resulting in "narrowly" focused deep learning (DL) models with limited translational utility. Recently, foundation models like the Segment Anything Model (SAM) have revolutionized semantic segmentation with exceptional zero-shot generalizability across various domains, including medical imaging, and hold a lot of promise for streamlining the annotation process. However, SAM has yet to be evaluated in a crowd-sourced setting to curate annotations for training 3D DL segmentation models. In this work, we explore the potential of SAM for crowd-sourcing "sparse" annotations from non-experts to generate "dense" segmentation masks for training 3D nnU-Net models, a state-of-the-art DL segmentation model. Our results indicate that while SAM-generated annotations exhibit high mean Dice scores compared to ground-truth annotations, nnU-Net models trained on SAM-generated annotations perform significantly worse than nnU-Net models trained on ground-truth annotations ($p<0.001$, all).
LGFeb 8, 2024
Hidden in Plain Sight: Undetectable Adversarial Bias Attacks on Vulnerable Patient PopulationsPranav Kulkarni, Andrew Chan, Nithya Navarathna et al.
The proliferation of artificial intelligence (AI) in radiology has shed light on the risk of deep learning (DL) models exacerbating clinical biases towards vulnerable patient populations. While prior literature has focused on quantifying biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and its implication in the clinical environment remains an underexplored field of research in medical imaging. In this work, we demonstrate that demographically targeted label poisoning attacks can introduce undetectable underdiagnosis bias in DL models. Our results across multiple performance metrics and demographic groups like sex, age, and their intersectional subgroups show that adversarial bias attacks demonstrate high-selectivity for bias in the targeted group by degrading group model performance without impacting overall model performance. Furthermore, our results indicate that adversarial bias attacks result in biased DL models that propagate prediction bias even when evaluated with external datasets.
CVApr 5, 2019
Automatic detection of lesion load change in Multiple Sclerosis using convolutional neural networks with segmentation confidenceRichard McKinley, Lorenz Grunder, Rik Wepfer et al.
The detection of new or enlarged white-matter lesions in multiple sclerosis is a vital task in the monitoring of patients undergoing disease-modifying treatment for multiple sclerosis. However, the definition of 'new or enlarged' is not fixed, and it is known that lesion-counting is highly subjective, with high degree of inter- and intra-rater variability. Automated methods for lesion quantification hold the potential to make the detection of new and enlarged lesions consistent and repeatable. However, the majority of lesion segmentation algorithms are not evaluated for their ability to separate progressive from stable patients, despite this being a pressing clinical use-case. In this paper we show that change in volumetric measurements of lesion load alone is not a good method for performing this separation, even for highly performing segmentation methods. Instead, we propose a method for identifying lesion changes of high certainty, and establish on a dataset of longitudinal multiple sclerosis cases that this method is able to separate progressive from stable timepoints with a very high level of discrimination (AUC = 0.99), while changes in lesion volume are much less able to perform this separation (AUC = 0.71). Validation of the method on a second external dataset confirms that the method is able to generalize beyond the setting in which it was trained, achieving an accuracy of 83% in separating stable and progressive timepoints. Both lesion volume and count have previously been shown to be strong predictors of disease course across a population. However, we demonstrate that for individual patients, changes in these measures are not an adequate means of establishing no evidence of disease activity. Meanwhile, directly detecting tissue which changes, with high confidence, from non-lesion to lesion is a feasible methodology for identifying radiologically active patients.
CVJan 22, 2019
Simultaneous lesion and neuroanatomy segmentation in Multiple Sclerosis using deep neural networksRichard McKinley, Rik Wepfer, Fabian Aschwanden et al.
Segmentation of white matter lesions and deep grey matter structures is an important task in the quantification of magnetic resonance imaging in multiple sclerosis. In this paper we explore segmentation solutions based on convolutional neural networks (CNNs) for providing fast, reliable segmentations of lesions and grey-matter structures in multi-modal MR imaging, and the performance of these methods when applied to out-of-centre data. We trained two state-of-the-art fully convolutional CNN architectures on the 2016 MSSEG training dataset, which was annotated by seven independent human raters: a reference implementation of a 3D Unet, and a more recently proposed 3D-to-2D architecture (DeepSCAN). We then retrained those methods on a larger dataset from a single centre, with and without labels for other brain structures. We quantified changes in performance owing to dataset shift, and changes in performance by adding the additional brain-structure labels. We also compared performance with freely available reference methods. Both fully-convolutional CNN methods substantially outperform other approaches in the literature when trained and evaluated in cross-validation on the MSSEG dataset, showing agreement with human raters in the range of human inter-rater variability. Both architectures showed drops in performance when trained on single-centre data and tested on the MSSEG dataset. When trained with the addition of weak anatomical labels derived from Freesurfer, the performance of the 3D Unet degraded, while the performance of the DeepSCAN net improved. Overall, the DeepSCAN network predicting both lesion and anatomical labels was the best-performing network examined.