Meike W. Vernooij

h-index: 100

17 papers

352 citations

Novelty: 55%

AI Score: 39

Ranked #76,970 of 194,257 authors (top 40%)
#26,075 in CV (top 44%)

17 Papers

5.3 IV Jun 18, 2023

Prior-knowledge-informed deep learning for lacune detection and quantification using multi-site brain MRI

Bo Li, Jeroen de Bresser, Wiro Niessen et al.

Lacunes of presumed vascular origin, also referred to as lacunar infarcts, are important to assess cerebral small vessel disease and cognitive diseases such as dementia. However, visual rating of lacunes from imaging data is challenging, time-consuming, and rater-dependent, owing to their small size, sparsity, and mimics. Whereas recent developments in automatic algorithms have been shown to make the detection of lacunes faster while preserving sensitivity, they also showed a large number of false positives, which makes them impractical for use in clinical practice or large-scale studies. Here, we develop a novel framework that, in addition to lacune detection, outputs a categorical burden score. This score could provide a more practical estimate of lacune presence that simplifies and effectively accelerates the imaging assessment of lacunes. We hypothesize that the combination of detection and the categorical score makes the procedure less sensitive to noisy labels.

2.8 CV Oct 3, 2023

AI-based association analysis for medical imaging using latent-space geometric confounder correction

Xianjing Liu, Bo Li, Meike W. Vernooij et al.

This study addresses the challenges of confounding effects and interpretability in artificial-intelligence-based medical image analysis. Whereas existing literature often resolves confounding by removing confounder-related information from latent representations, this strategy risks affecting image reconstruction quality in generative models, thus limiting their applicability in feature visualization. To tackle this, we propose a different strategy that retains confounder-related information in latent representations while finding an alternative confounder-free representation of the image data. Our approach views the latent space of an autoencoder as a vector space, where imaging-related variables, such as the learning target (t) and confounder (c), have a vector capturing their variability. The confounding problem is addressed by searching a confounder-free vector which is orthogonal to the confounder-related vector but maximally collinear to the target-related vector. To achieve this, we introduce a novel correlation-based loss that not only performs vector searching in the latent space, but also encourages the encoder to generate latent representations linearly correlated with the variables. Subsequently, we interpret the confounder-free representation by sampling and reconstructing images along the confounder-free vector. The efficacy and flexibility of our proposed method are demonstrated across three applications, accommodating multiple confounders and utilizing diverse image modalities. Results affirm the method's effectiveness in reducing confounder influences, preventing wrong or misleading associations, and offering a unique visual interpretation for in-depth investigations by clinical and epidemiological researchers. The code is released in the following GitLab repository: https://gitlab.com/radiology/compopbio/ai_based_association_analysis
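The geometric condition in this abstract (orthogonal to the confounder vector, maximally collinear with the target vector) can be illustrated in closed form for a single confounder. The sketch below is not the authors' method, which learns the vector through a correlation-based loss; it only assumes that the target and confounder directions are already known, and the function name `confounder_free_direction` is hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def confounder_free_direction(target_vec, confounder_vec):
    """Return the unit vector orthogonal to confounder_vec that has the
    largest cosine similarity with target_vec (single-confounder case).

    This is the normalised residual of projecting target_vec onto
    confounder_vec, i.e. one Gram-Schmidt step.
    """
    scale = dot(target_vec, confounder_vec) / dot(confounder_vec, confounder_vec)
    residual = [t - scale * c for t, c in zip(target_vec, confounder_vec)]
    norm = math.sqrt(dot(residual, residual))
    if norm == 0.0:
        # Target and confounder are collinear: no confounder-free part exists.
        raise ValueError("target vector lies entirely along the confounder vector")
    return [r / norm for r in residual]


# Toy 3-D latent space (values are illustrative assumptions).
t = [1.0, 1.0, 0.0]   # target-related direction
c = [0.0, 1.0, 0.0]   # confounder-related direction
v = confounder_free_direction(t, c)
```

In this toy case the result keeps only the part of the target direction that the confounder cannot explain. Handling several confounders would require projecting out each of them in turn, which the sketch does not attempt.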

2.7 CL Nov 3, 2025

Evaluating Open-Weight Large Language Models for Structured Data Extraction from Narrative Medical Reports Across Multiple Use Cases and Languages

Douwe J. Spaanderman, Karthik Prathaban, Petr Zelina et al.

Large language models (LLMs) are increasingly used to extract structured information from free-text clinical records, but prior work often focuses on single tasks, limited models, and English-language reports. We evaluated 15 open-weight LLMs on pathology and radiology reports across six use cases (colorectal liver metastases, liver tumours, neurodegenerative diseases, soft-tissue tumours, melanomas, and sarcomas) at three institutes in the Netherlands, UK, and Czech Republic. Models included general-purpose and medical-specialised LLMs of various sizes, and six prompting strategies were compared: zero-shot, one-shot, few-shot, chain-of-thought, self-consistency, and prompt graph. Performance was assessed using task-appropriate metrics, with consensus rank aggregation and linear mixed-effects models quantifying variance. Top-ranked models achieved macro-average scores close to inter-rater agreement across tasks. Small-to-medium general-purpose models performed comparably to large models, while tiny and specialised models performed worse. Prompt graph and few-shot prompting improved performance by ~13%. Task-specific factors, including variable complexity and annotation variability, influenced results more than model size or prompting strategy. These findings show that open-weight LLMs can extract structured data from clinical reports across diseases, languages, and institutions, offering a scalable approach for clinical data curation.

4.4 IV Jul 20, 2021

Automated Segmentation and Volume Measurement of Intracranial Carotid Artery Calcification on Non-Contrast CT

Gerda Bortsova, Daniel Bos, Florian Dubost et al.

Purpose: To evaluate a fully-automated deep-learning-based method for assessment of intracranial carotid artery calcification (ICAC). Methods: Two observers manually delineated ICAC in non-contrast CT scans of 2,319 participants (mean age 69 (SD 7) years; 1154 women) of the Rotterdam Study, prospectively collected between 2003 and 2006. These data were used to retrospectively develop and validate a deep-learning-based method for automated ICAC delineation and volume measurement. To evaluate the method, we compared manual and automatic assessment (computed using ten-fold cross-validation) with respect to 1) the agreement with an independent observer's assessment (available in a random subset of 47 scans); 2) the accuracy in delineating ICAC as judged via blinded visual comparison by an expert; 3) the association with first stroke incidence from the scan date until 2012. All method performance metrics were computed using 10-fold cross-validation. Results: The automated delineation of ICAC reached sensitivity of 83.8% and positive predictive value (PPV) of 88%. The intraclass correlation between automatic and manual ICAC volume measures was 0.98 (95% CI: 0.97, 0.98; computed in the entire dataset). Measured between the assessments of independent observers, sensitivity was 73.9%, PPV was 89.5%, and intraclass correlation was 0.91 (95% CI: 0.84, 0.95; computed in the 47-scan subset). In the blinded visual comparisons, automatic delineations were more accurate than manual ones (p-value = 0.01). The association of ICAC volume with incident stroke was similarly strong for both automated (hazard ratio, 1.38 (95% CI: 1.12, 1.75)) and manually measured volumes (hazard ratio, 1.48 (95% CI: 1.20, 1.87)). Conclusions: The developed model was capable of automated segmentation and volume quantification of ICAC with accuracy comparable to human experts.

5.0 CV Dec 28, 2020

Longitudinal diffusion MRI analysis using Segis-Net: a single-step deep-learning framework for simultaneous segmentation and registration

Bo Li, Wiro J. Niessen, Stefan Klein et al.

This work presents a single-step deep-learning framework for longitudinal image analysis, coined Segis-Net. To optimally exploit information available in longitudinal data, this method concurrently learns a multi-class segmentation and nonlinear registration. Segmentation and registration are modeled using a convolutional neural network and optimized simultaneously for their mutual benefit. An objective function that optimizes spatial correspondence for the segmented structures across time-points is proposed. We applied Segis-Net to the analysis of white matter tracts from N=8045 longitudinal brain MRI datasets of 3249 elderly individuals. The Segis-Net approach showed a significant increase in registration accuracy, spatio-temporal segmentation consistency, and reproducibility compared with two multistage pipelines. This also led to a significant reduction in the sample-size that would be required to achieve the same statistical power in analyzing tract-specific measures. Thus, we expect that Segis-Net can serve as a new reliable tool to support longitudinal imaging studies to investigate macro- and microstructural brain changes over time.

7.2 CV Nov 3, 2020

Learning unbiased group-wise registration (LUGR) and joint segmentation: evaluation on longitudinal diffusion MRI

Bo Li, Wiro J. Niessen, Stefan Klein et al.

Analysis of longitudinal changes in imaging studies often involves both segmentation of structures of interest and registration of multiple timeframes. The accuracy of such analysis could benefit from a tailored framework that jointly optimizes both tasks to fully exploit the information available in the longitudinal data. Most learning-based registration algorithms, including joint optimization approaches, currently suffer from bias due to selection of a fixed reference frame and only support pairwise transformations. We here propose an analytical framework based on an unbiased learning strategy for group-wise registration that simultaneously registers images to the mean space of a group to obtain consistent segmentations. We evaluate the proposed method on longitudinal analysis of a white matter tract in a brain MRI dataset with 2-3 time-points for 3249 individuals, i.e., 8045 images in total. The reproducibility of the method is evaluated on test-retest data from 97 individuals. The results confirm that the implicit reference image is an average of the input images. In addition, the proposed framework leads to consistent segmentations and significantly lower processing bias than that of a pair-wise fixed-reference approach. This processing bias is even smaller than those obtained when translating segmentations by only one voxel, which can be attributed to subtle numerical instabilities and interpolation. Therefore, we postulate that the proposed mean-space learning strategy could be widely applied to learning-based registration tasks. In addition, this group-wise framework introduces a novel way for learning-based longitudinal studies by direct construction of an unbiased within-subject template and allowing reliable and efficient analysis of spatio-temporal imaging biomarkers.

10.6 IV May 26, 2020

Neuro4Neuro: A neural network approach for neural tract segmentation using large-scale population-based diffusion imaging

Bo Li, Marius de Groot, Rebecca M. E. Steketee et al.

Subtle changes in white matter (WM) microstructure have been associated with normal aging and neurodegeneration. To study these associations in more detail, it is highly important that the WM tracts can be accurately and reproducibly characterized from brain diffusion MRI. In addition, to enable analysis of WM tracts in large datasets and in clinical practice it is essential to have methodology that is fast and easy to apply. This work therefore presents a new approach for WM tract segmentation, Neuro4Neuro, which is capable of direct extraction of WM tracts from diffusion tensor images using a convolutional neural network (CNN). This 3D end-to-end method is trained to segment 25 WM tracts in aging individuals from a large population-based study (N=9752, 1.5T MRI). The proposed method showed good segmentation performance and high reproducibility, i.e., a high spatial agreement (Cohen's kappa, k = 0.72 ~ 0.83) and a low scan-rescan error in tract-specific diffusion measures (e.g., fractional anisotropy: error = 1% ~ 5%). The reproducibility of the proposed method was higher than that of a tractography-based segmentation algorithm, while being orders of magnitude faster (0.5s to segment one tract). In addition, we showed that the method successfully generalizes to diffusion scans from an external dementia dataset (N=58, 3T MRI). In two proof-of-principle experiments, we associated WM microstructure obtained using the proposed method with age in a normal elderly population, and with disease subtypes in a dementia cohort. In concordance with the literature, results showed a widespread reduction of microstructural organization with aging and substantial group-wise microstructure differences between dementia subtypes. In conclusion, we presented a highly reproducible and fast method for WM tract segmentation that has the potential of being used in large-scale studies and clinical practice.

2.0 IV Apr 12, 2020

When Weak Becomes Strong: Robust Quantification of White Matter Hyperintensities in Brain MRI scans

Oliver Werner, Kimberlin M. H. van Wijnen, Wiro J. Niessen et al.

To measure the volume of specific image structures, a typical approach is to first segment those structures using a neural network trained on voxel-wise (strong) labels and subsequently compute the volume from the segmentation. A more straightforward approach would be to predict the volume directly using a neural-network-based regression approach, trained on image-level (weak) labels indicating volume. In this article, we compared networks optimized with weak and strong labels, and studied their ability to generalize to other datasets. We experimented with white matter hyperintensity (WMH) volume prediction in brain MRI scans. Neural networks were trained on a large local dataset and their performance was evaluated on four independent public datasets. We showed that networks optimized using only weak labels reflecting WMH volume generalized better for WMH volume prediction than networks optimized with voxel-wise segmentations of WMH. The attention maps of networks trained with weak labels did not seem to delineate WMHs, but highlighted instead areas with smooth contours around or near WMHs. By correcting for possible confounders we showed that networks trained on weak labels may have learnt other meaningful features that are more suited to generalization to unseen data. Our results suggest that for imaging biomarkers that can be derived from segmentations, training networks to predict the biomarker directly may provide more robust results than solving an intermediate segmentation step.

8.5 IV Aug 26, 2019

Reproducible White Matter Tract Segmentation Using 3D U-Net on a Large-scale DTI Dataset

Bo Li, Marius de Groot, Meike Vernooij et al.

Tract-specific diffusion measures, as derived from brain diffusion MRI, have been linked to white matter tract structural integrity and neurodegeneration. As a consequence, there is a large interest in the automatic segmentation of white matter tracts in diffusion tensor MRI data. Methods based on tractography are popular for white matter tract segmentation. However, because of the limited consistency and long processing time, such methods may not be suitable for clinical practice. We therefore developed a novel convolutional-neural-network-based method to directly segment white matter tracts, trained on a low-resolution dataset of 9149 DTI images. The method is optimized with respect to input, loss function, and network architecture choices. We evaluated both segmentation accuracy and reproducibility, and reproducibility of determining tract-specific diffusion measures. The reproducibility of the method is higher than that of the reference standard and the determined diffusion measures are consistent. Therefore, we expect our method to be applicable in clinical practice and in longitudinal analysis of white matter microstructure.

14.5 IV Aug 26, 2019

A hybrid deep learning framework for integrated segmentation and registration: evaluation on longitudinal white matter tract changes

Bo Li, Wiro Niessen, Stefan Klein et al.

To accurately analyze changes of anatomical structures in longitudinal imaging studies, consistent segmentation across multiple time-points is required. Existing solutions often involve independent registration and segmentation components. Registration between time-points is used either as a prior for segmentation in a subsequent time point or to perform segmentation in a common space. In this work, we propose a novel hybrid convolutional neural network (CNN) that integrates segmentation and registration into a single procedure. We hypothesize that the joint optimization leads to increased performance on both tasks. The hybrid CNN is trained by minimizing an integrated loss function composed of four different terms, measuring segmentation accuracy, similarity between registered images, deformation field smoothness, and segmentation consistency. We applied this method to the segmentation of white matter tracts, describing functionally grouped axonal fibers, using N=8045 longitudinal brain MRI data of 3249 individuals. The proposed method was compared with two multistage pipelines using two existing segmentation methods combined with a conventional deformable registration algorithm. In addition, we assessed the added value of the joint optimization for segmentation and registration separately. The hybrid CNN yielded significantly higher accuracy, consistency and reproducibility of segmentation than the multistage pipelines, and was orders of magnitude faster. Therefore, we expect it can serve as a novel tool to support clinical and epidemiological analyses on understanding microstructural brain changes over time.

6.3 IV Jul 29, 2019

Automated Lesion Detection by Regressing Intensity-Based Distance with a Neural Network

Kimberlin M. H. van Wijnen, Florian Dubost, Pinar Yilmaz et al.

Localization of focal vascular lesions on brain MRI is an important component of research on the etiology of neurological disorders. However, manual annotation of lesions can be challenging, time-consuming and subject to observer bias. Automated detection methods often need voxel-wise annotations for training. We propose a novel approach for automated lesion detection that can be trained on scans only annotated with a dot per lesion instead of a full segmentation. From the dot annotations and their corresponding intensity images we compute various distance maps (DMs), indicating the distance to a lesion based on spatial distance, intensity distance, or both. We train a fully convolutional neural network (FCN) to predict these DMs for unseen intensity images. The local optima in the predicted DMs are expected to correspond to lesion locations. We show the potential of this approach to detect enlarged perivascular spaces in white matter on a large brain MRI dataset with an independent test set of 1000 scans. Our method matches the intra-rater performance of the expert rater that was computed on an independent set. We compare the different types of distance maps, showing that incorporating intensity information in the distance maps used to train an FCN greatly improves performance.
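The training targets described here can be sketched in a few lines for the purely spatial case. This is a minimal 2D illustration under stated assumptions: the distance is plain Euclidean distance to the nearest dot, the intensity-based variants from the abstract are not reproduced, and the names `spatial_distance_map` and `local_minima` are hypothetical. In the paper a network predicts such maps; here the map is computed directly from the dots to show how local optima correspond to lesion locations.

```python
import math


def spatial_distance_map(shape, dots):
    """Distance from every pixel to the nearest dot annotation (row, col)."""
    rows, cols = shape
    return [
        [min(math.hypot(r - dr, c - dc) for dr, dc in dots) for c in range(cols)]
        for r in range(rows)
    ]


def local_minima(dmap):
    """Pixels strictly smaller than all of their 8-connected neighbours."""
    rows, cols = len(dmap), len(dmap[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                dmap[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            if all(dmap[r][c] < n for n in neighbours):
                found.append((r, c))
    return found


# Toy 3x5 image with two dot-annotated lesions (illustrative values).
dots = [(1, 1), (1, 4)]
dmap = spatial_distance_map((3, 5), dots)
detections = local_minima(dmap)
```

With a predicted map the minima would be noisier, so some smoothing or thresholding would likely be needed before extracting detections; the abstract does not specify that step.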

7.5 IV Jul 1, 2019

Multi-atlas image registration of clinical data with automated quality assessment using ventricle segmentation

Florian Dubost, Marleen de Bruijne, Marco Nardin et al.

Registration is a core component of many imaging pipelines. In case of clinical scans, with lower resolution and sometimes substantial motion artifacts, registration can produce poor results. Visual assessment of registration quality in large clinical datasets is inefficient. In this work, we propose to automatically assess the quality of registration to an atlas in clinical FLAIR MRI scans of the brain. The method consists of automatically segmenting the ventricles of a given scan using a neural network, and comparing the segmentation to the atlas' ventricles propagated to image space. We used the proposed method to improve clinical image registration to a general atlas by computing multiple registrations and then selecting the registration that yielded the highest ventricle overlap. Methods were evaluated in a single-site dataset of more than 1000 scans, as well as a multi-center dataset comprising 142 clinical scans from 12 sites. The automated ventricle segmentation reached a Dice coefficient with manual annotations of 0.89 in the single-site dataset, and 0.83 in the multi-center dataset. Registration via age-specific atlases could improve ventricle overlap compared to a direct registration to the general atlas (Dice similarity coefficient increase up to 0.15). Experiments also showed that selecting scans with the registration quality assessment method could improve the quality of average maps of white matter hyperintensity burden, instead of using all scans for the computation of the white matter hyperintensity map. In this work, we demonstrated the utility of an automated tool for assessing image registration quality in clinical scans. This image quality assessment step could ultimately assist in the translation of automated neuroimaging pipelines to the clinic.
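The selection step in this abstract (compute several registrations, keep the one with the highest ventricle overlap) can be sketched as follows. Masks are represented as sets of voxel coordinates purely for brevity, and the candidate names and the function `select_best_registration` are hypothetical, not taken from the paper.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks,
    each given as a collection of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def select_best_registration(segmented_ventricles, propagated_atlases):
    """Pick the registration whose propagated atlas ventricles overlap
    most with the network's ventricle segmentation."""
    return max(
        propagated_atlases,
        key=lambda name: dice(segmented_ventricles, propagated_atlases[name]),
    )


# Toy 2D example: ventricles found by the segmentation network ...
segmented = {(0, 0), (0, 1), (1, 0), (1, 1)}
# ... and atlas ventricles propagated through two candidate registrations.
candidates = {
    "general_atlas": {(0, 1), (0, 2), (1, 1), (1, 2)},
    "age_specific_atlas": {(0, 0), (0, 1), (1, 0), (1, 2)},
}
best = select_best_registration(segmented, candidates)
```

The same Dice value can double as a quality score for excluding poorly registered scans, although the threshold used for that purpose is not stated in the abstract.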

9.4 CV Jun 5, 2019

Weakly Supervised Object Detection with 2D and 3D Regression Neural Networks

Florian Dubost, Hieab Adams, Pinar Yilmaz et al.

Automatically finding multiple lesions in large images is a common problem in medical image analysis. Solving this problem can be challenging if, during optimization, the automated method cannot access information about the location of the lesions nor is given single examples of the lesions. We propose a new weakly supervised detection method using neural networks, that computes attention maps revealing the locations of brain lesions. These attention maps are computed using the last feature maps of a segmentation network optimized only with global image-level labels. The proposed method can generate attention maps at full input resolution without need for interpolation during preprocessing, which allows small lesions to appear in attention maps. For comparison, we modify state-of-the-art methods to compute attention maps for weakly supervised object detection, by using a global regression objective instead of the more conventional classification objective. This regression objective optimizes the number of occurrences of the target object in an image, e.g. the number of brain lesions in a scan, or the number of digits in an image. We study the behavior of the proposed method in MNIST-based detection datasets, and evaluate it for the challenging detection of enlarged perivascular spaces - a type of brain lesion - in a dataset of 2202 3D scans with point-wise annotations in the center of all lesions in four brain regions. In the brain dataset, the weakly supervised detection methods come close to the human intrarater agreement in each region. The proposed method reaches the best area under the curve in two out of four regions, and has the lowest number of false positive detections in all regions, while its average sensitivity over all regions is similar to that of the other best methods. The proposed method can facilitate epidemiological and clinical studies of enlarged perivascular spaces.

5.2 CV Jul 12, 2018

Hydranet: Data Augmentation for Regression Neural Networks

Florian Dubost, Gerda Bortsova, Hieab Adams et al.

Deep learning techniques are often criticized for depending heavily on a large quantity of labeled data. This problem is even more challenging in medical image analysis where the annotator expertise is often scarce. We propose a novel data-augmentation method to regularize neural network regressors that learn from a single global label per image. The principle of the method is to create new samples by recombining existing ones. We demonstrate the performance of our algorithm on two tasks: estimation of the number of enlarged perivascular spaces in the basal ganglia, and estimation of white matter hyperintensities volume. We show that the proposed method improves the performance over more basic data augmentation. The proposed method reached an intraclass correlation coefficient between ground truth and network predictions of 0.73 on the first task and 0.84 on the second task, only using between 25 and 30 scans with a single global label per scan for training. With the same number of training scans, more conventional data augmentation methods could only reach intraclass correlation coefficients of 0.68 on the first task, and 0.79 on the second task.

7.8 CV Feb 16, 2018

3D Regression Neural Network for the Quantification of Enlarged Perivascular Spaces in Brain MRI

Florian Dubost, Hieab Adams, Gerda Bortsova et al.

Enlarged perivascular spaces (EPVS) in the brain are an emerging imaging marker for cerebral small vessel disease, and have been shown to be related to increased risk of various neurological diseases, including stroke and dementia. Automatic quantification of EPVS would greatly help to advance research into its etiology and its potential as a risk indicator of disease. We propose a convolutional network regression method to quantify the extent of EPVS in the basal ganglia from 3D brain MRI. We first segment the basal ganglia and subsequently apply a 3D convolutional regression network designed for small object detection within this region of interest. The network takes an image as input, and outputs a quantification score of EPVS. The network has significantly more convolution operations than pooling ones and no final activation, allowing it to span the space of real numbers. We validated our approach using a dataset of 2000 brain MRI scans scored visually. Experiments with varying sizes of training and test sets showed that a good performance can be achieved with a training set of only 200 scans. With a training set of 1000 scans, the intraclass correlation coefficient (ICC) between our scoring method and the expert's visual score was 0.74. Our method outperforms by a large margin - more than 0.10 - four more conventional automated approaches based on intensities, scale-invariant feature transform, and random forest. We show that the network learns the structures of interest and investigate the influence of hyper-parameters on the performance. We also evaluate the reproducibility of our network using a set of 60 subjects scanned twice (scan-rescan reproducibility). On this set our network achieves an ICC of 0.93, while the intrarater agreement reaches 0.80. Furthermore, the automatic EPVS scoring correlates similarly to age as visual scoring.

9.7 CV May 22, 2017

GP-Unet: Lesion Detection from Weak Labels with a 3D Regression Network

Florian Dubost, Gerda Bortsova, Hieab Adams et al.

We propose a novel convolutional neural network for lesion detection from weak labels. Only a single, global label per image - the lesion count - is needed for training. We train a regression network with a fully convolutional architecture combined with a global pooling layer to aggregate the 3D output into a scalar indicating the lesion count. When testing on unseen images, we first run the network to estimate the number of lesions. Then we remove the global pooling layer to compute localization maps of the size of the input image. We evaluate the proposed network on the detection of enlarged perivascular spaces in the basal ganglia in MRI. Our method achieves a sensitivity of 62% with on average 1.5 false positives per image. Compared with four other approaches based on intensity thresholding, saliency and class maps, our method has a 20% higher sensitivity.
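The two inference modes described in this abstract can be illustrated with a toy volume standing in for the network's un-pooled 3D output. Several things here are assumptions rather than details from the source: the pooling is taken to be a global sum, the map values are invented, and lesion candidates are chosen as the highest-valued voxels, with their number given by the estimated count. The function names are hypothetical.

```python
def global_sum_pool(volume):
    """Aggregate a 3D map (nested lists, indexed z, y, x) into one scalar."""
    return sum(v for plane in volume for row in plane for v in row)


def top_locations(volume, k):
    """Return the (z, y, x) indices of the k highest-valued voxels."""
    voxels = [
        ((z, y, x), value)
        for z, plane in enumerate(volume)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
    ]
    voxels.sort(key=lambda item: item[1], reverse=True)
    return [index for index, _ in voxels[:k]]


# Stand-in for the fully convolutional output before pooling (2x2x2 voxels).
localization_map = [
    [[0.0, 0.75], [0.25, 0.0]],
    [[0.0, 0.0], [1.0, 0.0]],
]

# Step 1: with the pooling layer in place, the output is a lesion count.
estimated_count = round(global_sum_pool(localization_map))

# Step 2: with the pooling layer removed, the map itself localises lesions.
lesion_candidates = top_locations(localization_map, estimated_count)
```

The point of the sketch is only that the same map serves both purposes: pooled, it yields the count used for training; un-pooled, it yields locations at test time.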

1.7 CV Mar 15, 2017

Transfer Learning by Asymmetric Image Weighting for Segmentation across Scanners

Veronika Cheplygina, Annegreet van Opbroek, M. Arfan Ikram et al.

Supervised learning has been very successful for automatic segmentation of images from a single scanner. However, several papers report deteriorated performances when using classifiers trained on images from one scanner to segment images from other scanners. We propose a transfer learning classifier that adapts to differences between training and test images. This method uses a weighted ensemble of classifiers trained on individual images. The weight of each classifier is determined by the similarity between its training image and the test image. We examine three unsupervised similarity measures, which can be used in scenarios where no labeled data from a newly introduced scanner or scanning protocol is available. The measures are based on a divergence, a bag distance, and on estimating the labels with a clustering procedure. These measures are asymmetric. We study whether the asymmetry can improve classification. Out of the three similarity measures, the bag similarity measure is the most robust across different studies and achieves excellent results on four brain tissue segmentation datasets and three white matter lesion segmentation datasets, acquired at different centers and with different scanners and scanning protocols. We show that the asymmetry can indeed be informative, and that computing the similarity from the test image to the training images is more appropriate than the opposite direction.
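The weighted-ensemble idea in this abstract can be sketched as follows. It assumes each per-image classifier outputs class posteriors for a test voxel and that a similarity score between each training image and the test image is already available; the three similarity measures from the paper are not reproduced, and `weighted_ensemble` and all numbers are illustrative.

```python
def weighted_ensemble(posteriors, similarities):
    """Combine per-classifier class posteriors using weights proportional
    to the similarity between each training image and the test image.

    posteriors:   one list of class probabilities per classifier
    similarities: one non-negative similarity score per classifier
    Returns (predicted class index, combined posterior).
    """
    total = sum(similarities)
    weights = [s / total for s in similarities]
    n_classes = len(posteriors[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]
    return combined.index(max(combined)), combined


# Three classifiers, each trained on one image; two tissue classes.
posteriors = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
# The first training image resembles the test image most closely.
similarities = [3.0, 0.5, 0.5]

label, combined = weighted_ensemble(posteriors, similarities)
# Equal weights for comparison: the majority of classifiers wins instead.
uniform_label, _ = weighted_ensemble(posteriors, [1.0, 1.0, 1.0])
```

In this toy case the similarity weighting flips the decision relative to a plain average, which is the effect the method relies on when training and test scanners differ.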