Stefan Klein

IV
h-index 95
39 papers
1,546 citations
Novelty 40%
AI Score 54

39 Papers

CY · Aug 11, 2023
FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Karim Lekadir, Aasa Feragen, Abdul Joseph Fofanah et al. · eth-zurich

Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real-world adoption, it is essential that medical AI tools are trusted and accepted by patients, clinicians, health organisations and authorities. This work describes the FUTURE-AI guideline as the first international consensus framework for guiding the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and currently comprises 118 inter-disciplinary experts from 51 countries representing all continents, including AI scientists, clinicians, ethicists, and social scientists. Over a two-year period, the consortium defined guiding principles and best practices for trustworthy AI through an iterative process comprising an in-depth literature review, a modified Delphi survey, and online consensus meetings. The FUTURE-AI framework was established based on 6 guiding principles for trustworthy AI in healthcare, i.e. Fairness, Universality, Traceability, Usability, Robustness and Explainability. Through consensus, a set of 28 best practices was defined, addressing technical, clinical, legal and socio-ethical dimensions. The recommendations cover the entire lifecycle of medical AI, from design, development and validation to regulation, deployment, and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which provides a structured approach for constructing medical AI tools that will be trusted, deployed and adopted in real-world practice. Researchers are encouraged to take the recommendations into account in proof-of-concept stages to facilitate future translation of medical AI towards clinical practice.

LG · Apr 22, 2022
Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Sarthak Pati, Ujjwal Baid, Brandon Edwards et al.

Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even infeasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model in delineating the surgically targetable tumor, and a 23% improvement in delineating the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
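To illustrate what "only sharing numerical model updates" can look like, here is a minimal sketch of a sample-size-weighted average of per-site parameters, in the style of federated averaging. The abstract does not state which aggregation rule the study used, so the rule, the function name `federated_average` and the flat parameter-vector layout are all assumptions made for illustration.

```python
import numpy as np

def federated_average(site_weights, site_sizes):
    """Weighted average of per-site model parameters (FedAvg-style aggregation).

    site_weights: list of 1-D arrays, one parameter vector per site (hypothetical layout).
    site_sizes:   number of local training samples at each site.
    """
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)  # shape: (n_sites, n_params)
    # Sites with more local data contribute proportionally more to the global model.
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

# Two hypothetical sites; only these vectors leave the sites, never the scans.
updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
global_weights = federated_average(updates, site_sizes=[1, 3])
```

In this sketch the server only ever sees the parameter vectors and the sample counts, which is the property the abstract highlights.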

AI · Aug 22, 2024
AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines

Douwe J. Spaanderman, Matthew Marzetti, Xinyi Wan et al.

Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches. This systematic review provides an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of these tumours, highlighting challenges in clinical translation, and evaluating study alignment with the Checklist for AI in Medical Imaging (CLAIM) and the FUTURE-AI international consensus guidelines for trustworthy and deployable AI to promote the clinical translation of AI methods. The review covered literature from several bibliographic databases, including papers published before 17/07/2024. Original research in peer-reviewed journals focused on radiology-based AI for diagnosing or prognosing primary STBT was included. Exclusion criteria were animal, cadaveric, or laboratory studies, and non-English papers. Abstracts were screened by two of three independent reviewers for eligibility. Eligible papers were assessed against guidelines by one of three independent reviewers. The search identified 15,015 abstracts, from which 325 articles were included for evaluation. Most studies performed moderately on CLAIM, averaging a score of 28.9 ± 7.5 out of 53, but poorly on FUTURE-AI, averaging 5.1 ± 2.1 out of 30. Imaging-AI tools for STBT remain at the proof-of-concept stage, indicating significant room for improvement. Future efforts by AI developers should focus on design (e.g. define unmet clinical need, intended clinical setting and how AI would be integrated in clinical workflow), development (e.g. build on previous work, explainability), evaluation (e.g. evaluating and addressing biases, evaluating AI against best practices), and data reproducibility and availability (making documented code and data publicly available). Following these recommendations could improve clinical translation of AI methods.

CV · Jul 8, 2024 · Code
Evaluating the Fairness of Neural Collapse in Medical Image Classification

Kaouther Mouheb, Marawan Elbatel, Stefan Klein et al.

Deep learning has achieved impressive performance across various medical imaging tasks. However, its inherent bias against specific groups hinders its clinical applicability in equitable healthcare systems. A recently discovered phenomenon, Neural Collapse (NC), has shown potential in improving the generalization of state-of-the-art deep learning models. Nonetheless, its implications for bias in medical imaging remain unexplored. Our study investigates deep learning fairness through the lens of NC. We analyze the training dynamics of models as they approach NC when trained on biased datasets, and examine the subsequent impact on test performance, specifically focusing on label bias. We find that biased training initially results in different NC configurations across subgroups, before converging to a final NC solution by memorizing all data samples. Through extensive experiments on three medical imaging datasets -- PAPILA, HAM10000, and CheXpert -- we find that in biased settings, NC can lead to a significant drop in F1 score across all subgroups. Our code is available at https://gitlab.com/radiology/neuro/neural-collapse-fairness

IV · Aug 15, 2023
An Interpretable Machine Learning Model with Deep Learning-based Imaging Biomarkers for Diagnosis of Alzheimer's Disease

Wenjie Kang, Bo Li, Janne M. Papma et al.

Machine learning methods have shown large potential for the automatic early diagnosis of Alzheimer's Disease (AD). However, some machine learning methods based on imaging data have poor interpretability because it is usually unclear how they make their decisions. Explainable Boosting Machines (EBMs) are interpretable machine learning models based on the statistical framework of generalized additive modeling, but have so far only been used for tabular data. Therefore, we propose a framework that combines the strength of EBM with high-dimensional imaging data using deep learning-based feature extraction. The proposed framework is interpretable because it provides the importance of each feature. We validated the proposed framework on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, achieving accuracy of 0.883 and area-under-the-curve (AUC) of 0.970 on AD and control classification. Furthermore, we validated the proposed framework on an external testing set, achieving accuracy of 0.778 and AUC of 0.887 on AD and subjective cognitive decline (SCD) classification. The proposed framework significantly outperformed an EBM model using volume biomarkers instead of deep learning-based features, as well as an end-to-end convolutional neural network (CNN) with optimized architecture.

LG · Jun 29, 2022
Computer-aided diagnosis and prediction in brain disorders

Vikram Venkatraghavan, Sebastian R. van der Voort, Daniel Bos et al.

Computer-aided methods have shown added value for diagnosing and predicting brain disorders and can thus support decision making in clinical care and treatment planning. This chapter will provide insight into the type of methods, their working, their input data - such as cognitive tests, imaging and genetic data - and the types of output they provide. We will focus on specific use cases for diagnosis, i.e. estimating the current 'condition' of the patient, such as early detection and diagnosis of dementia, differential diagnosis of brain tumours, and decision making in stroke. Regarding prediction, i.e. estimation of the future 'condition' of the patient, we will zoom in on use cases such as predicting the disease course in multiple sclerosis and predicting patient outcomes after treatment in brain cancer. Furthermore, based on these use cases, we will assess the current state-of-the-art methodology and highlight current efforts on benchmarking of these methods and the importance of open science therein. Finally, we assess the current clinical impact of computer-aided methods and discuss the required next steps to increase clinical impact.

37.7 · CV · May 15 · Code
TriALS: Triphasic-Aided Liver Lesion Segmentation Benchmark in Non-Contrast CT

Marawan Elbatel, Mohamed Ghonim, Jiaji Mao et al.

Automated segmentation of liver lesions on non-contrast computed tomography (NCCT) is clinically important but fundamentally challenging, particularly in low-resource settings across Africa and Asia where contrast agents are frequently unavailable. Progress has been limited by the absence of annotated NCCT benchmarks. Here we describe the TriALS challenge for automated liver lesion segmentation under contrast-limited conditions, supported by a multi-centre dataset of 150 cases with four-phase CT acquisitions (600 volumes) from Egyptian and Chinese institutions. Algorithms were evaluated on 70 cases from three institutions, including an independent external cohort. The top-performing method achieved a mean venous-phase Dice of 0.754, consistent with human-level performance, yet dropped to 0.57 on NCCT. On external validation, the leading method outperformed off-the-shelf models by up to 28% in Dice on NCCT. Algorithm performance was most strongly predicted by training data scale and pre-training strategy. A cross-year comparison exposed a persistent perceptual barrier on NCCT that scaling pre-training alone cannot overcome. Data, annotations, and code are available at https://github.com/xmed-lab/TriALS.
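The Dice values quoted above measure the overlap between a predicted and a reference segmentation. A minimal sketch of the metric, assuming binary masks stored as NumPy arrays; the function name `dice_score` and the handling of two empty masks are illustrative choices, not taken from the TriALS evaluation code.

```python
import numpy as np

def dice_score(pred, ref):
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, ref).sum() / denom

# Toy 2D masks: 3 foreground pixels each, 2 of them shared.
pred = np.array([[1, 1, 0], [0, 1, 0]])
ref = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_score(pred, ref)
```

A score of 1 means perfect overlap and 0 means no overlap, so the drop from 0.754 to 0.57 reported above corresponds to a substantial loss of agreement with the reference annotation.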

CL · Nov 3, 2025
Evaluating Open-Weight Large Language Models for Structured Data Extraction from Narrative Medical Reports Across Multiple Use Cases and Languages

Douwe J. Spaanderman, Karthik Prathaban, Petr Zelina et al.

Large language models (LLMs) are increasingly used to extract structured information from free-text clinical records, but prior work often focuses on single tasks, limited models, and English-language reports. We evaluated 15 open-weight LLMs on pathology and radiology reports across six use cases (colorectal liver metastases, liver tumours, neurodegenerative diseases, soft-tissue tumours, melanomas, and sarcomas) at three institutes in the Netherlands, UK, and Czech Republic. Models included general-purpose and medical-specialised LLMs of various sizes, and six prompting strategies were compared: zero-shot, one-shot, few-shot, chain-of-thought, self-consistency, and prompt graph. Performance was assessed using task-appropriate metrics, with consensus rank aggregation and linear mixed-effects models quantifying variance. Top-ranked models achieved macro-average scores close to inter-rater agreement across tasks. Small-to-medium general-purpose models performed comparably to large models, while tiny and specialised models performed worse. Prompt graph and few-shot prompting improved performance by ~13%. Task-specific factors, including variable complexity and annotation variability, influenced results more than model size or prompting strategy. These findings show that open-weight LLMs can extract structured data from clinical reports across diseases, languages, and institutions, offering a scalable approach for clinical data curation.

CV · Jul 23, 2024
qMRI Diffuser: Quantitative T1 Mapping of the Brain using a Denoising Diffusion Probabilistic Model

Shishuai Wang, Hua Ma, Juan A. Hernandez-Tamames et al.

Quantitative MRI (qMRI) offers significant advantages over weighted images by providing objective parameters related to tissue properties. Deep learning-based methods have demonstrated effectiveness in estimating quantitative maps from series of weighted images. In this study, we present qMRI Diffuser, a novel approach to qMRI utilising deep generative models. Specifically, we implemented denoising diffusion probabilistic models (DDPM) for T1 quantification in the brain, framing the estimation of quantitative maps as a conditional generation task. The proposed method is compared with the residual neural network (ResNet) and the recurrent inference machine (RIM) on both phantom and in vivo data. The results indicate that our method achieves improved accuracy and precision in parameter estimation, along with superior visual performance. Moreover, our method inherently incorporates stochasticity, enabling straightforward quantification of uncertainty. Hence, the proposed method holds significant promise for quantitative MR mapping.

CV · Nov 5, 2025
Robust Alignment of the Human Embryo in 3D Ultrasound using PCA and an Ensemble of Heuristic, Atlas-based and Learning-based Classifiers Evaluated on the Rotterdam Periconceptional Cohort

Nikolai Herrmann, Marcella C. Zijta, Stefan Klein et al.

Standardized alignment of the embryo in three-dimensional (3D) ultrasound images aids prenatal growth monitoring by facilitating standard plane detection, improving visualization of landmarks and accentuating differences between scans. In this work, we propose an automated method for standardizing this alignment. Given a segmentation mask of the embryo, Principal Component Analysis (PCA) is applied to the mask to extract the embryo's principal axes, from which four candidate orientations are derived. The candidate in standard orientation is selected using one of three strategies: a heuristic based on Pearson's correlation assessing shape, image matching to an atlas through normalized cross-correlation, and a Random Forest classifier. We tested our method on 2166 longitudinally acquired 3D ultrasound images from 1043 pregnancies in the Rotterdam Periconceptional Cohort, ranging from 7+0 to 12+6 weeks of gestational age. In 99.0% of images, PCA correctly extracted the principal axes of the embryo. The correct candidate was selected by the Pearson heuristic, the atlas-based method and the Random Forest classifier in 97.4%, 95.8%, and 98.4% of images, respectively. A majority vote of these selection methods resulted in an accuracy of 98.5%. The high accuracy of this pipeline enables consistent embryonic alignment in the first trimester and supports scalable analysis in both clinical and research settings. The code is publicly available at: https://gitlab.com/radiology/prenatal-image-analysis/pca-3d-alignment.
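The PCA step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: it assumes the mask is a binary NumPy volume, and it assumes the four candidates arise from the sign ambiguity of the principal axes (flipping the first two axes and completing a right-handed frame). The function names are hypothetical.

```python
import numpy as np

def principal_axes(mask):
    """Centroid and principal axes (as columns, largest variance first) of a binary 3D mask."""
    coords = np.argwhere(mask).astype(float)   # (n_voxels, 3) foreground coordinates
    centroid = coords.mean(axis=0)
    cov = np.cov((coords - centroid).T)        # 3x3 covariance of voxel positions
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return centroid, eigvecs[:, order]

def candidate_orientations(axes):
    """Four right-handed frames obtained by flipping the signs of the first two axes."""
    candidates = []
    for s1 in (1.0, -1.0):
        for s2 in (1.0, -1.0):
            a1, a2 = s1 * axes[:, 0], s2 * axes[:, 1]
            a3 = np.cross(a1, a2)              # third axis keeps the frame right-handed
            candidates.append(np.stack([a1, a2, a3], axis=1))
    return candidates

# Toy mask: a box elongated along the first array axis.
mask = np.zeros((20, 10, 6), dtype=bool)
mask[2:18, 3:7, 2:4] = True
centroid, axes = principal_axes(mask)
candidates = candidate_orientations(axes)
```

Each candidate is a proper rotation, so a selector (heuristic, atlas matching or classifier, as in the abstract) only has to pick one of four frames rather than search over all orientations.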

MED-PH · Dec 19, 2025
q3-MuPa: Quick, Quiet, Quantitative Multi-Parametric MRI using Physics-Informed Diffusion Models

Shishuai Wang, Florian Wiesinger, Noemi Sgambelluri et al.

The 3D fast silent multi-parametric mapping sequence with zero echo time (MuPa-ZTE) is a novel quantitative MRI (qMRI) acquisition that enables nearly silent scanning by using a 3D phyllotaxis sampling scheme. MuPa-ZTE improves patient comfort and motion robustness, and generates quantitative maps of T1, T2, and proton density using the acquired weighted image series. In this work, we propose a diffusion model-based qMRI mapping method that leverages both a deep generative model and physics-based data consistency to further improve the mapping performance. Furthermore, our method enables additional acquisition acceleration, allowing high-quality qMRI mapping from a fourfold-accelerated MuPa-ZTE scan (approximately 1 minute). Specifically, we trained a denoising diffusion probabilistic model (DDPM) to map MuPa-ZTE image series to qMRI maps, and we incorporated the MuPa-ZTE forward signal model as an explicit data consistency (DC) constraint during inference. We compared our mapping method against a baseline dictionary matching approach and a purely data-driven diffusion model. The diffusion models were trained entirely on synthetic data generated from digital brain phantoms, eliminating the need for large real-scan datasets. We evaluated on synthetic data, a NIST/ISMRM phantom, healthy volunteers, and a patient with brain metastases. The results demonstrated that our method produces 3D qMRI maps with high accuracy, reduced noise and better preservation of structural details. Notably, it generalised well to real scans despite training on synthetic data alone. The combination of the MuPa-ZTE acquisition and our physics-informed diffusion model is termed q3-MuPa, a quick, quiet, and quantitative multi-parametric mapping framework, and our findings highlight its strong clinical potential.

CV · Dec 19, 2025
Self-Supervised Weighted Image Guided Quantitative MRI Super-Resolution

Alireza Samadifardheris, Dirk H. J. Poot, Florian Wiesinger et al.

High-resolution (HR) quantitative MRI (qMRI) relaxometry provides objective tissue characterization but remains clinically underutilized due to lengthy acquisition times. We propose a physics-informed, self-supervised framework for qMRI super-resolution that uses routinely acquired HR weighted MRI (wMRI) scans as guidance, thus removing the need for HR qMRI ground truth during training. We formulate super-resolution as Bayesian maximum a posteriori inference, minimizing two discrepancies: (1) between HR images synthesized from super-resolved qMRI maps and acquired wMRI guides via forward signal models, and (2) between acquired low-resolution (LR) qMRI and downsampled predictions. This physics-informed objective allows the models to learn from clinical wMRI without HR qMRI supervision. To validate the concept, we generate training data by synthesizing wMRI guides from HR qMRI using signal equations, then degrading qMRI resolution via k-space truncation. A deep neural network learns the super-resolution mapping. Ablation experiments demonstrate that T1-weighted images primarily enhance T1 maps, T2-weighted images improve T2 maps, and combined guidance optimally enhances all parameters simultaneously. Validation on independently acquired in-vivo data from a different qMRI sequence confirms cross-qMRI sequence generalizability. Models trained on synthetic data can produce super-resolved maps from a 1-minute acquisition with quality comparable to a 5-minute reference scan, leveraging the scanner-independent nature of relaxometry parameters. By decoupling training from the HR qMRI requirement, our framework enables fast qMRI acquisitions enhanced via routine clinical images, offering a practical pathway for integrating quantitative relaxometry into clinical workflows with acceptable additional scan time.
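The resolution degradation by k-space truncation mentioned above can be sketched in a few lines. This is a simplified 2D illustration under stated assumptions: the image keeps its matrix size and the outer k-space is zeroed (zero-filling), whereas the paper may crop k-space or work in 3D. The function name `truncate_kspace` and the `keep_fraction` parameter are hypothetical.

```python
import numpy as np

def truncate_kspace(image, keep_fraction):
    """Simulate a lower-resolution acquisition by zeroing high spatial frequencies of a 2D image."""
    k = np.fft.fftshift(np.fft.fft2(image))    # centre the zero-frequency component
    ny, nx = image.shape
    hy, hx = int(ny * keep_fraction / 2), int(nx * keep_fraction / 2)
    cy, cx = ny // 2, nx // 2
    mask = np.zeros(image.shape, dtype=bool)
    mask[cy - hy:cy + hy, cx - hx:cx + hx] = True   # keep only the central k-space block
    low_res = np.fft.ifft2(np.fft.ifftshift(k * mask))
    # The retained block is not perfectly symmetric, so a small imaginary residue is discarded.
    return np.real(low_res)

flat = np.ones((32, 32))
checker = (np.indices((32, 32)).sum(axis=0) % 2) * 2.0 - 1.0  # highest-frequency pattern
flat_lr = truncate_kspace(flat, keep_fraction=0.5)
checker_lr = truncate_kspace(checker, keep_fraction=0.5)
```

A constant image passes through unchanged because only its zero-frequency component is non-zero, while the checkerboard, which lives entirely at the highest spatial frequency, is removed.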

CV · Jan 20, 2025 · Code
GL-ICNN: An End-To-End Interpretable Convolutional Neural Network for the Diagnosis and Prediction of Alzheimer's Disease

Wenjie Kang, Lize Jiskoot, Peter De Deyn et al.

Deep learning methods based on Convolutional Neural Networks (CNNs) have shown great potential to improve early and accurate diagnosis of Alzheimer's disease (AD) dementia based on imaging data. However, these methods have yet to be widely adopted in clinical practice, possibly due to the limited interpretability of deep learning models. The Explainable Boosting Machine (EBM) is a glass-box model but cannot learn features directly from input imaging data. In this study, we propose a novel interpretable model that combines CNNs and EBMs for the diagnosis and prediction of AD. We develop an innovative training strategy that alternately trains the CNN component as a feature extractor and the EBM component as the output block to form an end-to-end model. The model takes imaging data as input and provides both predictions and interpretable feature importance measures. We validated the proposed model on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and the Health-RI Parelsnoer Neurodegenerative Diseases Biobank (PND) as an external testing set. The proposed model achieved an area-under-the-curve (AUC) of 0.956 for AD and control classification, and 0.694 for the prediction of conversion of mild cognitive impairment (MCI) to AD on the ADNI cohort. The proposed model is a glass-box model that achieves performance comparable to that of state-of-the-art black-box models. Our code is publicly available at: https://anonymous.4open.science/r/GL-ICNN.

IV · Feb 12, 2024
Minimally Interactive Segmentation of Soft-Tissue Tumors on CT and MRI using Deep Learning

Douwe J. Spaanderman, Martijn P. A. Starmans, Gonnie C. M. van Erp et al.

Segmentations are crucial in medical imaging to obtain morphological, volumetric, and radiomics biomarkers. Manual segmentation is accurate but not feasible in the radiologist's clinical workflow, while automatic segmentation generally obtains sub-par performance. We therefore developed a minimally interactive deep learning-based segmentation method for soft-tissue tumors (STTs) on CT and MRI. The method requires the user to click six points near the tumor's extreme boundaries. These six points are transformed into a distance map and serve, with the image, as input for a Convolutional Neural Network. For training and validation, a multicenter dataset containing 514 patients and nine STT types in seven anatomical locations was used, resulting in a Dice Similarity Coefficient (DSC) of 0.85 ± 0.11 (mean ± standard deviation (SD)) for CT and 0.84 ± 0.12 for T1-weighted MRI, when compared to manual segmentations made by expert radiologists. Next, the method was externally validated on a dataset including five unseen STT phenotypes in extremities, achieving 0.81 ± 0.08 for CT, 0.84 ± 0.09 for T1-weighted MRI, and 0.88 ± 0.08 for previously unseen T2-weighted fat-saturated (FS) MRI. In conclusion, our minimally interactive segmentation method effectively segments different types of STTs on CT and MRI, with robust generalization to previously unseen phenotypes and imaging modalities.
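The conversion of six clicks into a distance map, stacked with the image as network input, might look like the sketch below. The abstract does not specify the type of distance map, so an unnormalised Euclidean distance to the nearest click (in voxels) is assumed here, and the function name and toy volume are hypothetical.

```python
import numpy as np

def clicks_to_distance_map(shape, points):
    """Euclidean distance (in voxels) from every voxel to its nearest click point."""
    grids = np.indices(shape).astype(float)         # (ndim, *shape) voxel coordinates
    pts = np.asarray(points, dtype=float)           # (n_points, ndim)
    # Broadcast to (n_points, ndim, *shape), then reduce over ndim and over points.
    pts = pts.reshape(pts.shape[0], pts.shape[1], *([1] * len(shape)))
    diff = grids[None] - pts
    return np.sqrt((diff ** 2).sum(axis=1)).min(axis=0)

# Six hypothetical clicks at the extreme boundaries of a lesion in an 8x8x8 volume.
clicks = [(0, 4, 4), (7, 4, 4), (4, 0, 4), (4, 7, 4), (4, 4, 0), (4, 4, 7)]
image = np.zeros((8, 8, 8))                         # placeholder for a CT or MRI volume
dist_map = clicks_to_distance_map(image.shape, clicks)
net_input = np.stack([image, dist_map])             # two-channel input: image + distance map
```

Encoding the clicks as a dense map rather than as coordinates lets a standard convolutional network consume them as an extra image channel.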

IV · Mar 10, 2025
The 4D Human Embryonic Brain Atlas: spatiotemporal atlas generation for rapid anatomical changes using first-trimester ultrasound from the Rotterdam Periconceptional Cohort

Wietske A. P. Bastiaansen, Melek Rousian, Anton H. J. Koning et al.

Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.

CV · Aug 29, 2025
Federated Fine-tuning of SAM-Med3D for MRI-based Dementia Classification

Kaouther Mouheb, Marawan Elbatel, Janne Papma et al.

While foundation models (FMs) offer strong potential for AI-based dementia diagnosis, their integration into federated learning (FL) systems remains underexplored. In this benchmarking study, we systematically evaluate the impact of key design choices (classification head architecture, fine-tuning strategy, and aggregation method) on the performance and efficiency of federated FM tuning using brain MRI data. Using a large multi-cohort dataset, we find that the architecture of the classification head substantially influences performance, that freezing the FM encoder achieves results comparable to full fine-tuning, and that advanced aggregation methods outperform standard federated averaging. Our results offer practical insights for deploying FMs in decentralized clinical settings and highlight trade-offs that should guide future method development.

IV · Jun 19, 2024
Recurrent Inference Machine for Medical Image Registration

Yi Zhang, Yidong Zhao, Hui Xue et al.

Image registration is essential for medical image applications where alignment of voxels across multiple images is needed for qualitative or quantitative analysis. With recent advancements in deep neural networks and parallel computing, deep learning-based medical image registration methods have become competitive thanks to their flexible modelling and fast inference capabilities. However, compared to traditional optimization-based registration methods, the speed advantage may come at the cost of registration performance at inference time. Moreover, deep neural networks ideally demand large training datasets, while optimization-based methods are training-free. To improve registration accuracy and data efficiency, we propose a novel image registration method, termed Recurrent Inference Image Registration (RIIR) network. RIIR is formulated as a meta-learning solver that addresses the registration problem iteratively. RIIR addresses the accuracy and data efficiency issues by learning the update rule of optimization, with implicit regularization combined with explicit gradient input. We evaluated RIIR extensively on brain MRI and quantitative cardiac MRI datasets, in terms of both registration accuracy and training data efficiency. Our experiments showed that RIIR outperformed a range of deep learning-based methods, even with only 5% of the training data, demonstrating high data efficiency. Key findings from our ablation studies highlighted the important added value of the hidden states introduced in the recurrent inference framework for meta-learning. Our proposed RIIR offers a highly data-efficient framework for deep learning-based medical image registration.

IV · May 4, 2023
Comparison of retinal regions-of-interest imaged by OCT for the classification of intermediate AMD

Danilo A. Jesus, Eric F. Thee, Tim Doekemeijer et al.

This study investigates whether it is possible to differentiate intermediate age-related macular degeneration (AMD) from healthy controls using partial optical coherence tomography (OCT) data, that is, restricting the input B-scans to certain pre-defined regions of interest (ROIs). A total of 15744 B-scans from 269 intermediate AMD patients and 115 normal subjects were used in this study (split on subject level in 80% train, 10% validation and 10% test). From each OCT B-scan, three ROIs were extracted: retina, complex between retinal pigment epithelium (RPE) and Bruch membrane (BM), and choroid (CHO). These ROIs were obtained using two different methods: masking and cropping. In addition to the six ROIs, the whole OCT B-scan and the binary mask corresponding to the segmentation of the RPE-BM complex were used. For each subset, a convolutional neural network (based on VGG16 architecture and pre-trained on ImageNet) was trained and tested. The performance of the models was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity. All trained models presented an AUROC, accuracy, sensitivity, and specificity equal to or higher than 0.884, 0.816, 0.685, and 0.644, respectively. The model trained on the whole OCT B-scan presented the best performance (AUROC = 0.983, accuracy = 0.927, sensitivity = 0.862, specificity = 0.913). The models trained on the ROIs obtained with the cropping method led to significantly higher outcomes than those obtained with masking, with the exception of the retinal tissue, where no statistically significant difference was observed between cropping and masking (p = 0.47). This study demonstrated that while using the complete OCT B-scan provided the highest accuracy in classifying intermediate AMD, models trained on specific ROIs such as the RPE-BM complex or the choroid can still achieve high performance.
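For readers less familiar with the four reported metrics, the sketch below computes them for a toy binary classifier in plain Python. The labels, scores and the 0.5 decision threshold are invented for illustration and are unrelated to the study's data. AUROC is computed as the fraction of positive-negative pairs ranked correctly, with ties counting half.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auroc(y_true, scores):
    """Probability that a random positive case scores higher than a random negative case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 1 = intermediate AMD, 0 = control (values invented for this example).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # threshold of 0.5 is an assumption
metrics = binary_metrics(y_true, y_pred)
area = auroc(y_true, scores)
```

Unlike accuracy, sensitivity and specificity, AUROC does not depend on the chosen threshold, which is why the abstract reports it alongside the thresholded metrics.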

LG · Dec 15, 2021
Ten years of image analysis and machine learning competitions in dementia

Esther E. Bron, Stefan Klein, Annika Reinke et al.

Machine learning methods exploiting multi-parametric biomarkers, especially based on neuroimaging, have huge potential to improve early diagnosis of dementia and to predict which individuals are at risk of developing dementia. To benchmark algorithms in the field of machine learning and neuroimaging in dementia and assess their potential for use in clinical practice and clinical trials, seven grand challenges have been organized in the last decade. The seven grand challenges addressed questions related to screening, clinical status estimation, prediction and monitoring in (pre-clinical) dementia. There was little overlap in clinical questions, tasks and performance metrics. While this helps provide insight into a broad range of questions, it also limits the validation of results across challenges. The validation process itself was mostly comparable between challenges, using similar methods for ensuring objective comparison, uncertainty estimation and statistical testing. In general, winning algorithms performed rigorous data preprocessing and combined a wide range of input features. Despite high state-of-the-art performances, most of the methods evaluated by the challenges are not clinically used. To increase impact, future challenges could pay more attention to statistical analysis of which factors relate to higher performance, to clinical questions beyond Alzheimer's disease, and to using testing data beyond the Alzheimer's Disease Neuroimaging Initiative. Grand challenges would be an ideal venue for assessing the generalizability of algorithm performance to unseen data of other cohorts. Key for increasing impact in this way are larger testing data sizes, which could be reached by sharing algorithms rather than data to exploit data that cannot be shared.

IV · Sep 6, 2021
Automatic Segmentation of the Optic Nerve Head Region in Optical Coherence Tomography: A Methodological Review

Rita Marques, Danilo Andrade De Jesus, João Barbosa Breda et al.

The optic nerve head (ONH) represents the intraocular section of the optic nerve, which is prone to damage by intraocular pressure. The advent of optical coherence tomography (OCT) has enabled the evaluation of novel optic nerve head parameters, namely the depth and curvature of the lamina cribrosa (LC). Together with the Bruch's membrane opening minimum-rim-width, these seem to be promising optic nerve head parameters for diagnosis and monitoring of retinal diseases such as glaucoma. Nonetheless, these optical coherence tomography derived biomarkers are mostly extracted through manual segmentation, which is time-consuming and prone to bias, thus limiting their usability in clinical practice. The automatic segmentation of the optic nerve head in OCT scans could further improve the current clinical management of glaucoma and other diseases. This review summarizes the current state-of-the-art in automatic segmentation of the ONH in OCT. PubMed and Scopus were used to perform a systematic review. Additional works from other databases (IEEE, Google Scholar and ARVO IOVS) were also included, resulting in a total of 27 reviewed studies. For each algorithm, the methods, the size and type of dataset used for validation, and the respective results were carefully analyzed. The results show that deep learning-based algorithms provide the highest accuracy, sensitivity and specificity for segmenting the different structures of the ONH including the LC. However, a lack of consensus regarding the definition of segmented regions, extracted parameters and validation approaches has been observed, highlighting the importance of and need for standardized methodologies for ONH segmentation.

IVAug 19, 2021
An automated machine learning framework to optimize radiomics model construction validated on twelve clinical applications

Martijn P. A. Starmans, Sebastian R. van der Voort, Thomas Phil et al.

Predicting clinical outcomes from medical images using quantitative features ("radiomics") requires many method design choices. Currently, in new clinical applications, finding the optimal radiomics method out of the wide range of methods relies on a manual, heuristic trial-and-error process. We introduce a novel automated framework that optimizes radiomics workflow construction per application by standardizing the radiomics workflow in modular components, including a large collection of algorithms for each component, and formulating a combined algorithm selection and hyperparameter optimization problem. To solve it, we employ automated machine learning through two strategies (random search and Bayesian optimization) and three ensembling approaches. Results show that a medium-sized random search and straightforward ensembling perform similarly to more advanced methods while being more efficient. Validated across twelve clinical applications, our approach outperforms both a radiomics baseline and human experts. In conclusion, our framework improves and streamlines radiomics research by fully automatically optimizing radiomics workflow construction. To facilitate reproducibility, we publicly release six datasets, software of the method, and code to reproduce this study.
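As a rough illustration of the combined algorithm selection and hyperparameter optimization idea described in this abstract, the sketch below runs a plain random search over a tiny search space. The search space, the parameter ranges, and the function names are hypothetical and are not taken from the authors' software; the scoring function is supplied by the caller.

```python
import random

# Hypothetical search space: each algorithm has its own hyperparameter ranges.
SEARCH_SPACE = {
    "svm": {"C": (0.01, 100.0)},
    "random_forest": {"n_estimators": (10, 200)},
}

def sample_configuration(rng):
    """Jointly sample an algorithm and its hyperparameters."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    params = {}
    for name, (low, high) in SEARCH_SPACE[algorithm].items():
        if isinstance(low, int):
            params[name] = rng.randint(low, high)   # integer-valued range
        else:
            params[name] = rng.uniform(low, high)   # continuous range
    return {"algorithm": algorithm, "params": params}

def random_search(score_fn, n_iter, seed=0):
    """Return (best_score, best_configuration) over n_iter random samples."""
    rng = random.Random(seed)
    candidates = [sample_configuration(rng) for _ in range(n_iter)]
    scored = [(score_fn(config), config) for config in candidates]
    return max(scored, key=lambda pair: pair[0])
```

In practice the scoring function would be a cross-validated performance estimate of the sampled workflow; here it is left abstract.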

IVJul 20, 2021
Data synthesis and adversarial networks: A review and meta-analysis in cancer imaging

Richard Osuala, Kaisar Kushibar, Lidia Garrucho et al.

Despite technological and medical advances, the detection, interpretation, and treatment of cancer based on imaging data continue to pose significant challenges. These include inter-observer variability, class imbalance, dataset shifts, inter- and intra-tumour heterogeneity, malignancy determination, and treatment effect uncertainty. Given the recent advancements in Generative Adversarial Networks (GANs), data synthesis, and adversarial training, we assess the potential of these technologies to address a number of key challenges of cancer imaging. We categorise these challenges into (a) data scarcity and imbalance, (b) data access and privacy, (c) data annotation and segmentation, (d) cancer detection and diagnosis, and (e) tumour profiling, treatment planning and monitoring. Based on our analysis of 164 publications that apply adversarial training techniques in the context of cancer imaging, we highlight multiple underexplored solutions with research potential. We further contribute the Synthesis Study Trustworthiness Test (SynTRUST), a meta-analysis framework for assessing the validation rigour of medical image synthesis studies. SynTRUST is based on 26 concrete measures of thoroughness, reproducibility, usefulness, scalability, and tenability. Based on SynTRUST, we analyse 16 of the most promising cancer imaging challenge solutions and observe a high validation rigour in general, but also several desirable improvements. With this work, we strive to bridge the gap between the needs of the clinical cancer imaging community and the current and prospective research on data synthesis and adversarial networks in the artificial intelligence community.

IVMar 22, 2021
Evaluating glioma growth predictions as a forward ranking problem

Karin A. van Garderen, Sebastian R. van der Voort, Maarten M. J. Wijnenga et al.

The problem of tumor growth prediction is challenging, but promising results have been achieved with both model-driven and statistical methods. In this work, we present a framework for the evaluation of growth predictions that focuses on the spatial infiltration patterns, and specifically on evaluating a prediction of future growth. We propose to frame the problem as a ranking problem rather than a segmentation problem. Using the average precision as a metric, we can evaluate the results with segmentations while using the full spatiotemporal prediction. Furthermore, by separating the model goodness-of-fit from future predictive performance, we show that in some cases, a better fit of model parameters does not guarantee better predictive power.
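The ranking view can be made concrete with a small sketch: voxels are ranked by a predicted infiltration score and the ranking is compared against a binary segmentation of the later tumor using average precision. This is a minimal illustration under stated assumptions (a single score per voxel, ties broken by voxel order); the function name and the toy data are hypothetical and not the authors' evaluation code.

```python
import numpy as np

def average_precision(scores, future_tumor):
    """Average precision of a voxel ranking against a binary future segmentation."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(future_tumor, dtype=bool).ravel()
    if labels.sum() == 0:
        raise ValueError("future segmentation contains no tumor voxels")
    # Rank voxels from highest to lowest predicted score.
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    # Precision at every rank, averaged over the ranks where a tumor voxel occurs.
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())

# Toy example with six voxels; tumor voxels sit at ranks 1 and 3.
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
future = [1, 0, 1, 0, 0, 0]
print(average_precision(scores, future))  # (1 + 2/3) / 2
```

Because only the ordering of the scores matters, the metric uses the full prediction without first thresholding it into a segmentation.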

CVDec 28, 2020
Longitudinal diffusion MRI analysis using Segis-Net: a single-step deep-learning framework for simultaneous segmentation and registration

Bo Li, Wiro J. Niessen, Stefan Klein et al.

This work presents a single-step deep-learning framework for longitudinal image analysis, coined Segis-Net. To optimally exploit information available in longitudinal data, this method concurrently learns a multi-class segmentation and nonlinear registration. Segmentation and registration are modeled using a convolutional neural network and optimized simultaneously for their mutual benefit. An objective function that optimizes spatial correspondence for the segmented structures across time-points is proposed. We applied Segis-Net to the analysis of white matter tracts from N=8045 longitudinal brain MRI datasets of 3249 elderly individuals. The Segis-Net approach showed a significant increase in registration accuracy, spatio-temporal segmentation consistency, and reproducibility compared with two multistage pipelines. This also led to a significant reduction in the sample size that would be required to achieve the same statistical power in analyzing tract-specific measures. Thus, we expect that Segis-Net can serve as a new reliable tool to support longitudinal imaging studies to investigate macro- and microstructural brain changes over time.

IVDec 16, 2020
Cross-Cohort Generalizability of Deep and Conventional Machine Learning for MRI-based Diagnosis and Prediction of Alzheimer's Disease

Esther E. Bron, Stefan Klein, Janne M. Papma et al.

This work validates the generalizability of MRI-based classification of Alzheimer's disease (AD) patients and controls (CN) to an external data set and to the task of prediction of conversion to AD in individuals with mild cognitive impairment (MCI). We used a conventional support vector machine (SVM) and a deep convolutional neural network (CNN) approach based on structural MRI scans that underwent either minimal pre-processing or more extensive pre-processing into modulated gray matter (GM) maps. Classifiers were optimized and evaluated using cross-validation in the ADNI (334 AD, 520 CN). Trained classifiers were subsequently applied to predict conversion to AD in ADNI MCI patients (231 converters, 628 non-converters) and in the independent Health-RI Parelsnoer data set. From this multi-center study representing a tertiary memory clinic population, we included 199 AD patients, 139 participants with subjective cognitive decline, 48 MCI patients converting to dementia, and 91 MCI patients who did not convert to dementia. AD-CN classification based on modulated GM maps resulted in a similar AUC for SVM (0.940) and CNN (0.933). Application to conversion prediction in MCI yielded significantly higher performance for SVM (0.756) than for CNN (0.742). In external validation, performance was slightly decreased. For AD-CN, it again gave similar AUCs for SVM (0.896) and CNN (0.876). For prediction in MCI, performances decreased for both SVM (0.665) and CNN (0.702). Both with SVM and CNN, classification based on modulated GM maps significantly outperformed classification based on minimally processed images. Deep and conventional classifiers performed equally well for AD classification and their performance decreased only slightly when applied to the external cohort. We expect that this work on external validation contributes towards translation of machine learning to clinical practice.

CVNov 3, 2020
Learning unbiased group-wise registration (LUGR) and joint segmentation: evaluation on longitudinal diffusion MRI

Bo Li, Wiro J. Niessen, Stefan Klein et al.

Analysis of longitudinal changes in imaging studies often involves both segmentation of structures of interest and registration of multiple timeframes. The accuracy of such analysis could benefit from a tailored framework that jointly optimizes both tasks to fully exploit the information available in the longitudinal data. Most learning-based registration algorithms, including joint optimization approaches, currently suffer from bias due to selection of a fixed reference frame and only support pairwise transformations. We here propose an analytical framework based on an unbiased learning strategy for group-wise registration that simultaneously registers images to the mean space of a group to obtain consistent segmentations. We evaluate the proposed method on longitudinal analysis of a white matter tract in a brain MRI dataset with 2-3 time-points for 3249 individuals, i.e., 8045 images in total. The reproducibility of the method is evaluated on test-retest data from 97 individuals. The results confirm that the implicit reference image is an average of the input images. In addition, the proposed framework leads to consistent segmentations and significantly lower processing bias than that of a pair-wise fixed-reference approach. This processing bias is even smaller than that obtained when translating segmentations by only one voxel, which can be attributed to subtle numerical instabilities and interpolation. Therefore, we postulate that the proposed mean-space learning strategy could be widely applied to learning-based registration tasks. In addition, this group-wise framework introduces a novel way for learning-based longitudinal studies by direct construction of an unbiased within-subject template and allowing reliable and efficient analysis of spatio-temporal imaging biomarkers.

IVOct 14, 2020
Differential diagnosis and molecular stratification of gastrointestinal stromal tumors on CT images using a radiomics approach

Martijn P. A. Starmans, Milea J. M. Timbergen, Melissa Vos et al.

Distinguishing gastrointestinal stromal tumors (GISTs) from other intra-abdominal tumors and molecular analysis of GISTs are necessary for treatment planning, but challenging due to the rarity of GISTs. The aim of this study was to evaluate radiomics for distinguishing GISTs from other intra-abdominal tumors, and in GISTs, predict the c-KIT, PDGFRA, and BRAF mutational status and mitotic index (MI). All 247 included patients (125 GISTs, 122 non-GISTs) underwent a contrast-enhanced venous phase CT. The GIST vs. non-GIST radiomics model, including imaging, age, sex and location, had a mean area under the curve (AUC) of 0.82. Three radiologists had an AUC of 0.69, 0.76, and 0.84, respectively. The radiomics model had an AUC of 0.52 for c-KIT, 0.56 for c-KIT exon 11, and 0.52 for the MI. Hence, our radiomics model was able to distinguish GISTs from non-GISTs with a performance similar to three radiologists, but was not able to predict the c-KIT mutation or MI.

IVOct 9, 2020
WHO 2016 subtyping and automated segmentation of glioma using multi-task deep learning

Sebastian R. van der Voort, Fatih Incekara, Maarten M. J. Wijnenga et al.

Accurate characterization of glioma is crucial for clinical decision making. A delineation of the tumor is also desirable in the initial decision stages but is a time-consuming task. Leveraging the latest GPU capabilities, we developed a single multi-task convolutional neural network that uses the full 3D, structural, pre-operative MRI scans to predict the IDH mutation status, the 1p/19q co-deletion status, and the grade of a tumor, while simultaneously segmenting the tumor. We trained our method using the largest, most diverse patient cohort to date containing 1508 glioma patients from 16 institutes. We tested our method on an independent dataset of 240 patients from 13 different institutes, and achieved an IDH-AUC of 0.90, 1p/19q-AUC of 0.85, grade-AUC of 0.81, and a mean whole tumor DICE score of 0.84. Thus, our method non-invasively predicts multiple, clinically relevant parameters and generalizes well to the broader clinical population.
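For readers unfamiliar with the DICE score reported here, a minimal sketch of the standard overlap measure for binary masks follows. The function name is hypothetical, and returning 1.0 when both masks are empty is an assumed convention, not something stated in the abstract.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice overlap 2|A and B| / (|A| + |B|) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / total)

# Toy example: two predicted voxels, one of which overlaps the single true voxel.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))  # 2 * 1 / (2 + 1)
```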

LGSep 15, 2020
Analyzing the effect of APOE on Alzheimer's disease progression using an event-based model for stratified populations

Vikram Venkatraghavan, Stefan Klein, Lana Fani et al.

Alzheimer's disease (AD) is the most common form of dementia and is phenotypically heterogeneous. APOE is a triallelic gene which correlates with phenotypic heterogeneity in AD. In this work, we determined the effect of APOE alleles on the disease progression timeline of AD using a discriminative event-based model (DEBM). Since DEBM is a data-driven model, stratification into smaller disease subgroups would lead to more inaccurate models as compared to fitting the model on the entire dataset. Hence our secondary aim is to propose and evaluate novel approaches in which we split the different steps of DEBM into group-aspecific and group-specific parts, where the entire dataset is used to train the group-aspecific parts and only the data from a specific group is used to train the group-specific parts of the DEBM. We performed simulation experiments to benchmark the accuracy of the proposed approaches and to select the optimal approach. Subsequently, the chosen approach was applied to the baseline data of 417 cognitively normal, 235 mild cognitively impaired who convert to AD within 3 years, and 342 AD patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to gain new insights into the effect of APOE carriership on the disease progression timeline of AD. The presented models could aid understanding of the disease, and in selecting a homogeneous group of presymptomatic subjects at-risk of developing symptoms for clinical trials.

IVMay 13, 2020
Towards segmentation and spatial alignment of the human embryonic brain using deep learning for atlas-based registration

Wietske A. P. Bastiaansen, Melek Rousian, Régine P. M. Steegers-Theunissen et al.

We propose an unsupervised deep learning method for atlas-based registration to achieve segmentation and spatial alignment of the embryonic brain in a single framework. Our approach consists of two sequential networks with a specifically designed loss function to address the challenges in 3D first trimester ultrasound. The first part learns the affine transformation and the second part learns the voxelwise nonrigid deformation between the target image and the atlas. We trained this network end-to-end and validated it against a ground truth on synthetic datasets designed to resemble the challenges present in 3D first trimester ultrasound. The method was tested on a dataset of human embryonic ultrasound volumes acquired at 9 weeks gestational age, which showed alignment of the brain in some cases and gave insight into open challenges for the proposed method. We conclude that our method is a promising approach towards fully automated spatial alignment and segmentation of embryonic brains in 3D ultrasound.

IVSep 25, 2019
Towards continuous learning for glioma segmentation with elastic weight consolidation

Karin van Garderen, Sebastian van der Voort, Fatih Incekara et al.

When finetuning a convolutional neural network (CNN) on data from a new domain, catastrophic forgetting will reduce performance on the original training data. Elastic Weight Consolidation (EWC) is a recent technique to prevent this, which we evaluated while training and re-training a CNN to segment glioma on two different datasets. The network was trained on the public BraTS dataset and finetuned on an in-house dataset with non-enhancing low-grade glioma. EWC was found to decrease catastrophic forgetting in this case, but was also found to restrict adaptation to the new domain.
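The EWC regularizer can be sketched as a quadratic penalty that anchors each weight to its value after the original training, weighted by a diagonal Fisher information estimate, following the commonly cited formulation of EWC. The function and argument names are hypothetical, and how the Fisher values are estimated is left out; this is not the authors' training code.

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher_diag, lam):
    """EWC term: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    theta = np.asarray(theta, dtype=float)
    theta_old = np.asarray(theta_old, dtype=float)
    fisher_diag = np.asarray(fisher_diag, dtype=float)
    return float(0.5 * lam * np.sum(fisher_diag * (theta - theta_old) ** 2))

# Moving a weight with high Fisher information is penalized;
# staying at the old solution costs nothing.
print(ewc_penalty([2.0, 0.0], [1.0, 0.0], [4.0, 1.0], lam=0.5))
```

During finetuning this term is added to the loss on the new dataset, so a larger `lam` protects performance on the original data more strongly while leaving less room to adapt, which matches the trade-off reported in the abstract.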

CVSep 25, 2019
Multi-modal segmentation with missing MR sequences using pre-trained fusion networks

Karin van Garderen, Marion Smits, Stefan Klein

Missing data is a common problem in machine learning and in retrospective imaging research it is often encountered in the form of missing imaging modalities. We propose to take into account missing modalities in the design and training of neural networks, to ensure that they are capable of providing the best possible prediction even when multiple images are not available. The proposed network combines three modifications to the standard 3D UNet architecture: a training scheme with dropout of modalities, a multi-pathway architecture with fusion layer in the final stage, and the separate pre-training of these pathways. These modifications are evaluated incrementally in terms of performance on full and missing data, using the BraTS multi-modal segmentation challenge. The final model shows significant improvement with respect to the state of the art on missing data and requires less memory during training.
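A training scheme with dropout of modalities can be sketched as randomly blanking whole input channels of a training sample. The details below are assumptions for illustration only: missing modalities are zero-filled, each modality is dropped independently with probability `p_drop`, and at least one modality is always kept. The function name is hypothetical and the abstract does not specify these choices.

```python
import numpy as np

def drop_modalities(image, p_drop, rng):
    """Zero out whole modality channels of an array shaped (modality, ...)."""
    image = np.asarray(image, dtype=float)
    n_modalities = image.shape[0]
    # Independent keep/drop decision per modality.
    keep = rng.random(n_modalities) >= p_drop
    if not keep.any():
        # Assumed safeguard: never present an input with every modality missing.
        keep[rng.integers(n_modalities)] = True
    mask_shape = (n_modalities,) + (1,) * (image.ndim - 1)
    return image * keep.reshape(mask_shape), keep

rng = np.random.default_rng(0)
sample = np.ones((4, 2, 2, 2))  # four MR sequences, tiny 3D volume
dropped, kept = drop_modalities(sample, p_drop=0.5, rng=rng)
print(kept)
```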

SPSep 19, 2019
APIR-Net: Autocalibrated Parallel Imaging Reconstruction using a Neural Network

Chaoping Zhang, Florian Dubost, Marleen de Bruijne et al.

Deep learning has been successfully demonstrated in MRI reconstruction of accelerated acquisitions. However, its dependence on representative training data limits the application across different contrasts, anatomies, or image sizes. To address this limitation, we propose an unsupervised, auto-calibrated k-space completion method, based on a uniquely designed neural network that reconstructs the full k-space from an undersampled k-space, exploiting the redundancy among the multiple channels in the receive coil in a parallel imaging acquisition. To achieve this, contrary to common convolutional network approaches, the proposed network has a decreasing number of feature maps of constant size. In contrast to conventional parallel imaging methods such as GRAPPA that estimate the prediction kernel from the fully sampled autocalibration signals in a linear way, our method is able to learn nonlinear relations between sampled and unsampled positions in k-space. The proposed method was compared to the state-of-the-art ESPIRiT and RAKI methods in terms of noise amplification and visual image quality in both phantom and in-vivo experiments. The experiments indicate that APIR-Net provides a promising alternative to the conventional parallel imaging methods, and results in improved image quality especially for low SNR acquisitions.

IVAug 26, 2019
A hybrid deep learning framework for integrated segmentation and registration: evaluation on longitudinal white matter tract changes

Bo Li, Wiro Niessen, Stefan Klein et al.

To accurately analyze changes of anatomical structures in longitudinal imaging studies, consistent segmentation across multiple time-points is required. Existing solutions often involve independent registration and segmentation components. Registration between time-points is used either as a prior for segmentation in a subsequent time point or to perform segmentation in a common space. In this work, we propose a novel hybrid convolutional neural network (CNN) that integrates segmentation and registration into a single procedure. We hypothesize that the joint optimization leads to increased performance on both tasks. The hybrid CNN is trained by minimizing an integrated loss function composed of four different terms, measuring segmentation accuracy, similarity between registered images, deformation field smoothness, and segmentation consistency. We applied this method to the segmentation of white matter tracts, describing functionally grouped axonal fibers, using N=8045 longitudinal brain MRI data of 3249 individuals. The proposed method was compared with two multistage pipelines using two existing segmentation methods combined with a conventional deformable registration algorithm. In addition, we assessed the added value of the joint optimization for segmentation and registration separately. The hybrid CNN yielded significantly higher accuracy, consistency and reproducibility of segmentation than the multistage pipelines, and was orders of magnitude faster. Therefore, we expect it can serve as a novel tool to support clinical and epidemiological analyses on understanding microstructural brain changes over time.

IVAug 12, 2019
Improving Robustness of Deep Learning Based Knee MRI Segmentation: Mixup and Adversarial Domain Adaptation

Egor Panfilov, Aleksei Tiulpin, Stefan Klein et al.

Degeneration of articular cartilage (AC) is actively studied in knee osteoarthritis (OA) research via magnetic resonance imaging (MRI). Segmentation of AC tissues from MRI data is an essential step in quantification of their damage. Deep learning (DL) based methods have shown potential in this realm and are the current state-of-the-art; however, their robustness to heterogeneity of MRI acquisition settings remains an open problem. In this study, we investigated two modern regularization techniques -- mixup and adversarial unsupervised domain adaptation (UDA) -- to improve the robustness of DL-based knee cartilage segmentation to new MRI acquisition settings. Our validation setup included two datasets produced by different MRI scanners and using distinct data acquisition protocols. We assessed the robustness of automatic segmentation by comparing mixup and UDA approaches to a strong baseline method at different OA severity stages and, additionally, in relation to anatomical locations. Our results showed that for moderate changes in knee MRI data acquisition settings both approaches may provide notable improvements in the robustness, which are consistent for all stages of the disease and affect the clinically important areas of the knee joint. However, mixup may be considered as a recommended approach, since it is more computationally efficient and does not require additional data from the target acquisition setup.
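Mixup, in its commonly described form, trains on convex combinations of pairs of samples with a mixing weight drawn from a Beta distribution. The sketch below mixes both the images and their label maps; whether the study mixed the segmentation targets in exactly this way is not stated in the abstract, so treat that choice, the `alpha` value, and the function name as assumptions.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha, rng):
    """Blend two (image, label) pairs with a weight lam ~ Beta(alpha, alpha)."""
    lam = float(rng.beta(alpha, alpha))
    x_mix = lam * np.asarray(x1, dtype=float) + (1.0 - lam) * np.asarray(x2, dtype=float)
    y_mix = lam * np.asarray(y1, dtype=float) + (1.0 - lam) * np.asarray(y2, dtype=float)
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
image_a, mask_a = np.zeros((2, 2)), np.zeros((2, 2))
image_b, mask_b = np.ones((2, 2)), np.ones((2, 2))
x_mix, y_mix, lam = mixup_pair(image_a, mask_a, image_b, mask_b, alpha=0.4, rng=rng)
print(lam, x_mix[0, 0])
```

The blend needs only pairs drawn from the existing training set, which is consistent with the abstract's remark that mixup requires no additional data from the target acquisition setup.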

CVApr 12, 2019
Multimodal Machine Learning-based Knee Osteoarthritis Progression Prediction from Plain Radiographs and Clinical Data

Aleksei Tiulpin, Stefan Klein, Sita M. A. Bierma-Zeinstra et al.

Knee osteoarthritis (OA) is the most common musculoskeletal disease without a cure, and current treatment options are limited to symptomatic relief. Prediction of OA progression is a very challenging and timely issue, and it could, if resolved, accelerate the disease modifying drug development and ultimately help to prevent millions of total joint replacement surgeries performed annually. Here, we present a multi-modal machine learning-based OA progression prediction model that utilizes raw radiographic data, clinical examination results and previous medical history of the patient. We validated this approach on an independent test set of 3,918 knee images from 2,129 subjects. Our method yielded area under the ROC curve (AUC) of 0.79 (0.78-0.81) and Average Precision (AP) of 0.68 (0.66-0.70). In contrast, a reference approach, based on logistic regression, yielded AUC of 0.75 (0.74-0.77) and AP of 0.62 (0.60-0.64). The proposed method could significantly improve the subject selection process for OA drug-development trials and help the development of personalized therapeutic plans.

LGMar 8, 2019
Event-Based Modeling with High-Dimensional Imaging Biomarkers for Estimating Spatial Progression of Dementia

Vikram Venkatraghavan, Florian Dubost, Esther E. Bron et al.

Event-based models (EBM) are a class of disease progression models that can be used to estimate temporal ordering of neuropathological changes from cross-sectional data. Current EBMs only handle scalar biomarkers, such as regional volumes, as inputs. However, regional aggregates are a crude summary of the underlying high-resolution images, potentially limiting the accuracy of EBM. Therefore, we propose a novel method that exploits high-dimensional voxel-wise imaging biomarkers: n-dimensional discriminative EBM (nDEBM). nDEBM is based on an insight that mixture modeling, which is a key element of conventional EBMs, can be replaced by a more scalable semi-supervised support vector machine (SVM) approach. This SVM is used to estimate the degree of abnormality of each region which is then used to obtain subject-specific disease progression patterns. These patterns are in turn used for estimating the mean ordering by fitting a generalized Mallows model. In order to validate the biomarker ordering obtained using nDEBM, we also present a framework for Simulation of Imaging Biomarkers' Temporal Evolution (SImBioTE) that mimics neurodegeneration in brain regions. SImBioTE trains variational auto-encoders (VAE) in different brain regions independently to simulate images at varying stages of disease progression. We also validate nDEBM clinically using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). In both experiments, nDEBM using high-dimensional features gave better performance than state-of-the-art EBM methods using regional volume biomarkers. This suggests that nDEBM is a promising approach for disease progression modeling.

LGAug 10, 2018
Disease Progression Timeline Estimation for Alzheimer's Disease using Discriminative Event Based Modeling

Vikram Venkatraghavan, Esther E. Bron, Wiro J. Niessen et al.

Alzheimer's Disease (AD) is characterized by a cascade of biomarkers becoming abnormal, the pathophysiology of which is very complex and largely unknown. Event-based modeling (EBM) is a data-driven technique to estimate the sequence in which biomarkers for a disease become abnormal based on cross-sectional data. It can help in understanding the dynamics of disease progression and facilitate early diagnosis and prognosis. In this work we propose a novel discriminative approach to EBM, which is shown to be more accurate than existing state-of-the-art EBM methods. The method first estimates for each subject an approximate ordering of events. Subsequently, the central ordering over all subjects is estimated by fitting a generalized Mallows model to these approximate subject-specific orderings. We also introduce the concept of relative distance between events which helps in creating a disease progression timeline. Subsequently, we propose a method to stage subjects by placing them on the estimated disease progression timeline. We evaluated the proposed method on Alzheimer's Disease Neuroimaging Initiative (ADNI) data and compared the results with existing state-of-the-art EBM methods. We also performed extensive experiments on synthetic data simulating the progression of Alzheimer's disease. The event orderings obtained on ADNI data seem plausible and are in agreement with the current understanding of progression of AD. The proposed patient staging algorithm performed consistently better than that of state-of-the-art EBM methods. Event orderings obtained in simulation experiments were more accurate than those of other EBM methods and the estimated disease progression timeline was observed to correlate with the timeline of actual disease progression. The results of these experiments are encouraging and suggest that discriminative EBM is a promising approach to disease progression modeling.

CVFeb 21, 2017
A Discriminative Event Based Model for Alzheimer's Disease Progression Modeling

Vikram Venkatraghavan, Esther Bron, Wiro Niessen et al.

The event-based model (EBM) for data-driven disease progression modeling estimates the sequence in which biomarkers for a disease become abnormal. This helps in understanding the dynamics of disease progression and facilitates early diagnosis by staging patients on a disease progression timeline. Existing EBM methods are all generative in nature. In this work we propose a novel discriminative approach to EBM, which is shown to be more accurate as well as computationally more efficient than existing state-of-the-art EBM methods. The method first estimates for each subject an approximate ordering of events, by ranking the posterior probabilities of individual biomarkers being abnormal. Subsequently, the central ordering over all subjects is estimated by fitting a generalized Mallows model to these approximate subject-specific orderings based on a novel probabilistic Kendall's Tau distance. To evaluate the accuracy, we performed extensive experiments on synthetic data simulating the progression of Alzheimer's disease. Subsequently, the method was applied to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to estimate the central event ordering in the dataset. The experiments benchmark the accuracy of the new model under various conditions and compare it with existing state-of-the-art EBM methods. The results indicate that discriminative EBM could be a simple and elegant approach to disease progression modeling.
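The first step of the method, ranking posterior probabilities to obtain a subject-specific event ordering, can be sketched as below, together with the plain Kendall's Tau distance between two orderings. Two assumptions are made for illustration: a higher posterior of abnormality is taken to mean the event occurred earlier, and the distance shown is the ordinary count of discordant pairs, not the probabilistic variant the paper introduces. Function names are hypothetical, and fitting the generalized Mallows model is not shown.

```python
from itertools import combinations

import numpy as np

def subject_ordering(posteriors):
    """Order biomarker indices from most to least likely abnormal."""
    ranked = np.argsort(-np.asarray(posteriors, dtype=float), kind="stable")
    return [int(i) for i in ranked]

def kendall_tau_distance(order_a, order_b):
    """Number of biomarker pairs that the two orderings place in opposite order."""
    pos_a = {event: rank for rank, event in enumerate(order_a)}
    pos_b = {event: rank for rank, event in enumerate(order_b)}
    return sum(
        1
        for i, j in combinations(order_a, 2)
        if (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) < 0
    )

# Toy subject with three biomarkers: biomarker 1 is most likely abnormal.
order = subject_ordering([0.2, 0.9, 0.5])
print(order)                                    # [1, 2, 0]
print(kendall_tau_distance(order, [0, 1, 2]))   # 2 discordant pairs
```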