Jonas Richiardi

h-index31

13papers

117citations

Novelty40%

AI Score52

Ranked #15,695 of 194,257 authors (top 8%)#5,710 in CV (top 10%)

13 Papers

1.4CVOct 18, 2022

Weakly Supervised Learning with Automated Labels from Radiology Reports for Glioma Change Detection

Tommaso Di Noto, Meritxell Bach Cuadra, Chirine Atat et al.

Gliomas are the most frequent primary brain tumors in adults. Glioma change detection aims at finding the relevant parts of the image that change over time. Although Deep Learning (DL) shows promising performances in similar change detection tasks, the creation of large annotated datasets represents a major bottleneck for supervised DL applications in radiology. To overcome this, we propose a combined use of weak labels (imprecise, but fast-to-create annotations) and Transfer Learning (TL). Specifically, we explore inductive TL, where source and target domains are identical, but tasks are different due to a label shift: our target labels are created manually by three radiologists, whereas our source weak labels are generated automatically from radiology reports via NLP. We frame knowledge transfer as hyperparameter optimization, thus avoiding heuristic choices that are frequent in related works. We investigate the relationship between model size and TL, comparing a low-capacity VGG with a higher-capacity ResNeXt model. We evaluate our models on 1693 T2-weighted magnetic resonance imaging difference maps created from 183 patients, by classifying them into stable or unstable according to tumor evolution. The weak labels extracted from radiology reports allowed us to increase dataset size more than 3-fold, and improve VGG classification results from 75% to 82% AUC. Mixed training from scratch led to higher performance than fine-tuning or feature extraction. To assess generalizability, we ran inference on an open dataset (BraTS-2015: 15 patients, 51 difference maps), reaching up to 76% AUC. Overall, results suggest that medical imaging problems may benefit from smaller models and different TL strategies with respect to computer vision datasets, and that report-generated weak labels are effective in improving model performances. Code, in-house dataset and BraTS labels are released.

11.5CVApr 13

Towards Brain MRI Foundation Models for the Clinic: Findings from the FOMO25 Challenge

Asbjørn Munk, Stefano Cerri, Vardan Nersesjan et al.

Clinical deployment of automated brain MRI analysis faces a fundamental challenge: clinical data is heterogeneous and noisy, and high-quality labels are prohibitively costly to obtain. Self-supervised learning (SSL) can address this by leveraging the vast amounts of unlabeled data produced in clinical workflows to train robust \textit{foundation models} that adapt out-of-domain with minimal supervision. However, the development of foundation models for brain MRI has been limited by small pretraining datasets and in-domain benchmarking focused on high-quality, research-grade data. To address this gap, we organized the FOMO25 challenge as a satellite event at MICCAI 2025. FOMO25 provided participants with a large pretraining dataset, FOMO60K, and evaluated models on data sourced directly from clinical workflows in few-shot and out-of-domain settings. Tasks covered infarct classification, meningioma segmentation, and brain age regression, and considered both models trained on FOMO60K (method track) and any data (open track). Nineteen foundation models from sixteen teams were evaluated using a standardized containerized pipeline. Results show that (a) self-supervised pretraining improves generalization on clinical data under domain shift, with the strongest models trained \textit{out-of-domain} surpassing supervised baselines trained \textit{in-domain}. (b) No single pretraining objective benefits all tasks: MAE favors segmentation, hybrid reconstruction-contrastive objectives favor classification, and (c) strong performance was achieved by small pretrained models, and improvements from scaling model size and training duration did not yield reliable benefits.

5.2LGMay 4

Gradient Boosted Risk Scores

Costa Georgantas, Jonas Richiardi

Risk scores are an interpretable and actionable class of machine learning models with applications in medicine, insurance, and risk management. Unlike most computational methods, risk scores are designed to be computed by a human by attributing points to a data sample based on a limited set of criteria. The most common approaches for generating risk scores use linear regressions to estimate the effect of selected variables. We propose a simple and effective approach towards building compact and predictive risk scores. We provide an algorithm based on gradient boosting that is capable of modeling nonlinear effects, along with a C++ implementation with Python and R bindings. Through extensive empirical evaluation on twelve tabular datasets spanning regression, classification, and time-to-event tasks, we show that our method achieves competitive predictive performance while producing substantially more compact scores than regression-based alternatives, with 60% fewer rules for classification tasks and 16% fewer rules for time-to-event tasks on average, compared to AutoScore.

2.8CVJan 19Code

From 100,000+ images to winning the first brain MRI foundation model challenges: Sharing lessons and models

Pedro M. Gordaliza, Jaume Banus, Benoît Gérin et al.

Developing Foundation Models for medical image analysis is essential to overcome the unique challenges of radiological tasks. The first challenges of this kind for 3D brain MRI, SSL3D and FOMO25, were held at MICCAI 2025. Our solution ranked first in tracks of both contests. It relies on a U-Net CNN architecture combined with strategies leveraging anatomical priors and neuroimaging domain knowledge. Notably, our models trained 1-2 orders of magnitude faster and were 10 times smaller than competing transformer-based approaches. Models are available here: https://github.com/jbanusco/BrainFM4Challenges.

4.1LGSep 16, 2025

Spatiotemporal graph neural process for reconstruction, extrapolation, and classification of cardiac trajectories

Jaume Banus, Augustin C. Ogier, Roger Hullin et al.

We present a probabilistic framework for modeling structured spatiotemporal dynamics from sparse observations, focusing on cardiac motion. Our approach integrates neural ordinary differential equations (NODEs), graph neural networks (GNNs), and neural processes into a unified model that captures uncertainty, temporal continuity, and anatomical structure. We represent dynamic systems as spatiotemporal multiplex graphs and model their latent trajectories using a GNN-parameterized vector field. Given the sparse context observations at node and edge levels, the model infers a distribution over latent initial states and control variables, enabling both interpolation and extrapolation of trajectories. We validate the method on three synthetic dynamical systems (coupled pendulum, Lorenz attractor, and Kuramoto oscillators) and two real-world cardiac imaging datasets - ACDC (N=150) and UK Biobank (N=526) - demonstrating accurate reconstruction, extrapolation, and disease classification capabilities. The model accurately reconstructs trajectories and extrapolates future cardiac cycles from a single observed cycle. It achieves state-of-the-art results on the ACDC classification task (up to 99% accuracy), and detects atrial fibrillation in UK Biobank subjects with competitive performance (up to 67% accuracy). This work introduces a flexible approach for analyzing cardiac motion and offers a foundation for graph-based learning in structured biomedical spatiotemporal time-series data.

3.6CVSep 8, 2025

AI-based response assessment and prediction in longitudinal imaging for brain metastases treated with stereotactic radiosurgery

Lorenz Achim Kuhn, Daniel Abler, Jonas Richiardi et al.

Brain Metastases (BM) are a large contributor to mortality of patients with cancer. They are treated with Stereotactic Radiosurgery (SRS) and monitored with Magnetic Resonance Imaging (MRI) at regular follow-up intervals according to treatment guidelines. Analyzing and quantifying this longitudinal imaging represents an intractable workload for clinicians. As a result, follow-up images are not annotated and merely assessed by observation. Response to treatment in longitudinal imaging is being studied, to better understand growth trajectories and ultimately predict treatment success or toxicity as early as possible. In this study, we implement an automated pipeline to curate a large longitudinal dataset of SRS treatment data, resulting in a cohort of 896 BMs in 177 patients who were monitored for >360 days at approximately two-month intervals at Lausanne University Hospital (CHUV). We use a data-driven clustering to identify characteristic trajectories. In addition, we predict 12 months lesion-level response using classical as well as graph machine learning Graph Machine Learning (GML). Clustering revealed 5 dominant growth trajectories with distinct final response categories. Response prediction reaches up to 0.90 AUC (CI95%=0.88-0.92) using only pre-treatment and first follow-up MRI with gradient boosting. Similarly, robust predictive performance of up to 0.88 AUC (CI95%=0.86-0.90) was obtained using GML, offering more flexibility with a single model for multiple input time-points configurations. Our results suggest potential automation and increased precision for the comprehensive assessment and prediction of BM response to SRS in longitudinal MRI. The proposed pipeline facilitates scalable data curation for the investigation of BM growth patterns, and lays the foundation for clinical decision support systems aiming at optimizing personalized care.

7.1LGJul 18, 2025

Structural Connectome Harmonization Using Deep Learning: The Strength of Graph Neural Networks

Jagruti Patel, Thomas A. W. Bolton, Mikkel Schöttner et al.

Small sample sizes in neuroimaging in general, and in structural connectome (SC) studies in particular limit the development of reliable biomarkers for neurological and psychiatric disorders - such as Alzheimer's disease and schizophrenia - by reducing statistical power, reliability, and generalizability. Large-scale multi-site studies have exist, but they have acquisition-related biases due to scanner heterogeneity, compromising imaging consistency and downstream analyses. While existing SC harmonization methods - such as linear regression (LR), ComBat, and deep learning techniques - mitigate these biases, they often rely on detailed metadata, traveling subjects (TS), or overlook the graph-topology of SCs. To address these limitations, we propose a site-conditioned deep harmonization framework that harmonizes SCs across diverse acquisition sites without requiring metadata or TS that we test in a simulated scenario based on the Human Connectome Dataset. Within this framework, we benchmark three deep architectures - a fully connected autoencoder (AE), a convolutional AE, and a graph convolutional AE - against a top-performing LR baseline. While non-graph models excel in edge-weight prediction and edge existence detection, the graph AE demonstrates superior preservation of topological structure and subject-level individuality, as reflected by graph metrics and fingerprinting accuracy, respectively. Although the LR baseline achieves the highest numerical performance by explicitly modeling acquisition parameters, it lacks applicability to real-world multi-site use cases as detailed acquisition metadata is often unavailable. Our results highlight the critical role of model architecture in SC harmonization performance and demonstrate that graph-based approaches are particularly well-suited for structure-aware, domain-generalizable SC harmonization in large-scale multi-site SC studies.

5.1IVMar 22, 2025

Assessing workflow impact and clinical utility of AI-assisted brain aneurysm detection: a multi-reader study

Tommaso Di Noto, Sofyan Jankowski, Francesco Puccinelli et al.

Despite the plethora of AI-based algorithms developed for anomaly detection in radiology, subsequent integration into clinical setting is rarely evaluated. In this work, we assess the applicability and utility of an AI-based model for brain aneurysm detection comparing the performance of two readers with different levels of experience (2 and 13 years). We aim to answer the following questions: 1) Do the readers improve their performance when assisted by the AI algorithm? 2) How much does the AI algorithm impact routine clinical workflow? We reuse and enlarge our open-access, Time-Of-Flight Magnetic Resonance Angiography dataset (N=460). We use 360 subjects for training/validating our algorithm and 100 as unseen test set for the reading session. Even though our model reaches state-of-the-art results on the test set (sensitivity=74%, false positive rate=1.6), we show that neither the junior nor the senior reader significantly increase their sensitivity (p=0.59, p=1, respectively). In addition, we find that reading time for both readers is significantly higher in the "AI-assisted" setting than in the "Unassisted" (+15 seconds, on average; p=3x10^(-4) junior, p=3x10^(-5) senior). The confidence reported by the readers is unchanged across the two settings, indicating that the AI assistance does not influence the certainty of the diagnosis. Our findings highlight the importance of clinical validation of AI algorithms in a clinical setting involving radiologists. This study should serve as a reminder to the community to always examine the real-word effectiveness and workflow impact of proposed algorithms.

5.3IVMay 26, 2023

Fast refacing of MR images with a generative neural network lowers re-identification risk and preserves volumetric consistency

Nataliia Molchanova, Bénédicte Maréchal, Jean-Philippe Thiran et al.

With the rise of open data, identifiability of individuals based on 3D renderings obtained from routine structural magnetic resonance imaging (MRI) scans of the head has become a growing privacy concern. To protect subject privacy, several algorithms have been developed to de-identify imaging data using blurring, defacing or refacing. Completely removing facial structures provides the best re-identification protection but can significantly impact post-processing steps, like brain morphometry. As an alternative, refacing methods that replace individual facial structures with generic templates have a lower effect on the geometry and intensity distribution of original scans, and are able to provide more consistent post-processing results by the price of higher re-identification risk and computational complexity. In the current study, we propose a novel method for anonymised face generation for defaced 3D T1-weighted scans based on a 3D conditional generative adversarial network. To evaluate the performance of the proposed de-identification tool, a comparative study was conducted between several existing defacing and refacing tools, with two different segmentation algorithms (FAST and Morphobox). The aim was to evaluate (i) impact on brain morphometry reproducibility, (ii) re-identification risk, (iii) balance between (i) and (ii), and (iv) the processing time. The proposed method takes 9 seconds for face generation and is suitable for recovering consistent post-processing results after defacing.

16.9CVNov 17, 2021

The Multiscenario Multienvironment BioSecure Multimodal Database (BMDB)

Javier Ortega-Garcia, Julian Fierrez, Fernando Alonso-Fernandez et al.

A new multimodal biometric database designed and acquired within the framework of the European BioSecure Network of Excellence is presented. It is comprised of more than 600 individuals acquired simultaneously in three scenarios: 1) over the Internet, 2) in an office environment with desktop PC, and 3) in indoor/outdoor environments with mobile portable hardware. The three scenarios include a common part of audio/video data. Also, signature and fingerprint data have been acquired both with desktop PC and mobile portable hardware. Additionally, hand and iris data were acquired in the second scenario using desktop PC. Acquisition has been conducted by 11 European institutions. Additional features of the BioSecure Multimodal Database (BMDB) are: two acquisition sessions, several sensors in certain modalities, balanced gender and age distributions, multimodal realistic scenarios with simple and quick tasks per modality, cross-European diversity, availability of demographic data, and compatibility with other multimodal databases. The novel acquisition conditions of the BMDB allow us to perform new challenging research and evaluation of either monomodal or multimodal biometric systems, as in the recent BioSecure Multimodal Evaluation campaign. A description of this campaign including baseline results of individual modalities from the new database is also given. The database is expected to be available for research purposes through the BioSecure Association during 2008

12.0IVMar 10, 2021Code

Towards automated brain aneurysm detection in TOF-MRA: open data, weak labels, and anatomical knowledge

Tommaso Di Noto, Guillaume Marie, Sebastien Tourbier et al.

Brain aneurysm detection in Time-Of-Flight Magnetic Resonance Angiography (TOF-MRA) has undergone drastic improvements with the advent of Deep Learning (DL). However, performances of supervised DL models heavily rely on the quantity of labeled samples, which are extremely costly to obtain. Here, we present a DL model for aneurysm detection that overcomes the issue with ''weak'' labels: oversized annotations which are considerably faster to create. Our weak labels resulted to be four times faster to generate than their voxel-wise counterparts. In addition, our model leverages prior anatomical knowledge by focusing only on plausible locations for aneurysm occurrence. We frst train and evaluate our model through cross-validation on an in-house TOF-MRA dataset comprising 284 subjects (170 females / 127 healthy controls / 157 patients with 198 aneurysms). On this dataset, our best model achieved a sensitivity of 83%, with False Positive (FP) rate of 0.8 per patient. To assess model generalizability, we then participated in a challenge for aneurysm detection with TOF-MRA data (93 patients, 20 controls, 125 aneurysms). On the public challenge, sensitivity was 68% (FP rate=2.5), ranking 4th/18 on the open leaderboard. We found no signifcant diference in sensitivity between aneurysm risk-of-rupture groups (p=0.75), locations (p=0.72), or sizes (p=0.15). Data, code and model weights are released under permissive licenses. We demonstrate that weak labels and anatomical knowledge can alleviate the necessity for prohibitively expensive voxel-wise annotations.

3.7IVNov 27, 2020

An anatomically-informed 3D CNN for brain aneurysm classification with weak labels

Tommaso Di Noto, Guillaume Marie, Sébastien Tourbier et al.

A commonly adopted approach to carry out detection tasks in medical imaging is to rely on an initial segmentation. However, this approach strongly depends on voxel-wise annotations which are repetitive and time-consuming to draw for medical experts. An interesting alternative to voxel-wise masks are so-called "weak" labels: these can either be coarse or oversized annotations that are less precise, but noticeably faster to create. In this work, we address the task of brain aneurysm detection as a patch-wise binary classification with weak labels, in contrast to related studies that rather use supervised segmentation methods and voxel-wise delineations. Our approach comes with the non-trivial challenge of the data set creation: as for most focal diseases, anomalous patches (with aneurysm) are outnumbered by those showing no anomaly, and the two classes usually have different spatial distributions. To tackle this frequent scenario of inherently imbalanced, spatially skewed data sets, we propose a novel, anatomically-driven approach by using a multi-scale and multi-input 3D Convolutional Neural Network (CNN). We apply our model to 214 subjects (83 patients, 131 controls) who underwent Time-Of-Flight Magnetic Resonance Angiography (TOF-MRA) and presented a total of 111 unruptured cerebral aneurysms. We compare two strategies for negative patch sampling that have an increasing level of difficulty for the network and we show how this choice can strongly affect the results. To assess whether the added spatial information helps improving performances, we compare our anatomically-informed CNN with a baseline, spatially-agnostic CNN. When considering the more realistic and challenging scenario including vessel-like negative patches, the former model attains the highest classification results (accuracy$\simeq$95\%, AUROC$\simeq$0.95, AUPR$\simeq$0.71), thus outperforming the baseline.

4.1LGSep 10, 2018

Shallow vs deep learning architectures for white matter lesion segmentation in the early stages of multiple sclerosis

Francesco La Rosa, Mário João Fartaria, Tobias Kober et al.

In this work, we present a comparison of a shallow and a deep learning architecture for the automated segmentation of white matter lesions in MR images of multiple sclerosis patients. In particular, we train and test both methods on early stage disease patients, to verify their performance in challenging conditions, more similar to a clinical setting than what is typically provided in multiple sclerosis segmentation challenges. Furthermore, we evaluate a prototype naive combination of the two methods, which refines the final segmentation. All methods were trained on 32 patients, and the evaluation was performed on a pure test set of 73 cases. Results show low lesion-wise false positives (30%) for the deep learning architecture, whereas the shallow architecture yields the best Dice coefficient (63%) and volume difference (19%). Combining both shallow and deep architectures further improves the lesion-wise metrics (69% and 26% lesion-wise true and false positive rate, respectively).