IVNov 21, 2022Code
Segmentation, Classification, and Quality Assessment of UW-OCTA Images for the Diagnosis of Diabetic RetinopathyYihao Li, Rachid Zeghlache, Ikram Brahim et al.
Diabetic Retinopathy (DR) is a severe complication of diabetes that can cause blindness. Although effective treatments exist (notably laser) to slow the progression of the disease and prevent blindness, the best treatment remains prevention through regular check-ups (at least once a year) with an ophthalmologist. Optical Coherence Tomography Angiography (OCTA) allows for the visualization of the retinal vascularization, and the choroid at the microvascular level in great detail. This allows doctors to diagnose DR with more precision. In recent years, algorithms for DR diagnosis have emerged along with the development of deep learning and the improvement of computer hardware. However, these usually focus on retina photography. There are no current methods that can automatically analyze DR using Ultra-Wide OCTA (UW-OCTA). The Diabetic Retinopathy Analysis Challenge 2022 (DRAC22) provides a standardized UW-OCTA dataset to train and test the effectiveness of various algorithms on three tasks: lesions segmentation, quality assessment, and DR grading. In this paper, we will present our solutions for the three tasks of the DRAC22 challenge. The obtained results are promising and have allowed us to position ourselves in the TOP 5 of the segmentation task, the TOP 4 of the quality assessment task, and the TOP 3 of the DR grading task. The code is available at \url{https://github.com/Mostafa-EHD/Diabetic_Retinopathy_OCTA}.
41.7CLMay 19
Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker DiarizationAmbre Marie, Thomas Bertin, Guillaume Dardenne et al.
Automatic speech recognition for French medical conversations remains challenging, with word error rates often exceeding 30% in spontaneous clinical speech. This study proposes a multi-pass LLM post-processing architecture alternating between Speaker Recognition and Word Recognition passes to improve transcription accuracy and speaker attribution. Ablation studies on two French clinical datasets (suicide prevention telephone counseling and preoperative awake neurosurgery consultations) investigate four design choices: model selection, prompting strategy, pass ordering, and iteration depth. Using Qwen3-Next-80B, Wilcoxon signed-rank tests confirm significant WDER reductions on suicide prevention conversations (p<0.05, n=18), while maintaining stability on awake neurosurgery consultations (n=10), with zero output failures and acceptable computational cost (RTF 0.32), suggesting feasibility for offline clinical deployment, pending validation on larger corpora.
IVSep 2, 2022
Multimodal Information Fusion for Glaucoma and DR ClassificationYihao Li, Mostafa El Habib Daho, Pierre-Henri Conze et al.
Multimodal information is frequently available in medical tasks. By combining information from multiple sources, clinicians are able to make more accurate judgments. In recent years, multiple imaging techniques have been used in clinical practice for retinal analysis: 2D fundus photographs, 3D optical coherence tomography (OCT) and 3D OCT angiography, etc. Our paper investigates three multimodal information fusion strategies based on deep learning to solve retinal analysis tasks: early fusion, intermediate fusion, and hierarchical fusion. The commonly used early and intermediate fusions are simple but do not fully exploit the complementary information between modalities. We developed a hierarchical fusion approach that focuses on combining features across multiple dimensions of the network, as well as exploring the correlation between modalities. These approaches were applied to glaucoma and diabetic retinopathy classification, using the public GAMMA dataset (fundus photographs and OCT) and a private dataset of PlexElite 9000 (Carl Zeis Meditec Inc.) OCT angiography acquisitions, respectively. Our hierarchical fusion method performed the best in both cases and paved the way for better clinical diagnosis.
IVOct 3, 2023
Improved Automatic Diabetic Retinopathy Severity Classification Using Deep Multimodal Fusion of UWF-CFP and OCTA ImagesMostafa El Habib Daho, Yihao Li, Rachid Zeghlache et al.
Diabetic Retinopathy (DR), a prevalent and severe complication of diabetes, affects millions of individuals globally, underscoring the need for accurate and timely diagnosis. Recent advancements in imaging technologies, such as Ultra-WideField Color Fundus Photography (UWF-CFP) imaging and Optical Coherence Tomography Angiography (OCTA), provide opportunities for the early detection of DR but also pose significant challenges given the disparate nature of the data they produce. This study introduces a novel multimodal approach that leverages these imaging modalities to notably enhance DR classification. Our approach integrates 2D UWF-CFP images and 3D high-resolution 6x6 mm$^3$ OCTA (both structure and flow) images using a fusion of ResNet50 and 3D-ResNet50 models, with Squeeze-and-Excitation (SE) blocks to amplify relevant features. Additionally, to increase the model's generalization capabilities, a multimodal extension of Manifold Mixup, applied to concatenated multimodal features, is implemented. Experimental results demonstrate a remarkable enhancement in DR classification performance with the proposed multimodal approach compared to methods relying on a single modality only. The methodology laid out in this work holds substantial promise for facilitating more accurate, early detection of DR, potentially improving clinical outcomes for patients.
CVOct 16, 2023
Longitudinal Self-supervised Learning Using Neural Ordinary Differential EquationRachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho et al.
Longitudinal analysis in medical imaging is crucial to investigate the progressive changes in anatomical structures or disease progression over time. In recent years, a novel class of algorithms has emerged with the goal of learning disease progression in a self-supervised manner, using either pairs of consecutive images or time series of images. By capturing temporal patterns without external labels or supervision, longitudinal self-supervised learning (LSSL) has become a promising avenue. To better understand this core method, we explore in this paper the LSSL algorithm under different scenarios. The original LSSL is embedded in an auto-encoder (AE) structure. However, conventional self-supervised strategies are usually implemented in a Siamese-like manner. Therefore, (as a first novelty) in this study, we explore the use of Siamese-like LSSL. Another new core framework named neural ordinary differential equation (NODE). NODE is a neural network architecture that learns the dynamics of ordinary differential equations (ODE) through the use of neural networks. Many temporal systems can be described by ODE, including modeling disease progression. We believe that there is an interesting connection to make between LSSL and NODE. This paper aims at providing a better understanding of those core algorithms for learning the disease progression with the mentioned change. In our different experiments, we employ a longitudinal dataset, named OPHDIAT, targeting diabetic retinopathy (DR) follow-up. Our results demonstrate the application of LSSL without including a reconstruction term, as well as the potential of incorporating NODE in conjunction with LSSL.
IVOct 16, 2023
LMT: Longitudinal Mixing Training, a Framework to Predict Disease Progression from a Single ImageRachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho et al.
Longitudinal imaging is able to capture both static anatomical structures and dynamic changes in disease progression toward earlier and better patient-specific pathology management. However, conventional approaches rarely take advantage of longitudinal information for detection and prediction purposes, especially for Diabetic Retinopathy (DR). In the past years, Mix-up training and pretext tasks with longitudinal context have effectively enhanced DR classification results and captured disease progression. In the meantime, a novel type of neural network named Neural Ordinary Differential Equation (NODE) has been proposed for solving ordinary differential equations, with a neural network treated as a black box. By definition, NODE is well suited for solving time-related problems. In this paper, we propose to combine these three aspects to detect and predict DR progression. Our framework, Longitudinal Mixing Training (LMT), can be considered both as a regularizer and as a pretext task that encodes the disease progression in the latent space. Additionally, we evaluate the trained model weights on a downstream task with a longitudinal context using standard and longitudinal pretext tasks. We introduce a new way to train time-aware models using $t_{mix}$, a weighted average time between two consecutive examinations. We compare our approach to standard mixing training on DR classification using OPHDIAT a longitudinal retinal Color Fundus Photographs (CFP) dataset. We were able to predict whether an eye would develop a severe DR in the following visit using a single image, with an AUC of 0.798 compared to baseline results of 0.641. Our results indicate that our longitudinal pretext task can learn the progression of DR disease and that introducing $t_{mix}$ augmentation is beneficial for time-aware models.
IVNov 18, 2022
Joint nnU-Net and Radiomics Approaches for Segmentation and Prognosis of Head and Neck Cancers with PET/CT imagesHui Xu, Yihao Li, Wei Zhao et al.
Automatic segmentation of head and neck cancer (HNC) tumors and lymph nodes plays a crucial role in the optimization treatment strategy and prognosis analysis. This study aims to employ nnU-Net for automatic segmentation and radiomics for recurrence-free survival (RFS) prediction using pretreatment PET/CT images in multi-center HNC cohort. A multi-center HNC dataset with 883 patients (524 patients for training, 359 for testing) was provided in HECKTOR 2022. A bounding box of the extended oropharyngeal region was retrieved for each patient with fixed size of 224 x 224 x 224 $mm^{3}$. Then 3D nnU-Net architecture was adopted to automatic segmentation of primary tumor and lymph nodes synchronously.Based on predicted segmentation, ten conventional features and 346 standardized radiomics features were extracted for each patient. Three prognostic models were constructed containing conventional and radiomics features alone, and their combinations by multivariate CoxPH modelling. The statistical harmonization method, ComBat, was explored towards reducing multicenter variations. Dice score and C-index were used as evaluation metrics for segmentation and prognosis task, respectively. For segmentation task, we achieved mean dice score around 0.701 for primary tumor and lymph nodes by 3D nnU-Net. For prognostic task, conventional and radiomics models obtained the C-index of 0.658 and 0.645 in the test set, respectively, while the combined model did not improve the prognostic performance with the C-index of 0.648.
CVSep 2, 2022
Detection of diabetic retinopathy using longitudinal self-supervised learningRachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho et al.
Longitudinal imaging is able to capture both static anatomical structures and dynamic changes in disease progression towards earlier and better patient-specific pathology management. However, conventional approaches for detecting diabetic retinopathy (DR) rarely take advantage of longitudinal information to improve DR analysis. In this work, we investigate the benefit of exploiting self-supervised learning with a longitudinal nature for DR diagnosis purposes. We compare different longitudinal self-supervised learning (LSSL) methods to model the disease progression from longitudinal retinal color fundus photographs (CFP) to detect early DR severity changes using a pair of consecutive exams. The experiments were conducted on a longitudinal DR screening dataset with or without those trained encoders (LSSL) acting as a longitudinal pretext task. Results achieve an AUC of 0.875 for the baseline (model trained from scratch) and an AUC of 0.96 (95% CI: 0.9593-0.9655 DeLong test) with a p-value < 2.2e-16 on early fusion using a simple ResNet alike architecture with frozen LSSL weights, suggesting that the LSSL latent space enables to encode the dynamic of DR progression.
IVSep 19, 2024
Deep Learning-Based Detection of Referable Diabetic Retinopathy and Macular Edema Using Ultra-Widefield Fundus ImagingPhilippe Zhang, Pierre-Henri Conze, Mathieu Lamard et al.
Diabetic retinopathy and diabetic macular edema are significant complications of diabetes that can lead to vision loss. Early detection through ultra-widefield fundus imaging enhances patient outcomes but presents challenges in image quality and analysis scale. This paper introduces deep learning solutions for automated UWF image analysis within the framework of the MICCAI 2024 UWF4DR challenge. We detail methods and results across three tasks: image quality assessment, detection of referable DR, and identification of DME. Employing advanced convolutional neural network architectures such as EfficientNet and ResNet, along with preprocessing and augmentation strategies, our models demonstrate robust performance in these tasks. Results indicate that deep learning can significantly aid in the automated analysis of UWF images, potentially improving the efficiency and accuracy of DR and DME detection in clinical settings.
CRMay 30, 2022
White-box Membership Attack Against Machine Learning Based Retinopathy ClassificationMounia Hamidouche, Reda Bellafqira, Gwenolé Quellec et al.
The advances in machine learning (ML) have greatly improved AI-based diagnosis aid systems in medical imaging. However, being based on collecting medical data specific to individuals induces several security issues, especially in terms of privacy. Even though the owner of the images like a hospital put in place strict privacy protection provisions at the level of its information system, the model trained over his images still holds disclosure potential. The trained model may be accessible to an attacker as: 1) White-box: accessing to the model architecture and parameters; 2) Black box: where he can only query the model with his own inputs through an appropriate interface. Existing attack methods include: feature estimation attacks (FEA), membership inference attack (MIA), model memorization attack (MMA) and identification attacks (IA). In this work we focus on MIA against a model that has been trained to detect diabetic retinopathy from retinal images. Diabetic retinopathy is a condition that can cause vision loss and blindness in the people who have diabetes. MIA is the process of determining whether a data sample comes from the training data set of a trained ML model or not. From a privacy perspective in our use case where a diabetic retinopathy classification model is given to partners that have at their disposal images along with patients' identifiers, inferring the membership status of a data sample can help to state if a patient has contributed or not to the training of the model.
CVSep 2, 2022
Mapping the ocular surface from monocular videos with an application to dry eye disease gradingIkram Brahim, Mathieu Lamard, Anas-Alexis Benyoussef et al.
With a prevalence of 5 to 50%, Dry Eye Disease (DED) is one of the leading reasons for ophthalmologist consultations. The diagnosis and quantification of DED usually rely on ocular surface analysis through slit-lamp examinations. However, evaluations are subjective and non-reproducible. To improve the diagnosis, we propose to 1) track the ocular surface in 3-D using video recordings acquired during examinations, and 2) grade the severity using registered frames. Our registration method uses unsupervised image-to-depth learning. These methods learn depth from lights and shadows and estimate pose based on depth maps. However, DED examinations undergo unresolved challenges including a moving light source, transparent ocular tissues, etc. To overcome these and estimate the ego-motion, we implement joint CNN architectures with multiple losses incorporating prior known information, namely the shape of the eye, through semantic segmentation as well as sphere fitting. The achieved tracking errors outperform the state-of-the-art, with a mean Euclidean distance as low as 0.48% of the image width on our test set. This registration improves the DED severity classification by a 0.20 AUC difference. The proposed approach is the first to address DED diagnosis with supervision from monocular videos
IVJan 8, 2024Code
Automated Detection of Myopic Maculopathy in MMAC 2023: Achievements in Classification, Segmentation, and Spherical Equivalent PredictionYihao Li, Philippe Zhang, Yubo Tan et al.
Myopic macular degeneration is the most common complication of myopia and the primary cause of vision loss in individuals with pathological myopia. Early detection and prompt treatment are crucial in preventing vision impairment due to myopic maculopathy. This was the focus of the Myopic Maculopathy Analysis Challenge (MMAC), in which we participated. In task 1, classification of myopic maculopathy, we employed the contrastive learning framework, specifically SimCLR, to enhance classification accuracy by effectively capturing enriched features from unlabeled data. This approach not only improved the intrinsic understanding of the data but also elevated the performance of our classification model. For Task 2 (segmentation of myopic maculopathy plus lesions), we have developed independent segmentation models tailored for different lesion segmentation tasks and implemented a test-time augmentation strategy to further enhance the model's performance. As for Task 3 (prediction of spherical equivalent), we have designed a deep regression model based on the data distribution of the dataset and employed an integration strategy to enhance the model's prediction accuracy. The results we obtained are promising and have allowed us to position ourselves in the Top 6 of the classification task, the Top 2 of the segmentation task, and the Top 1 of the prediction task. The code is available at \url{https://github.com/liyihao76/MMAC_LaTIM_Solution}.
25.9LGMay 15
Constrained latent state modeling: A unifying perspective on representation learning under competing constraintsGwenolé Quellec
Learning latent representations from complex data is central to modern machine learning, spanning temporal, multimodal, and partially observed systems. In such settings, representations are better understood as latent states capturing underlying system dynamics, rather than as mere compressed summaries of observations. Yet current approaches remain fragmented, relying on distinct -- and often implicit -- assumptions about what these states should represent. We argue that this fragmentation reflects a more fundamental limitation: latent representations are typically learned from underconstrained objectives that fail to specify the properties that meaningful latent states should satisfy. As a result, multiple representations can satisfy the same objective, leading to ambiguity in their structure and interpretation. While many of the underlying principles have been explored in isolation, their interactions have not been explicitly formalized. In this work, we propose constrained latent state modeling (CLSM) as a unifying perspective. We identify a set of core properties -- predictive sufficiency, minimality, temporal coherence, observation compatibility, invariance to nuisance factors, and structural constraints -- and show that they are intrinsically coupled through fundamental trade-offs. Revisiting major modeling families through this lens, we show that existing approaches can be interpreted as enforcing different subsets of constraints, thereby occupying distinct regions of a common design space. This perspective reframes persistent challenges such as lack of identifiability as consequences of underconstrained formulations, rather than isolated technical limitations. More broadly, CLSM provides a principled framework to make design choices explicit, to analyze trade-offs, and to guide the development of more interpretable, robust, and task-aligned latent state models.
CVApr 23, 2024
A review of deep learning-based information fusion techniques for multimodal medical image classificationYihao Li, Mostafa El Habib Daho, Pierre-Henri Conze et al.
Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep learning-based multimodal fusion techniques have emerged as powerful tools for improving medical image classification. This review offers a thorough analysis of the developments in deep learning-based multimodal fusion for medical classification tasks. We explore the complementary relationships among prevalent clinical modalities and outline three main fusion schemes for multimodal classification networks: input fusion, intermediate fusion (encompassing single-level fusion, hierarchical fusion, and attention-based fusion), and output fusion. By evaluating the performance of these fusion techniques, we provide insight into the suitability of different network architectures for various multimodal fusion scenarios and application domains. Furthermore, we delve into challenges related to network architecture selection, handling incomplete multimodal data management, and the potential limitations of multimodal fusion. Finally, we spotlight the promising future of Transformer-based multimodal fusion techniques and give recommendations for future research in this rapidly evolving field.
IVMar 18, 2024
A Systematic Review of Generalization Research in Medical Image ClassificationSarah Matta, Mathieu Lamard, Philippe Zhang et al.
Numerous Deep Learning (DL) classification models have been developed for a large spectrum of medical image analysis applications, which promises to reshape various facets of medical practice. Despite early advances in DL model validation and implementation, which encourage healthcare institutions to adopt them, a fundamental questions remain: how can these models effectively handle domain shift? This question is crucial to limit DL models performance degradation. Medical data are dynamic and prone to domain shift, due to multiple factors. Two main shift types can occur over time: 1) covariate shift mainly arising due to updates to medical equipment and 2) concept shift caused by inter-grader variability. To mitigate the problem of domain shift, existing surveys mainly focus on domain adaptation techniques, with an emphasis on covariate shift. More generally, no work has reviewed the state-of-the-art solutions while focusing on the shift types. This paper aims to explore existing domain generalization methods for DL-based classification models through a systematic review of literature. It proposes a taxonomy based on the shift type they aim to solve. Papers were searched and gathered on Scopus till 10 April 2023, and after the eligibility screening and quality evaluation, 77 articles were identified. Exclusion criteria included: lack of methodological novelty (e.g., reviews, benchmarks), experiments conducted on a single mono-center dataset, or articles not written in English. The results of this paper show that learning based methods are emerging, for both shift types. Finally, we discuss future challenges, including the need for improved evaluation protocols and benchmarks, and envisioned future developments to achieve robust, generalized models for medical image classification.
IVJan 10, 2024
DISCOVER: 2-D Multiview Summarization of Optical Coherence Tomography Angiography for Automatic Diabetic Retinopathy DiagnosisMostafa El Habib Daho, Yihao Li, Rachid Zeghlache et al.
Diabetic Retinopathy (DR), an ocular complication of diabetes, is a leading cause of blindness worldwide. Traditionally, DR is monitored using Color Fundus Photography (CFP), a widespread 2-D imaging modality. However, DR classifications based on CFP have poor predictive power, resulting in suboptimal DR management. Optical Coherence Tomography Angiography (OCTA) is a recent 3-D imaging modality offering enhanced structural and functional information (blood flow) with a wider field of view. This paper investigates automatic DR severity assessment using 3-D OCTA. A straightforward solution to this task is a 3-D neural network classifier. However, 3-D architectures have numerous parameters and typically require many training samples. A lighter solution consists in using 2-D neural network classifiers processing 2-D en-face (or frontal) projections and/or 2-D cross-sectional slices. Such an approach mimics the way ophthalmologists analyze OCTA acquisitions: 1) en-face flow maps are often used to detect avascular zones and neovascularization, and 2) cross-sectional slices are commonly analyzed to detect macular edemas, for instance. However, arbitrary data reduction or selection might result in information loss. Two complementary strategies are thus proposed to optimally summarize OCTA volumes with 2-D images: 1) a parametric en-face projection optimized through deep learning and 2) a cross-sectional slice selection process controlled through gradient-based attribution. The full summarization and DR classification pipeline is trained from end to end. The automatic 2-D summary can be displayed in a viewer or printed in a report to support the decision. We show that the proposed 2-D summarization and classification pipeline outperforms direct 3-D classification with the advantage of improved interpretability.
CVAug 27, 2025
Patch Progression Masked Autoencoder with Fusion CNN Network for Classifying Evolution Between Two Pairs of 2D OCT SlicesPhilippe Zhang, Weili Jiang, Yihao Li et al.
Age-related Macular Degeneration (AMD) is a prevalent eye condition affecting visual acuity. Anti-vascular endothelial growth factor (anti-VEGF) treatments have been effective in slowing the progression of neovascular AMD, with better outcomes achieved through timely diagnosis and consistent monitoring. Tracking the progression of neovascular activity in OCT scans of patients with exudative AMD allows for the development of more personalized and effective treatment plans. This was the focus of the Monitoring Age-related Macular Degeneration Progression in Optical Coherence Tomography (MARIO) challenge, in which we participated. In Task 1, which involved classifying the evolution between two pairs of 2D slices from consecutive OCT acquisitions, we employed a fusion CNN network with model ensembling to further enhance the model's performance. For Task 2, which focused on predicting progression over the next three months based on current exam data, we proposed the Patch Progression Masked Autoencoder that generates an OCT for the next exam and then classifies the evolution between the current OCT and the one generated using our solution from Task 1. The results we achieved allowed us to place in the Top 10 for both tasks. Some team members are part of the same organization as the challenge organizers; therefore, we are not eligible to compete for the prize.
LGApr 10, 2024
LaTiM: Longitudinal representation learning in continuous-time models to predict disease progressionRachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho et al.
This work proposes a novel framework for analyzing disease progression using time-aware neural ordinary differential equations (NODE). We introduce a "time-aware head" in a framework trained through self-supervised learning (SSL) to leverage temporal information in latent space for data augmentation. This approach effectively integrates NODEs with SSL, offering significant performance improvements compared to traditional methods that lack explicit temporal integration. We demonstrate the effectiveness of our strategy for diabetic retinopathy progression prediction using the OPHDIAT database. Compared to the baseline, all NODE architectures achieve statistically significant improvements in area under the ROC curve (AUC) and Kappa metrics, highlighting the efficacy of pre-training with SSL-inspired approaches. Additionally, our framework promotes stable training for NODEs, a commonly encountered challenge in time-aware modeling.
CVMar 24, 2024
L-MAE: Longitudinal masked auto-encoder with time and severity-aware encoding for diabetic retinopathy progression predictionRachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho et al.
Pre-training strategies based on self-supervised learning (SSL) have proven to be effective pretext tasks for many downstream tasks in computer vision. Due to the significant disparity between medical and natural images, the application of typical SSL is not straightforward in medical imaging. Additionally, those pretext tasks often lack context, which is critical for computer-aided clinical decision support. In this paper, we developed a longitudinal masked auto-encoder (MAE) based on the well-known Transformer-based MAE. In particular, we explored the importance of time-aware position embedding as well as disease progression-aware masking. Taking into account the time between examinations instead of just scheduling them offers the benefit of capturing temporal changes and trends. The masking strategy, for its part, evolves during follow-up to better capture pathological changes, ensuring a more accurate assessment of disease progression. Using OPHDIAT, a large follow-up screening dataset targeting diabetic retinopathy (DR), we evaluated the pre-trained weights on a longitudinal task, which is to predict the severity label of the next visit within 3 years based on the past time series examinations. Our results demonstrated the relevancy of both time-aware position embedding and masking strategies based on disease progression knowledge. Compared to popular baseline models and standard longitudinal Transformers, these simple yet effective extensions significantly enhance the predictive ability of deep classification models.
CVJun 3, 2025
Deep Learning for Retinal Degeneration Assessment: A Comprehensive Analysis of the MARIO AMD Progression ChallengeRachid Zeghlache, Ikram Brahim, Pierre-Henri Conze et al.
The MARIO challenge, held at MICCAI 2024, focused on advancing the automated detection and monitoring of age-related macular degeneration (AMD) through the analysis of optical coherence tomography (OCT) images. Designed to evaluate algorithmic performance in detecting neovascular activity changes within AMD, the challenge incorporated unique multi-modal datasets. The primary dataset, sourced from Brest, France, was used by participating teams to train and test their models. The final ranking was determined based on performance on this dataset. An auxiliary dataset from Algeria was used post-challenge to evaluate population and device shifts from submitted solutions. Two tasks were involved in the MARIO challenge. The first one was the classification of evolution between two consecutive 2D OCT B-scans. The second one was the prediction of future AMD evolution over three months for patients undergoing anti-vascular endothelial growth factor (VEGF) therapy. Thirty-five teams participated, with the top 12 finalists presenting their methods. This paper outlines the challenge's structure, tasks, data characteristics, and winning methodologies, setting a benchmark for AMD monitoring using OCT, infrared imaging, and clinical data (such as the number of visits, age, gender, etc.). The results of this challenge indicate that artificial intelligence (AI) performs as well as a physician in measuring AMD progression (Task 1) but is not yet able of predicting future evolution (Task 2).
ASMay 20, 2025
Acoustic and Machine Learning Methods for Speech-Based Suicide Risk Assessment: A Systematic ReviewAmbre Marie, Marine Garnier, Thomas Bertin et al.
Suicide remains a public health challenge, necessitating improved detection methods to facilitate timely intervention and treatment. This systematic review evaluates the role of Artificial Intelligence (AI) and Machine Learning (ML) in assessing suicide risk through acoustic analysis of speech. Following PRISMA guidelines, we analyzed 33 articles selected from PubMed, Cochrane, Scopus, and Web of Science databases. The last search was conducted in February 2025. Risk of bias was assessed using the PROBAST tool. Studies analyzing acoustic features between individuals at risk of suicide (RS) and those not at risk (NRS) were included, while studies lacking acoustic data, a suicide-related focus, or sufficient methodological details were excluded. Sample sizes varied widely and were reported in terms of participants or speech segments, depending on the study. Results were synthesized narratively based on acoustic features and classifier performance. Findings consistently showed significant acoustic feature variations between RS and NRS populations, particularly involving jitter, fundamental frequency (F0), Mel-frequency cepstral coefficients (MFCC), and power spectral density (PSD). Classifier performance varied based on algorithms, modalities, and speech elicitation methods, with multimodal approaches integrating acoustic, linguistic, and metadata features demonstrating superior performance. Among the 29 classifier-based studies, reported AUC values ranged from 0.62 to 0.985 and accuracies from 60% to 99.85%. Most datasets were imbalanced in favor of NRS, and performance metrics were rarely reported separately by group, limiting clear identification of direction of effect.
CLMay 19, 2025
Suicide Risk Assessment Using Multimodal Speech Features: A Study on the SW1 Challenge DatasetAmbre Marie, Ilias Maoudj, Guillaume Dardenne et al.
The 1st SpeechWellness Challenge conveys the need for speech-based suicide risk assessment in adolescents. This study investigates a multimodal approach for this challenge, integrating automatic transcription with WhisperX, linguistic embeddings from Chinese RoBERTa, and audio embeddings from WavLM. Additionally, handcrafted acoustic features -- including MFCCs, spectral contrast, and pitch-related statistics -- were incorporated. We explored three fusion strategies: early concatenation, modality-specific processing, and weighted attention with mixup regularization. Results show that weighted attention provided the best generalization, achieving 69% accuracy on the development set, though a performance gap between development and test sets highlights generalization challenges. Our findings, strictly tied to the MINI-KID framework, emphasize the importance of refining embedding representations and fusion mechanisms to enhance classification reliability.
CVAug 13, 2020
ExplAIn: Explanatory Artificial Intelligence for Diabetic Retinopathy DiagnosisGwenolé Quellec, Hassan Al Hajj, Mathieu Lamard et al.
In recent years, Artificial Intelligence (AI) has proven its relevance for medical decision support. However, the "black-box" nature of successful AI algorithms still holds back their wide-spread deployment. In this paper, we describe an eXplanatory Artificial Intelligence (XAI) that reaches the same level of performance as black-box AI, for the task of classifying Diabetic Retinopathy (DR) severity using Color Fundus Photography (CFP). This algorithm, called ExplAIn, learns to segment and categorize lesions in images; the final image-level classification directly derives from these multivariate lesion segmentations. The novelty of this explanatory framework is that it is trained from end to end, with image supervision only, just like black-box AI algorithms: the concepts of lesions and lesion categories emerge by themselves. For improved lesion localization, foreground/background separation is trained through self-supervision, in such a way that occluding foreground pixels transforms the input image into a healthy-looking image. The advantage of such an architecture is that automatic diagnoses can be explained simply by an image and/or a few sentences. ExplAIn is evaluated at the image level and at the pixel level on various CFP image datasets. We expect this new framework, which jointly offers high classification performance and explainability, to facilitate AI deployment.
IVFeb 27, 2020
Two-stage multi-scale breast mass segmentation for full mammogram analysis without user interventionYutong Yan, Pierre-Henri Conze, Gwenolé Quellec et al.
Mammography is the primary imaging modality used for early detection and diagnosis of breast cancer. X-ray mammogram analysis mainly refers to the localization of suspicious regions of interest followed by segmentation, towards further lesion classification into benign versus malignant. Among diverse types of breast abnormalities, masses are the most important clinical findings of breast carcinomas. However, manually segmenting breast masses from native mammograms is time-consuming and error-prone. Therefore, an integrated computer-aided diagnosis system is required to assist clinicians for automatic and precise breast mass delineation. In this work, we present a two-stage multi-scale pipeline that provides accurate mass contours from high-resolution full mammograms. First, we propose an extended deep detector integrating a multi-scale fusion strategy for automated mass localization. Second, a convolutional encoder-decoder network using nested and dense skip connections is employed to fine-delineate candidate masses. Unlike most previous studies based on segmentation from regions, our framework handles mass segmentation from native full mammograms without any user intervention. Trained on INbreast and DDSM-CBIS public datasets, the pipeline achieves an overall average Dice of 80.44% on INbreast test images, outperforming state-of-the-art. Our system shows promising accuracy as an automatic full-image mass segmentation system. Extensive experiments reveals robustness against the diversity of size, shape and appearance of breast masses, towards better interaction-free computer-aided diagnosis.
CVJul 22, 2019
Automatic detection of rare pathologies in fundus photographs using few-shot learningGwenolé Quellec, Mathieu Lamard, Pierre-Henri Conze et al.
In the last decades, large datasets of fundus photographs have been collected in diabetic retinopathy (DR) screening networks. Through deep learning, these datasets were used to train automatic detectors for DR and a few other frequent pathologies, with the goal to automate screening. One challenge limits the adoption of such systems so far: automatic detectors ignore rare conditions that ophthalmologists currently detect, such as papilledema or anterior ischemic optic neuropathy. The reason is that standard deep learning requires too many examples of these conditions. However, this limitation can be addressed with few-shot learning, a machine learning paradigm where a classifier has to generalize to a new category not seen in training, given only a few examples of this category. This paper presents a new few-shot learning framework that extends convolutional neural networks (CNNs), trained for frequent conditions, with an unsupervised probabilistic model for rare condition detection. It is based on the observation that CNNs often perceive photographs containing the same anomalies as similar, even though these CNNs were trained to detect unrelated conditions. This observation was based on the t-SNE visualization tool, which we decided to incorporate in our probabilistic model. Experiments on a dataset of 164,660 screening examinations from the OPHDIAT screening network show that 37 conditions, out of 41, can be detected with an area under the ROC curve (AUC) greater than 0.8 (average AUC: 0.938). In particular, this framework significantly outperforms other frameworks for detecting rare conditions, including multitask learning, transfer learning and Siamese networks, another few-shot learning solution. We expect these richer predictions to trigger the adoption of automated eye pathology screening, which will revolutionize clinical practice in ophthalmology.
IVJun 12, 2019
Instant automatic diagnosis of diabetic retinopathyGwenolé Quellec, Mathieu Lamard, Bruno Lay et al.
The purpose of this study is to evaluate the performance of the OphtAI system for the automatic detection of referable diabetic retinopathy (DR) and the automatic assessment of DR severity using color fundus photography. OphtAI relies on ensembles of convolutional neural networks trained to recognize eye laterality, detect referable DR and assess DR severity. The system can either process single images or full examination records. To document the automatic diagnoses, accurate heatmaps are generated. The system was developed and validated using a dataset of 763,848 images from 164,660 screening procedures from the OPHDIAT screening program. For comparison purposes, it was also evaluated in the public Messidor-2 dataset. Referable DR can be detected with an area under the ROC curve of AUC = 0.989 in the Messidor-2 dataset, using the University of Iowa's reference standard (95% CI: 0.984-0.994). This is better than the only AI system authorized by the FDA, evaluated in the exact same conditions (AUC = 0.980). OphtAI can also detect vision-threatening DR with an AUC of 0.997 (95% CI: 0.996-0.998) and proliferative DR with an AUC of 0.997 (95% CI: 0.995-0.999). The system runs in 0.3 seconds using a graphics processing unit and less than 2 seconds without. OphtAI is safer, faster and more comprehensive than the only AI system authorized by the FDA so far. Instant DR diagnosis is now possible, which is expected to streamline DR screening and to give easy access to DR screening to more diabetic patients.
CVFeb 25, 2019
Unsupervised learning-based long-term superpixel trackingPierre-Henri Conze, Florian Tilquin, Mathieu Lamard et al.
Finding correspondences between structural entities decomposing images is of high interest for computer vision applications. In particular, we analyze how to accurately track superpixels - visual primitives generated by aggregating adjacent pixels sharing similar characteristics - over extended time periods relying on unsupervised learning and temporal integration. A two-step video processing pipeline dedicated to long-term superpixel tracking is proposed. First, unsupervised learning-based superpixel matching provides correspondences between consecutive and distant frames using new context-rich features extended from greyscale to multi-channel and forward-backward consistency contraints. Resulting elementary matches are then combined along multi-step paths running through the whole sequence with various inter-frame distances. This produces a large set of candidate long-term superpixel pairings upon which majority voting is performed. Video object tracking experiments demonstrate the accuracy of our elementary estimator against state-of-the-art methods and proves the ability of multi-step integration to provide accurate long-term superpixel matches compared to usual direct and sequential integration.
CVOct 4, 2017
Monitoring tool usage in surgery videos using boosted convolutional and recurrent neural networksHassan Al Hajj, Mathieu Lamard, Pierre-Henri Conze et al.
This paper investigates the automatic monitoring of tool usage during a surgery, with potential applications in report generation, surgical training and real-time decision support. Two surgeries are considered: cataract surgery, the most common surgical procedure, and cholecystectomy, one of the most common digestive surgeries. Tool usage is monitored in videos recorded either through a microscope (cataract surgery) or an endoscope (cholecystectomy). Following state-of-the-art video analysis solutions, each frame of the video is analyzed by convolutional neural networks (CNNs) whose outputs are fed to recurrent neural networks (RNNs) in order to take temporal relationships between events into account. Novelty lies in the way those CNNs and RNNs are trained. Computational complexity prevents the end-to-end training of "CNN+RNN" systems. Therefore, CNNs are usually trained first, independently from the RNNs. This approach is clearly suboptimal for surgical tool analysis: many tools are very similar to one another, but they can generally be differentiated based on past events. CNNs should be trained to extract the most useful visual features in combination with the temporal context. A novel boosting strategy is proposed to achieve this goal: the CNN and RNN parts of the system are simultaneously enriched by progressively adding weak classifiers (either CNNs or RNNs) trained to improve the overall classification accuracy. Experiments were performed in a dataset of 50 cataract surgery videos and a dataset of 80 cholecystectomy videos. Very good classification performance are achieved in both datasets: tool usage could be labeled with an average area under the ROC curve of $A_z = 0.9961$ and $A_z = 0.9939$, respectively, in offline mode (using past, present and future information), and $A_z = 0.9957$ and $A_z = 0.9936$, respectively, in online mode (using past and present information only).
CVOct 22, 2016
Deep image mining for diabetic retinopathy screeningGwenolé Quellec, Katia Charrière, Yassine Boudi et al.
Deep learning is quickly becoming the leading methodology for medical image analysis. Given a large medical archive, where each image is associated with a diagnosis, efficient pathology detectors or classifiers can be trained with virtually no expert knowledge about the target pathologies. However, deep learning algorithms, including the popular ConvNets, are black boxes: little is known about the local patterns analyzed by ConvNets to make a decision at the image level. A solution is proposed in this paper to create heatmaps showing which pixels in images play a role in the image-level predictions. In other words, a ConvNet trained for image-level classification can be used to detect lesions as well. A generalization of the backpropagation method is proposed in order to train ConvNets that produce high-quality heatmaps. The proposed solution is applied to diabetic retinopathy (DR) screening in a dataset of almost 90,000 fundus photographs from the 2015 Kaggle Diabetic Retinopathy competition and a private dataset of almost 110,000 photographs (e-ophtha). For the task of detecting referable DR, very good detection performance was achieved: $A_z = 0.954$ in Kaggle's dataset and $A_z = 0.949$ in e-ophtha. Performance was also evaluated at the image level and at the lesion level in the DiaretDB1 dataset, where four types of lesions are manually segmented: microaneurysms, hemorrhages, exudates and cotton-wool spots. The proposed detector outperforms recent algorithms trained to detect those lesions specifically, as well as competing heatmap generation algorithms for ConvNets. This detector is part of the Messidor system for mobile eye pathology screening. Because it does not rely on expert knowledge or manual segmentation for detecting relevant patterns, the proposed solution is a promising image mining tool, which has the potential to discover new biomarkers in images.
CVOct 18, 2016
Real-time analysis of cataract surgery videos using statistical modelsKatia Charrière, Gwenolé Quellec, Mathieu Lamard et al.
The automatic analysis of the surgical process, from videos recorded during surgeries, could be very useful to surgeons, both for training and for acquiring new techniques. The training process could be optimized by automatically providing some targeted recommendations or warnings, similar to the expert surgeon's guidance. In this paper, we propose to reuse videos recorded and stored during cataract surgeries to perform the analysis. The proposed system allows to automatically recognize, in real time, what the surgeon is doing: what surgical phase or, more precisely, what surgical step he or she is performing. This recognition relies on the inference of a multilevel statistical model which uses 1) the conditional relations between levels of description (steps and phases) and 2) the temporal relations among steps and among phases. The model accepts two types of inputs: 1) the presence of surgical tools, manually provided by the surgeons, or 2) motion in videos, automatically analyzed through the Content Based Video retrieval (CBVR) paradigm. Different data-driven statistical models are evaluated in this paper. For this project, a dataset of 30 cataract surgery videos was collected at Brest University hospital. The system was evaluated in terms of area under the ROC curve. Promising results were obtained using either the presence of surgical tools ($A_z$ = 0.983) or motion analysis ($A_z$ = 0.759). The generality of the method allows to adapt it to any kinds of surgeries. The proposed solution could be used in a computer assisted surgery tool to support surgeons during the surgery.
CVSep 19, 2016
Coarse-to-fine Surgical Instrument Detection for Cataract Surgery MonitoringHassan Al Hajj, Gwenolé Quellec, Mathieu Lamard et al.
The amount of surgical data, recorded during video-monitored surgeries, has extremely increased. This paper aims at improving existing solutions for the automated analysis of cataract surgeries in real time. Through the analysis of a video recording the operating table, it is possible to know which instruments exit or enter the operating table, and therefore which ones are likely being used by the surgeon. Combining these observations with observations from the microscope video should enhance the overall performance of the systems. To this end, the proposed solution is divided into two main parts: one to detect the instruments at the beginning of the surgery and one to update the list of instruments every time a change is detected in the scene. In the first part, the goal is to separate the instruments from the background and from irrelevant objects. For the second, we are interested in detecting the instruments that appear and disappear whenever the surgeon interacts with the table. Experiments on a dataset of 36 cataract surgeries validate the good performance of the proposed solution.