IVApr 15, 2024
ODFormer: Semantic Fundus Image Segmentation Using Transformer for Optic Nerve Head DetectionJiayi Wang, Yi-An Mao, Xiaoyu Ma et al.
Optic nerve head (ONH) detection has been a crucial area of study in ophthalmology for years. However, the significant discrepancy between fundus image datasets, each generated using a single type of fundus camera, poses challenges to the generalizability of ONH detection approaches developed based on semantic segmentation networks. Despite the numerous recent advancements in general-purpose semantic segmentation methods using convolutional neural networks (CNNs) and Transformers, there is currently a lack of benchmarks for these state-of-the-art (SoTA) networks specifically trained for ONH detection. Therefore, in this article, we make contributions from three key aspects: network design, the publication of a dataset, and the establishment of a comprehensive benchmark. Our newly developed ONH detection network, referred to as ODFormer, is based upon the Swin Transformer architecture and incorporates two novel components: a multi-scale context aggregator and a lightweight bidirectional feature recalibrator. Our published large-scale dataset, known as TongjiU-DROD, provides multi-resolution fundus images for each participant, captured using two distinct types of cameras. Our established benchmark involves three datasets: DRIONS-DB, DRISHTI-GS1, and TongjiU-DROD, created by researchers from different countries and containing fundus images captured from participants of diverse races and ages. Extensive experimental results demonstrate that our proposed ODFormer outperforms other state-of-the-art (SoTA) networks in terms of performance and generalizability. Our dataset and source code are publicly available at mias.group/ODFormer.
QMOct 1, 2025
Glaucoma Detection and Structured OCT Report Generation via a Fine-tuned Multimodal Large Language ModelJalil Jalili, Yashraj Gavhane, Evan Walker et al.
Objective: To develop an explainable multimodal large language model (MM-LLM) that (1) screens optic nerve head (ONH) OCT circle scans for quality and (2) generates structured clinical reports that include glaucoma diagnosis and sector-wise retinal nerve fiber layer (RNFL) thinning assessments. Design: Retrospective cohort study of 1,310 subjects contributing 43,849 Spectralis ONH OCT circle scans (1,331 glaucomatous and 867 healthy eyes) from the DIGS and ADAGES cohorts. Methods: A MM-LLM (Llama 3.2 Vision-Instruct model) was fine-tuned to generate clinical descriptions of OCT imaging data. Training data included paired OCT images and automatically generated, structured clinical reports that described global and sectoral RNFL thinning. Poor-quality scans were labeled as unusable and paired with a fixed refusal statement. The model was evaluated on a held-out test set for three tasks: quality assessment, glaucoma detection, and RNFL thinning classification across seven anatomical sectors. Evaluation metrics included accuracy, sensitivity, specificity, precision, and F1-score. Model description quality was also evaluated using standard text evaluation metrics. Results: The model achieved 0.90 accuracy and 0.98 specificity for quality triage. For glaucoma detection, accuracy was 0.86 (sensitivity 0.91, specificity 0.73, F1-score 0.91). RNFL thinning prediction accuracy ranged from 0.83 to 0.94, with highest performance in global and temporal sectors. Text generation scores showed strong alignment with reference reports (BLEU: 0.82; ROUGE-1: 0.94; ROUGE-2: 0.87; ROUGE-L: 0.92; BERTScore-F1: 0.99). Conclusions: The fine-tuned MM-LLM generated accurate clinical descriptions based on OCT imaging. The model achieved high accuracy in identifying image quality issues and detecting glaucoma. The model also provided sectoral descriptions of RNFL thinning to help support clinical OCT evaluation.
CVDec 9, 2020
One-Vote Veto: Semi-Supervised Learning for Low-Shot Glaucoma DiagnosisRui Fan, Christopher Bowd, Nicole Brye et al.
Convolutional neural networks (CNNs) are a promising technique for automated glaucoma diagnosis from images of the fundus, and these images are routinely acquired as part of an ophthalmic exam. Nevertheless, CNNs typically require a large amount of well-labeled data for training, which may not be available in many biomedical image classification applications, especially when diseases are rare and where labeling by experts is costly. This article makes two contributions to address this issue: (1) It extends the conventional Siamese network and introduces a training method for low-shot learning when labeled data are limited and imbalanced, and (2) it introduces a novel semi-supervised learning strategy that uses additional unlabeled training data to achieve greater accuracy. Our proposed multi-task Siamese network (MTSN) can employ any backbone CNN, and we demonstrate with four backbone CNNs that its accuracy with limited training data approaches the accuracy of backbone CNNs trained with a dataset that is 50 times larger. We also introduce One-Vote Veto (OVV) self-training, a semi-supervised learning strategy that is designed specifically for MTSNs. By taking both self-predictions and contrastive predictions of the unlabeled training data into account, OVV self-training provides additional pseudo labels for fine-tuning a pre-trained MTSN. Using a large (imbalanced) dataset with 66,715 fundus photographs acquired over 15 years, extensive experimental results demonstrate the effectiveness of low-shot learning with MTSN and semi-supervised learning with OVV self-training. Three additional, smaller clinical datasets of fundus images acquired under different conditions (cameras, instruments, locations, populations) are used to demonstrate the generalizability of the proposed methods.