ROJan 17, 2023
Robotic Navigation Autonomy for Subretinal Injection via Intelligent Real-Time Virtual iOCT Volume SlicingShervin Dehghani, Michael Sommersperger, Peiyao Zhang et al.
In the last decade, various robotic platforms have been introduced that could support delicate retinal surgeries. Concurrently, to provide semantic understanding of the surgical area, recent advances have enabled microscope-integrated intraoperative Optical Coherent Tomography (iOCT) with high-resolution 3D imaging at near video rate. The combination of robotics and semantic understanding enables task autonomy in robotic retinal surgery, such as for subretinal injection. This procedure requires precise needle insertion for best treatment outcomes. However, merging robotic systems with iOCT introduces new challenges. These include, but are not limited to high demands on data processing rates and dynamic registration of these systems during the procedure. In this work, we propose a framework for autonomous robotic navigation for subretinal injection, based on intelligent real-time processing of iOCT volumes. Our method consists of an instrument pose estimation method, an online registration between the robotic and the iOCT system, and trajectory planning tailored for navigation to an injection target. We also introduce intelligent virtual B-scans, a volume slicing approach for rapid instrument pose estimation, which is enabled by Convolutional Neural Networks (CNNs). Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method. Finally, we discuss identified challenges in this work and suggest potential solutions to further the development of such systems.
RONov 15, 2023
EyeLS: Shadow-Guided Instrument Landing System for Intraocular Target Approaching in Robotic Eye SurgeryJunjie Yang, Zhihao Zhao, Siyuan Shen et al.
Robotic ophthalmic surgery is an emerging technology to facilitate high-precision interventions such as retina penetration in subretinal injection and removal of floating tissues in retinal detachment depending on the input imaging modalities such as microscopy and intraoperative OCT (iOCT). Although iOCT is explored to locate the needle tip within its range-limited ROI, it is still difficult to coordinate iOCT's motion with the needle, especially at the initial target-approaching stage. Meanwhile, due to 2D perspective projection and thus the loss of depth information, current image-based methods cannot effectively estimate the needle tip's trajectory towards both retinal and floating targets. To address this limitation, we propose to use the shadow positions of the target and the instrument tip to estimate their relative depth position and accordingly optimize the instrument tip's insertion trajectory until the tip approaches targets within iOCT's scanning area. Our method succeeds target approaching on a retina model, and achieves an average depth error of 0.0127 mm and 0.3473 mm for floating and retinal targets respectively in the surgical simulator without damaging the retina.
IVSep 19, 2024
KLDD: Kalman Filter based Linear Deformable Diffusion Model in Retinal Image SegmentationZhihao Zhao, Yinzheng Zhao, Junjie Yang et al.
AI-based vascular segmentation is becoming increasingly common in enhancing the screening and treatment of ophthalmic diseases. Deep learning structures based on U-Net have achieved relatively good performance in vascular segmentation. However, small blood vessels and capillaries tend to be lost during segmentation when passed through the traditional U-Net downsampling module. To address this gap, this paper proposes a novel Kalman filter based Linear Deformable Diffusion (KLDD) model for retinal vessel segmentation. Our model employs a diffusion process that iteratively refines the segmentation, leveraging the flexible receptive fields of deformable convolutions in feature extraction modules to adapt to the detailed tubular vascular structures. More specifically, we first employ a feature extractor with linear deformable convolution to capture vascular structure information form the input images. To better optimize the coordinate positions of deformable convolution, we employ the Kalman filter to enhance the perception of vascular structures in linear deformable convolution. Subsequently, the features of the vascular structures extracted are utilized as a conditioning element within a diffusion model by the Cross-Attention Aggregation module (CAAM) and the Channel-wise Soft Attention module (CSAM). These aggregations are designed to enhance the diffusion model's capability to generate vascular structures. Experiments are evaluated on retinal fundus image datasets (DRIVE, CHASE_DB1) as well as the 3mm and 6mm of the OCTA-500 dataset, and the results show that the diffusion model proposed in this paper outperforms other methods.
CVSep 30, 2024
AI-Based Fully Automatic Analysis of Retinal Vascular Morphology in Pediatric High MyopiaYinzheng Zhao, Zhihao Zhao, Junjie Yang et al.
Purpose: To investigate the changes in retinal vascular structures associated various stages of myopia by designing automated software based on an artif intelligencemodel. Methods: The study involved 1324 pediatric participants from the National Childr Medical Center in China, and 2366 high-quality retinal images and correspon refractive parameters were obtained and analyzed. Spherical equivalent refrac(SER) degree was calculated. We proposed a data analysis model based c combination of the Convolutional Neural Networks (CNN) model and the atter module to classify images, segment vascular structures, and measure vasc parameters, such as main angle (MA), branching angle (BA), bifurcation edge al(BEA) and bifurcation edge coefficient (BEC). One-way ANOVA compared param measurements betweenthenormalfundus,lowmyopia,moderate myopia,and high myopia group. Results: There were 279 (12.38%) images in normal group and 384 (16.23%) images in the high myopia group. Compared normal fundus, the MA of fundus vessels in different myopic refractive groups significantly reduced (P = 0.006, P = 0.004, P = 0.019, respectively), and performance of the venous system was particularly obvious (P<0.001). At the sa time, the BEC decreased disproportionately (P<0.001). Further analysis of fundus vascular parameters at different degrees of myopia showed that there were also significant differences in BA and branching coefficient (BC). The arterial BA value of the fundus vessel in the high myopia group was lower than that of other groups (P : 0.032, 95% confidence interval [Ci], 0.22-4.86), while the venous BA values increased(P = 0.026). The BEC values of high myopia were higher than those of low and moderate myopia groups. When the loss function of our data classification model converged to 0.09,the model accuracy reached 94.19%
IVOct 28, 2024Code
KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel SegmentationZhihao Zhao, Shahrooz Faghihroohi, Yinzheng Zhao et al.
Background and Objective: In the realm of ophthalmic imaging, accurate vascular segmentation is paramount for diagnosing and managing various eye diseases. Contemporary deep learning-based vascular segmentation models rival human accuracy but still face substantial challenges in accurately segmenting minuscule blood vessels in neural network applications. Due to the necessity of multiple downsampling operations in the CNN models, fine details from high-resolution images are inevitably lost. The objective of this study is to design a structure to capture the delicate and small blood vessels. Methods: To address these issues, we propose a novel network (KaLDeX) for vascular segmentation leveraging a Kalman filter based linear deformable cross attention (LDCA) module, integrated within a UNet++ framework. Our approach is based on two key components: Kalman filter (KF) based linear deformable convolution (LD) and cross-attention (CA) modules. The LD module is designed to adaptively adjust the focus on thin vessels that might be overlooked in standard convolution. The CA module improves the global understanding of vascular structures by aggregating the detailed features from the LD module with the high level features from the UNet++ architecture. Finally, we adopt a topological loss function based on persistent homology to constrain the topological continuity of the segmentation. Results: The proposed method is evaluated on retinal fundus image datasets (DRIVE, CHASE_BD1, and STARE) as well as the 3mm and 6mm of the OCTA-500 dataset, achieving an average accuracy (ACC) of 97.25%, 97.77%, 97.85%, 98.89%, and 98.21%, respectively. Conclusions: Empirical evidence shows that our method outperforms the current best models on different vessel segmentation datasets. Our source code is available at: https://github.com/AIEyeSystem/KalDeX.
CVOct 28, 2024
Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal SequencesZhihao Zhao, Junjie Yang, Shahrooz Faghihroohi et al.
The utilization of longitudinal datasets for glaucoma progression prediction offers a compelling approach to support early therapeutic interventions. Predominant methodologies in this domain have primarily focused on the direct prediction of glaucoma stage labels from longitudinal datasets. However, such methods may not adequately encapsulate the nuanced developmental trajectory of the disease. To enhance the diagnostic acumen of medical practitioners, we propose a novel diffusion-based model to predict prospective images by extrapolating from existing longitudinal fundus images of patients. The methodology delineated in this study distinctively leverages sequences of images as inputs. Subsequently, a time-aligned mask is employed to select a specific year for image generation. During the training phase, the time-aligned mask resolves the issue of irregular temporal intervals in longitudinal image sequence sampling. Additionally, we utilize a strategy of randomly masking a frame in the sequence to establish the ground truth. This methodology aids the network in continuously acquiring knowledge regarding the internal relationships among the sequences throughout the learning phase. Moreover, the introduction of textual labels is instrumental in categorizing images generated within the sequence. The empirical findings from the conducted experiments indicate that our proposed model not only effectively generates longitudinal data but also significantly improves the precision of downstream classification tasks.
CVSep 25, 2025
Decoding the Surgical Scene: A Scoping Review of Scene Graphs in SurgeryAngelo Henriques, Korab Hoxha, Daniel Zapp et al.
Scene graphs (SGs) provide structured relational representations crucial for decoding complex, dynamic surgical environments. This PRISMA-ScR-guided scoping review systematically maps the evolving landscape of SG research in surgery, charting its applications, methodological advancements, and future directions. Our analysis reveals rapid growth, yet uncovers a critical 'data divide': internal-view research (e.g., triplet recognition) almost exclusively uses real-world 2D video, while external-view 4D modeling relies heavily on simulated data, exposing a key translational research gap. Methodologically, the field has advanced from foundational graph neural networks to specialized foundation models that now significantly outperform generalist large vision-language models in surgical contexts. This progress has established SGs as a cornerstone technology for both analysis, such as workflow recognition and automated safety monitoring, and generative tasks like controllable surgical simulation. Although challenges in data annotation and real-time implementation persist, they are actively being addressed through emerging techniques. Surgical SGs are maturing into an essential semantic bridge, enabling a new generation of intelligent systems to improve surgical safety, efficiency, and training.
CVSep 10, 2025
UOPSL: Unpaired OCT Predilection Sites Learning for Fundus Image Diagnosis AugmentationZhihao Zhao, Yinzheng Zhao, Junjie Yang et al.
Significant advancements in AI-driven multimodal medical image diagnosis have led to substantial improvements in ophthalmic disease identification in recent years. However, acquiring paired multimodal ophthalmic images remains prohibitively expensive. While fundus photography is simple and cost-effective, the limited availability of OCT data and inherent modality imbalance hinder further progress. Conventional approaches that rely solely on fundus or textual features often fail to capture fine-grained spatial information, as each imaging modality provides distinct cues about lesion predilection sites. In this study, we propose a novel unpaired multimodal framework \UOPSL that utilizes extensive OCT-derived spatial priors to dynamically identify predilection sites, enhancing fundus image-based disease recognition. Our approach bridges unpaired fundus and OCTs via extended disease text descriptions. Initially, we employ contrastive learning on a large corpus of unpaired OCT and fundus images while simultaneously learning the predilection sites matrix in the OCT latent space. Through extensive optimization, this matrix captures lesion localization patterns within the OCT feature space. During the fine-tuning or inference phase of the downstream classification task based solely on fundus images, where paired OCT data is unavailable, we eliminate OCT input and utilize the predilection sites matrix to assist in fundus image classification learning. Extensive experiments conducted on 9 diverse datasets across 28 critical categories demonstrate that our framework outperforms existing benchmarks.
CVSep 10, 2025
CLAPS: A CLIP-Unified Auto-Prompt Segmentation for Multi-Modal Retinal ImagingZhihao Zhao, Yinzheng Zhao, Junjie Yang et al.
Recent advancements in foundation models, such as the Segment Anything Model (SAM), have significantly impacted medical image segmentation, especially in retinal imaging, where precise segmentation is vital for diagnosis. Despite this progress, current methods face critical challenges: 1) modality ambiguity in textual disease descriptions, 2) a continued reliance on manual prompting for SAM-based workflows, and 3) a lack of a unified framework, with most methods being modality- and task-specific. To overcome these hurdles, we propose CLIP-unified Auto-Prompt Segmentation (\CLAPS), a novel method for unified segmentation across diverse tasks and modalities in retinal imaging. Our approach begins by pre-training a CLIP-based image encoder on a large, multi-modal retinal dataset to handle data scarcity and distribution imbalance. We then leverage GroundingDINO to automatically generate spatial bounding box prompts by detecting local lesions. To unify tasks and resolve ambiguity, we use text prompts enhanced with a unique "modality signature" for each imaging modality. Ultimately, these automated textual and spatial prompts guide SAM to execute precise segmentation, creating a fully automated and unified pipeline. Extensive experiments on 12 diverse datasets across 11 critical segmentation categories show that CLAPS achieves performance on par with specialized expert models while surpassing existing benchmarks across most metrics, demonstrating its broad generalizability as a foundation model.
RONov 30, 2021
ColibriDoc: An Eye-in-Hand Autonomous Trocar Docking SystemShervin Dehghani, Michael Sommersperger, Junjie Yang et al.
Retinal surgery is a complex medical procedure that requires exceptional expertise and dexterity. For this purpose, several robotic platforms are currently being developed to enable or improve the outcome of microsurgical tasks. Since the control of such robots is often designed for navigation inside the eye in proximity to the retina, successful trocar docking and inserting the instrument into the eye represents an additional cognitive effort, and is, therefore, one of the open challenges in robotic retinal surgery. For this purpose, we present a platform for autonomous trocar docking that combines computer vision and a robotic setup. Inspired by the Cuban Colibri (hummingbird) aligning its beak to a flower using only vision, we mount a camera onto the endeffector of a robotic system. By estimating the position and pose of the trocar, the robot is able to autonomously align and navigate the instrument towards the Trocar's Entry Point (TEP) and finally perform the insertion. Our experiments show that the proposed method is able to accurately estimate the position and pose of the trocar and achieve repeatable autonomous docking. The aim of this work is to reduce the complexity of robotic setup preparation prior to the surgical task and therefore, increase the intuitiveness of the system integration into the clinical workflow.
ROAug 17, 2018
Towards Robotic Eye Surgery: Marker-free, Online Hand-eye Calibration using Optical Coherence Tomography ImagesMingchuan Zhou, Mahdi Hamad, Jakob Weiss et al.
Ophthalmic microsurgery is known to be a challenging operation, which requires very precise and dexterous manipulation. Image guided robot-assisted surgery (RAS) is a promising solution that brings significant improvements in outcomes and reduces the physical limitations of human surgeons. However, this technology must be further developed before it can be routinely used in clinics. One of the problems is the lack of proper calibration between the robotic manipulator and appropriate imaging device. In this work, we developed a flexible framework for hand-eye calibration of an ophthalmic robot with a microscope-integrated Optical Coherence Tomography (MIOCT) without any markers. The proposed method consists of three main steps: a) we estimate the OCT calibration parameters; b) with micro-scale displacements controlled by the robot, we detect and segment the needle tip in 3D-OCT volume; c) we find the transformation between the coordinate system of the OCT camera and the coordinate system of the robot. We verified the capability of our framework in ex-vivo pig eye experiments and compared the results with a reference method (marker-based). In all experiments, our method showed a small difference from the marker based method, with a mean calibration error of 9.2 $μ$m and 7.0 $μ$m, respectively. Additionally, the noise test shows the robustness of the proposed method.