CV | Aug 17, 2022 | Code
Self-Supervised Depth Estimation in Laparoscopic Image using 3D Geometric Consistency
Baoru Huang, Jian-Qing Zheng, Anh Nguyen et al.
Depth estimation is a crucial step for image-guided intervention in robotic surgery and laparoscopic imaging systems. Since per-pixel depth ground truth is difficult to acquire for laparoscopic image data, it is rarely possible to apply supervised depth estimation to surgical applications. As an alternative, self-supervised methods have been introduced to train depth estimators using only synchronized stereo image pairs. However, most recent work has focused on left-right consistency in 2D and ignored valuable inherent 3D information on the object in real-world coordinates, meaning that the left-right 3D geometric structural consistency is not fully utilized. To overcome this limitation, we present M3Depth, a self-supervised depth estimator that leverages 3D geometric structural information hidden in stereo pairs while keeping monocular inference. The method also removes the influence of border regions unseen in at least one of the stereo images via masking, to enhance the correspondences between left and right images in overlapping areas. Intensive experiments show that our method outperforms previous self-supervised approaches on both a public dataset and a newly acquired dataset by a large margin, indicating good generalization across different samples and laparoscopes. Code and data are available at https://github.com/br0202/M3Depth.
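As a rough illustration of the 3D information that a rectified stereo pair carries, the sketch below converts disparity to depth (depth = focal length × baseline / disparity) and back-projects each pixel to a 3D point under a pinhole model. It is not the M3Depth implementation; the function names and all camera parameters (`focal_px`, `baseline`, `cx`, `cy`) are hypothetical values chosen for the example.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline):
    """Rectified-stereo relation: depth = focal_px * baseline / disparity."""
    disparity = np.asarray(disparity, dtype=np.float64)
    # Clip to avoid division by zero where no disparity is available.
    return focal_px * baseline / np.clip(disparity, 1e-6, None)

def backproject(depth, focal_px, cx, cy):
    """Lift every pixel (u, v) with depth z to a 3D point (x, y, z), pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

# Hypothetical camera parameters (assumptions, not taken from the paper).
focal_px, baseline, cx, cy = 500.0, 0.005, 2.0, 1.5
disparity = np.full((4, 5), 50.0)  # constant disparity, i.e. a flat plane
depth = disparity_to_depth(disparity, focal_px, baseline)
points = backproject(depth, focal_px, cx, cy)
```

Point clouds obtained this way from the left and right views live in metric camera coordinates, which is the kind of space in which a 3D consistency term (as opposed to a 2D photometric one) could be expressed.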
CV | Sep 29, 2024 | Code
Tracking Everything in Robotic-Assisted Surgery
Bohan Zhan, Wang Zhao, Yi Fang et al.
Accurate tracking of tissues and instruments in videos is crucial for Robotic-Assisted Minimally Invasive Surgery (RAMIS), as it enables the robot to comprehend the surgical scene with precise locations and interactions of tissues and tools. Traditional keypoint-based sparse tracking is limited by featured points, while flow-based dense two-view matching suffers from long-term drifts. Recently, the Tracking Any Point (TAP) algorithm was proposed to overcome these limitations and achieve dense, accurate long-term tracking. However, its efficacy in surgical scenarios remains untested, largely due to the lack of a comprehensive surgical tracking dataset for evaluation. To address this gap, we introduce a new annotated surgical tracking dataset for benchmarking tracking methods in surgical scenarios, comprising real-world surgical videos with complex tissue and instrument motions. We extensively evaluate state-of-the-art (SOTA) TAP-based algorithms on this dataset and reveal their limitations in challenging surgical scenarios, including fast instrument motion, severe occlusions, and motion blur. Furthermore, we propose a new tracking method, namely SurgMotion, to solve these challenges and further improve tracking performance. Our proposed method outperforms most TAP-based algorithms in surgical instrument tracking, and especially demonstrates significant improvements over baselines in challenging medical videos. Our code and dataset are available at https://github.com/zhanbh1019/SurgicalMotion.
IV | Jul 7, 2023 | Code
Detecting the Sensing Area of A Laparoscopic Probe in Minimally Invasive Cancer Surgery
Baoru Huang, Yicheng Hu, Anh Nguyen et al.
In surgical oncology, it is challenging for surgeons to identify lymph nodes and completely resect cancer even with pre-operative imaging systems like PET and CT, because of the lack of reliable intraoperative visualization tools. Endoscopic radio-guided cancer detection and resection has recently been evaluated whereby a novel tethered laparoscopic gamma detector is used to localize a preoperatively injected radiotracer. This can both enhance the endoscopic imaging and complement preoperative nuclear imaging data. However, gamma activity visualization is challenging to present to the operator because the probe is non-imaging and it does not visibly indicate the activity origination on the tissue surface. Initial failed attempts used segmentation or geometric methods, but led to the discovery that it could be resolved by leveraging high-dimensional image features and probe position information. To demonstrate the effectiveness of this solution, we designed and implemented a simple regression network that successfully addressed the problem. To further validate the proposed solution, we acquired and publicly released two datasets captured using a custom-designed, portable stereo laparoscope system. Through intensive experimentation, we demonstrated that our method can successfully and effectively detect the sensing area, establishing a new performance benchmark. Code and data are available at https://github.com/br0202/Sensing_area_detection.git
CV | May 22
MuellerPT: Decomposition Driven Pretraining for Dense Learning in Mueller Polarimetry
Adam Tlemsani, Yingdian Li, Maxime Giot et al.
Mueller matrix imaging provides rich, physically meaningful contrast for biomedical tissue analysis, but supervised learning is hindered by scarce dense annotations and strong domain shifts across specimens and acquisition settings. We introduce MuellerPT, a physics guided pre-training approach that learns transferable dense representations by predicting Lu-Chipman decomposition maps from per-pixel 4x4 Mueller matrices. To scale pre-training, we collected a new large Multispectral Animal Polarimetric Organ dataset (MAP-Org). The pre-trained encoder is adapted with a segmentation head for grey vs. white matter segmentation in lamb brain. A classification head is used for colorectal cancer vs. non-cancer classification. Both segmentation and classification are evaluated across few-shot learning scenarios. In segmentation, MuellerPT improves label efficiency and cross specimen transfer compared to models without pre-training, achieving an absolute DICE gain of over 20% compared to the baseline trained from scratch when using 5% of the training data. In classification, MuellerPT also enhances label efficiency, improving overall accuracy by 8% compared to the baseline when using 1% of the training data. We demonstrate MuellerPT's robustness to domain shift with a qualitative evaluation of its predicted Lu-Chipman maps on an ex vivo human oesophagus sample. These results suggest that predicting Lu-Chipman decomposition is an effective and practical pretext task for robust biomedical inference from Mueller polarimetry and can pave the way for future work on label efficient Mueller imaging.
CV | Oct 11, 2024
SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction
Jialei Chen, Xin Zhang, Mobarakol Islam et al.
Accurate 3D reconstruction of dynamic surgical scenes from endoscopic video is essential for robotic-assisted surgery. While recent 3D Gaussian Splatting methods have shown promise in achieving high-quality reconstructions with fast rendering speeds, their use of inverse depth loss functions compresses depth variations. This can lead to a loss of fine geometric details, limiting their ability to capture precise 3D geometry and their effectiveness in intraoperative applications. To address these challenges, we present SurgicalGS, a dynamic 3D Gaussian Splatting framework specifically designed for surgical scene reconstruction with improved geometric accuracy. Our approach first initialises a Gaussian point cloud using depth priors, employing binary motion masks to identify pixels with significant depth variations and fusing point clouds from depth maps across frames for initialisation. We use the Flexible Deformation Model to represent the dynamic scene and introduce a normalised depth regularisation loss along with an unsupervised depth smoothness constraint to ensure more accurate geometric reconstruction. Extensive experiments on two real surgical datasets demonstrate that SurgicalGS achieves state-of-the-art reconstruction quality, especially in terms of accurate geometry, advancing the usability of 3D Gaussian Splatting in robotic-assisted surgery.
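The claim that an inverse depth loss compresses depth variations can be checked with a small numerical sketch. Below, the same one-unit depth error is penalised very differently by an L1 loss on inverse depth depending on how far the surface is, whereas an L1 loss on depth divided by a fixed scale treats both cases equally. The normalisation by a fixed `max_depth` is an assumption made for illustration; the abstract does not specify how SurgicalGS normalises depth, and both function names are hypothetical.

```python
import numpy as np

def inverse_depth_l1(pred, target):
    """L1 loss on inverse depth (1 / z)."""
    return np.abs(1.0 / pred - 1.0 / target).mean()

def normalised_depth_l1(pred, target, max_depth):
    """L1 loss on depth scaled by a fixed maximum depth (illustrative assumption)."""
    return np.abs(pred / max_depth - target / max_depth).mean()

# The same absolute error (1 unit) at a near and a far surface.
near_target, near_pred = np.array([5.0]), np.array([6.0])
far_target, far_pred = np.array([50.0]), np.array([51.0])

near_inv = inverse_depth_l1(near_pred, near_target)  # about 3.3e-2
far_inv = inverse_depth_l1(far_pred, far_target)     # about 3.9e-4

near_norm = normalised_depth_l1(near_pred, near_target, max_depth=100.0)
far_norm = normalised_depth_l1(far_pred, far_target, max_depth=100.0)
```

Under these assumptions the inverse depth penalty for the far surface is roughly two orders of magnitude smaller than for the near one, which is one way to read "compresses depth variations".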
CV | Jun 30, 2025
Towards Markerless Intraoperative Tracking of Deformable Spine Tissue
Connor Daly, Elettra Marconi, Marco Riva et al.
Consumer-grade RGB-D imaging for intraoperative orthopedic tissue tracking is a promising method with high translational potential. Unlike bone-mounted tracking devices, markerless tracking can reduce operating time and complexity. However, its use has been limited to cadaveric studies. This paper introduces the first real-world clinical RGB-D dataset for spine surgery and develops SpineAlign, a system for capturing deformation between preoperative and intraoperative spine states. We also present an intraoperative segmentation network trained on this data and introduce CorrespondNet, a multi-task framework for predicting key regions for registration in both intraoperative and preoperative scenes.
CV | Aug 1, 2025
SAMSA 2.0: Prompting Segment Anything with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation
Alfie Roddan, Tobias Czempiel, Chi Xu et al.
We present SAMSA 2.0, an interactive segmentation framework for hyperspectral medical imaging that introduces spectral angle prompting to guide the Segment Anything Model (SAM) using spectral similarity alongside spatial cues. This early fusion of spectral information enables more accurate and robust segmentation across diverse spectral datasets. Without retraining, SAMSA 2.0 achieves up to +3.8% higher Dice scores compared to RGB-only models and up to +3.1% over prior spectral fusion methods. Our approach enhances few-shot and zero-shot performance, demonstrating strong generalization in challenging low-data and noisy scenarios common in clinical imaging.
CV | Jul 31, 2025
SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation
Alfie Roddan, Tobias Czempiel, Chi Xu et al.
Hyperspectral imaging (HSI) provides rich spectral information for medical imaging, yet encounters significant challenges due to data limitations and hardware variations. We introduce SAMSA, a novel interactive segmentation framework that combines an RGB foundation model with spectral analysis. SAMSA efficiently utilizes user clicks to guide both RGB segmentation and spectral similarity computations. The method addresses key limitations in HSI segmentation through a unique spectral feature fusion strategy that operates independently of spectral band count and resolution. Performance evaluation on publicly available datasets has shown 81.0% 1-click and 93.4% 5-click DICE on a neurosurgical dataset and 81.1% 1-click and 89.2% 5-click DICE on an intraoperative porcine hyperspectral dataset. Experimental results demonstrate SAMSA's effectiveness in few-shot and zero-shot learning scenarios and when using minimal training examples. Our approach enables seamless integration of datasets with different spectral characteristics, providing a flexible framework for hyperspectral medical image analysis.
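The spectral similarity computation named in the title can be illustrated with the standard spectral angle: the angle between the spectrum at a clicked pixel and the spectrum at every other pixel. The sketch below is a generic spectral angle map, not SAMSA's fusion strategy; the function name `spectral_angle_map` and the toy data are assumptions. Because the result is one scalar per pixel, it has the same shape whatever the number of bands, which fits the abstract's point about independence from band count.

```python
import numpy as np

def spectral_angle_map(cube, click_rc):
    """Angle (radians) between each pixel spectrum and the clicked pixel's spectrum.

    cube: hyperspectral image of shape (H, W, B); click_rc: (row, col) of the click.
    """
    ref = cube[click_rc]                      # reference spectrum, shape (B,)
    dot = cube @ ref                          # per-pixel dot product, shape (H, W)
    norms = np.linalg.norm(cube, axis=-1) * np.linalg.norm(ref)
    cos = np.clip(dot / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.arccos(cos)

# Toy 2x2 image with 3 bands (hypothetical values).
cube = np.array([
    [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]],    # (0,1) is a brighter copy of (0,0)
    [[1.0, 0.0, 0.0], [3.0, 0.0, -1.0]],   # (1,1) is orthogonal to (0,0)
])
angles = spectral_angle_map(cube, (0, 0))
```

A small angle means a similar spectral shape regardless of overall brightness, so pixel (0,1) scores as a match to the click even though it is twice as bright.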
IV | Jul 9, 2021
Self-Supervised Generative Adversarial Network for Depth Estimation in Laparoscopic Images
Baoru Huang, Jianqing Zheng, Anh Nguyen et al.
Dense depth estimation and 3D reconstruction of a surgical scene are crucial steps in computer assisted surgery. Recent work has shown that depth estimation from a stereo image pair could be solved with convolutional neural networks. However, most recent depth estimation models were trained on datasets with per-pixel ground truth. Such data is especially rare for laparoscopic imaging, making it hard to apply supervised depth estimation to real surgical applications. To overcome this limitation, we propose SADepth, a new self-supervised depth estimation method based on Generative Adversarial Networks. It consists of an encoder-decoder generator and a discriminator to incorporate geometry constraints during training. Multi-scale outputs from the generator help to solve the local minima caused by the photometric reprojection loss, while the adversarial learning improves the framework generation quality. Extensive experiments on two public datasets show that SADepth outperforms recent state-of-the-art unsupervised methods by a large margin, and reduces the gap between supervised and unsupervised depth estimation in laparoscopic images.
CV | Apr 22, 2021
H-Net: Unsupervised Attention-based Stereo Depth Estimation Leveraging Epipolar Geometry
Baoru Huang, Jian-Qing Zheng, Stamatia Giannarou et al.
Depth estimation from a stereo image pair has become one of the most explored applications in computer vision, with most of the previous methods relying on fully supervised learning settings. However, due to the difficulty in acquiring accurate and scalable ground truth data, the training of fully supervised methods is challenging. As an alternative, self-supervised methods are becoming more popular to mitigate this challenge. In this paper, we introduce the H-Net, a deep-learning framework for unsupervised stereo depth estimation that leverages epipolar geometry to refine stereo matching. For the first time, a Siamese autoencoder architecture is used for depth estimation, which allows mutual information between the rectified stereo images to be extracted. To enforce the epipolar constraint, the mutual epipolar attention mechanism has been designed, which gives more emphasis to correspondences of features that lie on the same epipolar line while learning mutual information between the input stereo pair. Stereo correspondences are further enhanced by incorporating semantic information into the proposed attention mechanism. More specifically, the optimal transport algorithm is used to suppress attention and eliminate outliers in areas not visible in both cameras. Extensive experiments on KITTI2015 and Cityscapes show that our method outperforms the state-of-the-art unsupervised stereo depth estimation methods while closing the gap with the fully supervised approaches.
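In a rectified stereo pair the epipolar lines are the image rows, so a pixel in the left image can only correspond to pixels in the same row of the right image. The sketch below shows a hard, row-restricted attention built on that fact. It is a simplified stand-in, not the H-Net mutual epipolar attention (which, per the abstract, emphasises same-line correspondences and adds semantic information and optimal transport); the function name and feature shapes are assumptions.

```python
import numpy as np

def row_wise_attention(left_feat, right_feat):
    """Attention restricted to matching rows of a rectified stereo pair.

    left_feat, right_feat: feature maps of shape (H, W, C).
    Returns weights of shape (H, W, W): for each left pixel (h, w),
    a softmax distribution over the right pixels in the same row h.
    """
    c = left_feat.shape[-1]
    # Dot-product similarity between left pixel w and right pixel v within row h.
    scores = np.einsum("hwc,hvc->hwv", left_feat, right_feat) / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Random features stand in for encoder outputs (hypothetical sizes).
rng = np.random.default_rng(0)
left = rng.normal(size=(4, 6, 8))
right = rng.normal(size=(4, 6, 8))
attn = row_wise_attention(left, right)
```

Restricting the search to one row reduces the attention matrix from (H·W) × (H·W) to H blocks of W × W, which is the practical appeal of using the epipolar constraint.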
CV | May 1, 2019
Estimation of Tissue Oxygen Saturation from RGB images and Sparse Hyperspectral Signals based on Conditional Generative Adversarial Network
Qingbiao Li, Jianyu Lin, Neil T. Clancy et al.
Purpose: Intra-operative measurement of tissue oxygen saturation (StO2) is important in the detection of ischemia, monitoring perfusion and identifying disease. Hyperspectral imaging (HSI) measures the optical reflectance spectrum of the tissue and uses this information to quantify its composition, including StO2. However, real-time monitoring is difficult due to the capture rate and data processing time. Methods: An endoscopic system based on a multi-fiber probe was previously developed to sparsely capture HSI data (sHSI). These were combined with RGB images, via a deep neural network, to generate high-resolution hypercubes and calculate StO2. To improve accuracy and processing speed, we propose a dual-input conditional generative adversarial network (cGAN), Dual2StO2, to directly estimate StO2 by fusing features from both RGB and sHSI. Results: Validation experiments were carried out on in vivo porcine bowel data, where the ground truth StO2 was generated from the HSI camera. The performance was also compared to our previous super-spectral-resolution network, SSRNet, in terms of mean StO2 prediction accuracy and structural similarity metrics. Dual2StO2 was also tested using simulated probe data with varying fiber number. Conclusions: StO2 estimation by Dual2StO2 is visually closer to ground truth in general structure, and achieves higher prediction accuracy and faster processing speed than SSRNet. Simulations showed that results improved when a greater number of fibers are used in the probe. Future work will include refinement of the network architecture, hardware optimization based on simulation results, and evaluation of the technique in clinical applications beyond StO2 estimation.
CV | Apr 19, 2018
Estimation of Tissue Oxygen Saturation from RGB Images based on Pixel-level Image Translation
Qing-Biao Li, Xiao-Yun Zhou, Jianyu Lin et al.
Intra-operative measurement of tissue oxygen saturation (StO2) has been widely explored by pulse oximetry or hyperspectral imaging (HSI) to assess the function and viability of tissue. In this paper we propose a pixel-level image-to-image translation approach based on conditional Generative Adversarial Networks (cGAN) to estimate tissue oxygen saturation (StO2) directly from RGB images. The real-time performance and non-reliance on additional hardware enable a seamless integration of the proposed method into surgical and diagnostic workflows with standard endoscope systems. For validation, RGB images and StO2 ground truth were simulated and estimated from HSI images collected by a liquid crystal tuneable filter (LCTF) endoscope for three tissue types (porcine bowel, lamb uterus and rabbit uterus). The results show that the proposed method can achieve visually identical images with comparable accuracy.
CV | Jun 20, 2017
Recovering Dense Tissue Multispectral Signal from in vivo RGB Images
Jianyu Lin, Neil T. Clancy, Daniel S. Elson
Hyperspectral/multispectral imaging (HSI/MSI) contains rich information for clinical applications, such as 1) narrow band imaging for vascular visualisation; 2) oxygen saturation for intraoperative perfusion monitoring and clinical decision making [1]; 3) tissue classification and identification of pathology [2]. The current systems which provide pixel-level HSI/MSI signal can be generally divided into two types: spatial scanning and spectral scanning. However, the trade-off between spatial/spectral resolution, the acquisition time, and the hardware complexity hampers implementation in real-world applications, especially intra-operatively. Acquiring high resolution images in real-time is important for HSI/MSI in intra-operative imaging, to alleviate the side effects caused by breathing, heartbeat, and other sources of motion. Therefore, we developed an algorithm to recover a pixel-level MSI stack using only the captured snapshot RGB images from a normal camera. We refer to this technique as "super-spectral-resolution". The proposed method enables recovery of pixel-level-dense MSI signals with 24 spectral bands at ~11 frames per second (FPS) on a GPU. Multispectral data captured from porcine bowel and sheep/rabbit uteri in vivo has been used for training, and the algorithm has been validated using unseen in vivo animal experiments.
CV | Jun 19, 2017
Endoscopic Depth Measurement and Super-Spectral-Resolution Imaging
Jianyu Lin, Neil T. Clancy, Yang Hu et al.
Intra-operative measurements of tissue shape and multi/hyperspectral information have the potential to provide surgical guidance and decision-making support. We report an optical probe based system to combine sparse hyperspectral measurements and spectrally-encoded structured lighting (SL) for surface measurements. The system provides informative signals for navigation with a surgical interface. By rapidly switching between SL and white light (WL) modes, SL information is combined with structure-from-motion (SfM) from white light images, based on SURF feature detection and Lucas-Kanade (LK) optical flow, to provide quasi-dense surface shape reconstruction with known scale in real-time. Furthermore, "super-spectral-resolution" was realized, whereby the RGB images and sparse hyperspectral data were integrated to recover dense pixel-level hyperspectral stacks, by using convolutional neural networks to upscale the wavelength dimension. Validation and demonstration of this system are reported on ex vivo/in vivo animal/human experiments.
CV | Jul 11, 2016
Inference of Haemoglobin Concentration From Stereo RGB
Geoffrey Jones, Neil T. Clancy, Yusuf Helo et al.
Multispectral imaging (MSI) can provide information about tissue oxygenation, perfusion and potentially function during surgery. In this paper we present a novel, near real-time technique for intrinsic measurements of total haemoglobin (THb) and blood oxygenation (SO2) in tissue using only RGB images from a stereo laparoscope. The high degree of spectral overlap between channels makes inference of haemoglobin concentration challenging, non-linear and under-constrained. We decompose the problem into two constrained linear sub-problems and show that with Tikhonov regularisation the estimation significantly improves, giving robust estimation of the THb. We demonstrate that by using the co-registered stereo image data from two cameras it is possible to get robust SO2 estimation as well. Our method is closed form, providing computational efficiency even with multiple cameras. The method we present requires only spectral response calibration of each camera, without modification of existing laparoscopic imaging hardware. We validate our technique on synthetic data from Monte Carlo simulation of light transport through soft tissue containing submerged blood vessels and further, in vivo, on a multispectral porcine data set.
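For readers unfamiliar with it, Tikhonov regularisation of a linear least-squares problem has the closed-form solution x = (AᵀA + λI)⁻¹Aᵀb, which is what makes a closed-form, computationally cheap estimator possible. The sketch below shows only this generic formula; it is not the paper's two-stage decomposition, and the matrix `A` (three colour channels by two chromophores), the concentrations, and the value of `lam` are made-up numbers for illustration.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Closed-form Tikhonov (ridge) solution: (A^T A + lam * I)^-1 A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Hypothetical mixing matrix: rows are R, G, B channel responses,
# columns are two chromophores (for example oxy- and deoxy-haemoglobin).
A = np.array([
    [0.8, 0.3],
    [0.4, 0.6],
    [0.2, 0.9],
])
x_true = np.array([0.7, 0.3])  # made-up concentrations
b = A @ x_true                 # noiseless channel measurements

x_unreg = tikhonov_solve(A, b, lam=0.0)  # ordinary least squares
x_reg = tikhonov_solve(A, b, lam=1.0)    # regularised, shrunk towards zero
```

With `lam=0` the noiseless example recovers `x_true`; a positive `lam` trades a small bias for stability when the channel responses overlap strongly and the problem is poorly conditioned.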
CV | Jun 15, 2016
Probe-based Rapid Hybrid Hyperspectral and Tissue Surface Imaging Aided by Fully Convolutional Networks
Jianyu Lin, Neil T. Clancy, Xueqing Sun et al.
Tissue surface shape and reflectance spectra provide rich intra-operative information useful in surgical guidance. We propose a hybrid system which displays an endoscopic image with a fast joint inspection of tissue surface shape using structured light (SL) and hyperspectral imaging (HSI). For SL a miniature fibre probe is used to project a coloured spot pattern onto the tissue surface. In HSI mode standard endoscopic illumination is used, with the fibre probe collecting reflected light and encoding the spatial information into a linear format that can be imaged onto the slit of a spectrograph. Correspondence between the arrangement of fibres at the distal and proximal ends of the bundle was found using spectral encoding. Then during pattern decoding, a fully convolutional network (FCN) was used for spot detection, followed by a matching propagation algorithm for spot identification. This method enabled fast reconstruction (12 frames per second) using a GPU. The hyperspectral image was combined with the white light image and the reconstructed surface, showing the spectral information of different areas. Validation of this system using phantom and ex vivo experiments has been demonstrated.