CV · Aug 17, 2022 · Code
Self-Supervised Depth Estimation in Laparoscopic Image using 3D Geometric Consistency
Baoru Huang, Jian-Qing Zheng, Anh Nguyen et al.
Depth estimation is a crucial step for image-guided intervention in robotic surgery and laparoscopic imaging systems. Since per-pixel depth ground truth is difficult to acquire for laparoscopic image data, it is rarely possible to apply supervised depth estimation to surgical applications. As an alternative, self-supervised methods have been introduced to train depth estimators using only synchronized stereo image pairs. However, most recent work has focused on left-right consistency in 2D and ignored valuable inherent 3D information about the object in real-world coordinates, meaning that the left-right 3D geometric structural consistency is not fully utilized. To overcome this limitation, we present M3Depth, a self-supervised depth estimator that leverages 3D geometric structural information hidden in stereo pairs while keeping monocular inference. The method also removes the influence of border regions unseen in at least one of the stereo images via masking, to enhance the correspondences between left and right images in overlapping areas. Intensive experiments show that our method outperforms previous self-supervised approaches on both a public dataset and a newly acquired dataset by a large margin, indicating good generalization across different samples and laparoscopes. Code and data are available at https://github.com/br0202/M3Depth.
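To make the "3D information in real-world coordinates" concrete, the sketch below back-projects a disparity map from a rectified stereo pair into 3D points using the standard pinhole relation depth = focal length × baseline / disparity. It is not the M3Depth loss or network; the function name `disparity_to_points` and the parameters `focal_px`, `baseline`, `cx`, and `cy` are hypothetical and chosen only for illustration.

```python
import numpy as np


def disparity_to_points(disparity, focal_px, baseline, cx, cy):
    """Back-project a disparity map (in pixels) into 3D camera coordinates.

    Assumes a rectified stereo pair and a pinhole camera with focal length
    `focal_px` (pixels), principal point (`cx`, `cy`) and stereo `baseline`
    (in the desired output unit). Pixels with non-positive disparity are
    returned as NaN because their depth is undefined.
    """
    disparity = np.asarray(disparity, dtype=float)
    height, width = disparity.shape
    v, u = np.mgrid[0:height, 0:width]  # row (v) and column (u) indices

    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0
    depth[valid] = focal_px * baseline / disparity[valid]

    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)
```

Point clouds obtained this way from the left and right views could then be compared in 3D rather than only in image space, which is the kind of consistency the abstract refers to.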
IV · Jul 7, 2023 · Code
Detecting the Sensing Area of A Laparoscopic Probe in Minimally Invasive Cancer Surgery
Baoru Huang, Yicheng Hu, Anh Nguyen et al.
In surgical oncology, it is challenging for surgeons to identify lymph nodes and completely resect cancer even with pre-operative imaging systems like PET and CT, because of the lack of reliable intraoperative visualization tools. Endoscopic radio-guided cancer detection and resection has recently been evaluated whereby a novel tethered laparoscopic gamma detector is used to localize a preoperatively injected radiotracer. This can both enhance the endoscopic imaging and complement preoperative nuclear imaging data. However, gamma activity visualization is challenging to present to the operator because the probe is non-imaging and it does not visibly indicate the activity origination on the tissue surface. Initial failed attempts used segmentation or geometric methods, but led to the discovery that it could be resolved by leveraging high-dimensional image features and probe position information. To demonstrate the effectiveness of this solution, we designed and implemented a simple regression network that successfully addressed the problem. To further validate the proposed solution, we acquired and publicly released two datasets captured using a custom-designed, portable stereo laparoscope system. Through intensive experimentation, we demonstrated that our method can successfully and effectively detect the sensing area, establishing a new performance benchmark. Code and data are available at https://github.com/br0202/Sensing_area_detection.git
CV · Feb 6, 2023
SurgT challenge: Benchmark of Soft-Tissue Trackers for Robotic Surgery
Joao Cartucho, Alistair Weld, Samyakh Tukra et al.
This paper introduces the "SurgT: Surgical Tracking" challenge, which was organised in conjunction with MICCAI 2022. There were two purposes for the creation of this challenge: (1) the establishment of the first standardised benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated data in surgery. A dataset of 157 stereo endoscopic videos from 20 clinical cases, along with stereo camera calibration parameters, has been provided. Participants were assigned the task of developing algorithms to track the movement of soft tissues, represented by bounding boxes, in stereo endoscopic videos. At the end of the challenge, the developed methods were assessed on a previously hidden test subset. This assessment uses benchmarking metrics that were purposely developed for this challenge, to verify the efficacy of unsupervised deep learning algorithms in tracking soft tissue. The metric used for ranking the methods was the Expected Average Overlap (EAO) score, which measures the average overlap between a tracker's and the ground truth bounding boxes. Coming first in the challenge was the deep learning submission by ICVS-2Ai, with a superior EAO score of 0.617. This method employs ARFlow to estimate unsupervised dense optical flow from cropped images, using photometric and regularization losses. In second place, Jmees, with an EAO of 0.583, uses deep learning for surgical tool segmentation on top of a non-deep learning baseline method: CSRT. CSRT by itself scores a similar EAO of 0.563. The results from this challenge show that currently, non-deep learning methods are still competitive. The dataset and benchmarking tool created for this challenge have been made publicly available at https://surgt.grand-challenge.org/.
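The overlap that the EAO score builds on can be illustrated with a per-frame intersection-over-union between a predicted and a ground truth box, averaged over frames. This is a simplified sketch only: the benchmark's actual EAO computation may differ (for example in how it handles tracking failures or sequences of different lengths), and the `(x, y, width, height)` box layout and the function names are assumptions made here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def mean_overlap(predicted, ground_truth):
    """Plain average of per-frame IoU over paired box sequences."""
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(overlaps) / len(overlaps) if overlaps else 0.0
```

A tracker that follows the target perfectly in one frame and loses it entirely in the next would score 0.5 under this plain average.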
RO · Apr 22
Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics
Open-H-Embodiment Consortium, Nigel Nelson, Juo-Tung Chen et al.
Autonomous medical robots hold promise to improve patient outcomes, reduce provider workload, democratize access to care, and enable superhuman precision. However, autonomous medical robotics has been limited by a fundamental data problem: existing medical robotic datasets are small, single-embodiment, and rarely shared openly, restricting the development of foundation models that the field needs to advance. We introduce Open-H-Embodiment, the largest open dataset of medical robotic video with synchronized kinematics to date, spanning more than 49 institutions and multiple robotic platforms including the CMR Versius, Intuitive Surgical's da Vinci, da Vinci Research Kit (dVRK), Rob Surgical BiTrack, Virtual Incision's MIRA, Moon Surgical Maestro, and a variety of custom systems, spanning surgical manipulation, robotic ultrasound, and endoscopy procedures. We demonstrate the research enabled by this dataset through two foundation models. GR00T-H is the first open foundation vision-language-action model for medical robotics, which is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all others) and achieves 64% average success across a 29-step ex vivo suturing sequence. We also train Cosmos-H-Surgical-Simulator, the first action-conditioned world model to enable multi-embodiment surgical simulation from a single checkpoint, spanning nine robotic platforms and supporting in silico policy evaluation and synthetic data generation for the medical domain. These results suggest that open, large-scale medical robot data collection can serve as critical infrastructure for the research community, enabling advances in robot learning, world modeling, and beyond.
CV · Jan 19, 2023
Regularising disparity estimation via multi task learning with structured light reconstruction
Alistair Weld, Joao Cartucho, Chi Xu et al.
3D reconstruction is a useful tool for surgical planning and guidance. However, the lack of available medical data stunts research and development in this field, as supervised deep learning methods for accurate disparity estimation rely heavily on large datasets containing ground truth information. Alternative approaches to supervision have been explored, such as self-supervision, which can reduce or remove entirely the need for ground truth. However, no proposed alternatives have demonstrated performance capabilities close to what would be expected from a supervised setup. This work aims to alleviate this issue. In this paper, we investigate the learning of structured light projections to enhance the development of direct disparity estimation networks. We show for the first time that it is possible to accurately learn the projection of structured light on a scene, implicitly learning disparity. Secondly, we explore the use of a multi-task learning (MTL) framework for the joint training of structured light and disparity. We present results which show that MTL with structured light improves disparity training without increasing the number of model parameters. Our MTL setup outperformed the single-task learning (STL) network in every validation test. Notably, in the medical generalisation test, the STL error was 1.4 times worse than that of the best MTL performance. The benefit of using MTL is emphasised when the training data is limited. A dataset containing stereoscopic images, disparity maps and structured light projections on medical phantoms and ex vivo tissue was created for evaluation together with virtual scenes. This dataset will be made publicly available in the future.
IV · Jun 1, 2023
Identifying visible tissue in intraoperative ultrasound: a method and application
Alistair Weld, Luke Dixon, Giulio Anichini et al.
Purpose: Intraoperative ultrasound scanning is a demanding visuotactile task. It requires operators to simultaneously localise the ultrasound perspective and manually perform slight adjustments to the pose of the probe, making sure not to apply excessive force or break contact with the tissue, whilst also characterising the visible tissue. Method: To analyse the probe-tissue contact, an iterative filtering and topological method is proposed to identify the underlying visible tissue, which can be used to detect acoustic shadow and construct confidence maps of perceptual salience. Results: For evaluation, datasets containing both in vivo and medical phantom data are created. A suite of evaluations is performed, including an evaluation of acoustic shadow classification. Compared with an ablation, a deep learning method, and a statistical method, the proposed approach achieves superior classification on in vivo data, achieving an F_beta score of 0.864, in comparison to 0.838, 0.808, and 0.808, respectively. A novel framework for evaluating the confidence estimation of probe-tissue contact is created. The phantom data is captured specifically for this, and comparison is made against two established methods. The proposed method produced the superior response, achieving an average normalised root mean square error of 0.168, in comparison to 1.836 and 4.542. Evaluation is also extended to determine the algorithm's robustness to parameter perturbation, speckle noise, data distribution shift, and capability for guiding a robotic scan. Conclusion: The results of this comprehensive set of experiments justify the potential clinical value of the proposed algorithm, which can be used to support clinical training and robotic ultrasound automation.
RO · Jan 19, 2023
Collaborative Robotic Ultrasound Tissue Scanning for Surgical Resection Guidance in Neurosurgery
Alistair Weld, Michael Dyck, Julian Klodmann et al.
The aim of this paper is to introduce a robotic platform for autonomous intraoperative ultrasound (iUS) tissue scanning to optimise intraoperative diagnosis and improve surgical resection during robot-assisted operations. To guide anatomy-specific robotic scanning and generate a representation of the robot task space, fast and accurate techniques for the recovery of 3D morphological structures of the surgical cavity are developed. The prototypic DLR MIRO surgical robotic arm is used to control the applied force and the in-plane motion of the US transducer. A key application of the proposed platform is the scanning of brain tissue to guide tumour resection.
CV · Dec 14, 2023
Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis
Frank P. -W. Lo, Jianing Qiu, Zeyu Wang et al.
Conventional approaches to dietary assessment are primarily grounded in self-reporting methods or structured interviews conducted under the supervision of dietitians. These methods, however, are often subjective, potentially inaccurate, and time-intensive. Although artificial intelligence (AI)-based solutions have been devised to automate the dietary assessment process, these prior AI methodologies encounter challenges in their ability to generalize across a diverse range of food types, dietary behaviors, and cultural contexts. This results in AI applications in the dietary field that possess a narrow specialization and limited accuracy. Recently, the emergence of multimodal foundation models such as GPT-4V, which powers the latest ChatGPT, has exhibited transformative potential across a wide range of tasks (e.g., scene understanding and image captioning) in numerous research domains. These models have demonstrated remarkable generalist intelligence and accuracy, and are capable of processing various data modalities. In this study, we explore the application of multimodal ChatGPT within the realm of dietary assessment. Our findings reveal that GPT-4V excels in food detection under challenging conditions with accuracy up to 87.5% without any fine-tuning or adaptation using food-specific datasets. By guiding the model with specific language prompts (e.g., African cuisine), it shifts from recognizing common staples like rice and bread to accurately identifying regional dishes like banku and ugali. Another standout feature of GPT-4V is its contextual awareness. GPT-4V can leverage surrounding objects as scale references to deduce the portion sizes of food items, further enhancing its accuracy in translating food weight into nutritional content. This alignment with the USDA National Nutrient Database underscores GPT-4V's potential to advance nutritional science and dietary assessment techniques.
CV · Oct 31, 2024
Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation
Yihang Zhou, Rebecca Towning, Zaid Awad et al.
Surgical scene segmentation is essential for enhancing surgical precision, yet it is frequently compromised by the scarcity and imbalance of available data. To address these challenges, semantic image synthesis methods based on generative adversarial networks and diffusion models have been developed. However, these models often yield non-diverse images and fail to capture small, critical tissue classes, limiting their effectiveness. In response, we propose the Class-Aware Semantic Diffusion Model (CASDM), a novel approach which utilizes segmentation maps as conditions for image synthesis to tackle data scarcity and imbalance. Novel class-aware mean squared error and class-aware self-perceptual loss functions have been defined to prioritize critical, less visible classes, thereby enhancing image quality and relevance. Furthermore, to our knowledge, we are the first to generate multi-class segmentation maps using text prompts in a novel fashion to specify their contents. These maps are then used by CASDM to generate surgical scene images, enhancing datasets for training and validating segmentation models. Our evaluation, which assesses both image quality and downstream segmentation performance, demonstrates the strong effectiveness and generalisability of CASDM in producing realistic image-map pairs, significantly advancing surgical scene segmentation across diverse and challenging datasets.
CV · Jul 31, 2025
Explainable Image Classification with Reduced Overconfidence for Tissue Characterisation
Alfie Roddan, Chi Xu, Serine Ajlouni et al.
The deployment of machine learning models intraoperatively for tissue characterisation can assist decision making and guide safe tumour resections. For image classification models, pixel attribution (PA) methods are popular to infer explainability. However, overconfidence in a deep learning model's predictions translates to overconfidence in pixel attribution. In this paper, we propose the first approach which incorporates risk estimation into a pixel attribution method for improved image classification explainability. The proposed method iteratively applies a classification model with a pixel attribution method to create a volume of PA maps. This volume is used, for the first time, to generate a pixel-wise distribution of PA values. We introduce a method to generate an enhanced PA map by estimating the expectation values of the pixel-wise distributions. In addition, the coefficient of variation (CV) is used to estimate pixel-wise risk of this enhanced PA map. Hence, the proposed method not only provides an improved PA map but also produces an estimation of risk on the output PA values. Performance evaluation on probe-based Confocal Laser Endomicroscopy (pCLE) data and ImageNet verifies that our improved explainability method outperforms the state-of-the-art.
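The aggregation step described above can be sketched as a reduction over a stack of PA maps: the pixel-wise mean stands in for the expectation, and the pixel-wise coefficient of variation (standard deviation divided by the mean) stands in for the risk estimate. This is a minimal sketch under stated assumptions, not the authors' implementation: the sample mean as the expectation estimator, the `eps` guard, and the name `summarise_pa_volume` are all choices made here, and how the volume itself is produced is left outside the example.

```python
import numpy as np


def summarise_pa_volume(pa_volume, eps=1e-8):
    """Reduce a stack of pixel attribution maps to a mean map and a CV map.

    pa_volume: array of shape (N, H, W), one PA map per repeated pass.
    Returns (mean_map, cv_map), each of shape (H, W).
    """
    pa_volume = np.asarray(pa_volume, dtype=float)
    mean_map = pa_volume.mean(axis=0)  # pixel-wise expectation estimate
    std_map = pa_volume.std(axis=0)    # pixel-wise spread
    # Coefficient of variation; eps avoids division by zero (assumption).
    cv_map = std_map / (np.abs(mean_map) + eps)
    return mean_map, cv_map
```

Pixels whose attribution is stable across passes get a CV near zero, while pixels whose attribution fluctuates relative to its mean get a high CV and can be flagged as less trustworthy.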
CV · Feb 13, 2025
Standardisation of Convex Ultrasound Data Through Geometric Analysis and Augmentation
Alistair Weld, Giovanni Faoro, Luke Dixon et al.
The application of ultrasound in healthcare has seen increased diversity and importance. Unlike other medical imaging modalities, ultrasound research and development has historically lagged, particularly in the case of applications with data-driven algorithms. A significant issue with ultrasound is the extreme variability of the images, due to the number of different machines available and the possible combination of parameter settings. One outcome of this is the lack of standardised and benchmarking ultrasound datasets. The method proposed in this article is an approach to alleviating this issue of disorganisation. For this purpose, the issue of ultrasound data sparsity is examined and a novel perspective, approach, and solution are proposed, involving the extraction of the underlying ultrasound plane within the image and representing it using annulus sector geometry. An application of this methodology is proposed, which is the extraction of scan lines and the linearisation of convex planes. Validation of the robustness of the proposed method is performed on both private and public data. The impact of deformation and the invertibility of augmentation using the estimated annulus sector parameters are also studied. Keywords: Ultrasound, Annulus Sector, Augmentation, Linearisation.
CV · Jan 6, 2025
SurgRIPE challenge: Benchmark of Surgical Robot Instrument Pose Estimation
Haozheng Xu, Alistair Weld, Chi Xu et al.
Accurate instrument pose estimation is a crucial step towards the future of robotic surgery, enabling applications such as autonomous surgical task execution. Vision-based methods for surgical instrument pose estimation provide a practical approach to tool tracking, but they often require markers to be attached to the instruments. Recently, more research has focused on the development of marker-less methods based on deep learning. However, acquiring realistic surgical data, with ground truth instrument poses, required for deep learning training, is challenging. To address the issues in surgical instrument pose estimation, we introduce the Surgical Robot Instrument Pose Estimation (SurgRIPE) challenge, hosted at the 26th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2023. The objectives of this challenge are: (1) to provide the surgical vision community with realistic surgical video data paired with ground truth instrument poses, and (2) to establish a benchmark for evaluating markerless pose estimation methods. The challenge led to the development of several novel algorithms that showcased improved accuracy and robustness over existing methods. The performance evaluation study on the SurgRIPE dataset highlights the potential of these advanced algorithms to be integrated into robotic surgery systems, paving the way for more precise and autonomous surgical procedures. The SurgRIPE challenge has successfully established a new benchmark for the field, encouraging further research and development in surgical robot instrument pose estimation.
CV · Aug 1, 2025
SAMSA 2.0: Prompting Segment Anything with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation
Alfie Roddan, Tobias Czempiel, Chi Xu et al.
We present SAMSA 2.0, an interactive segmentation framework for hyperspectral medical imaging that introduces spectral angle prompting to guide the Segment Anything Model (SAM) using spectral similarity alongside spatial cues. This early fusion of spectral information enables more accurate and robust segmentation across diverse spectral datasets. Without retraining, SAMSA 2.0 achieves up to +3.8% higher Dice scores compared to RGB-only models and up to +3.1% over prior spectral fusion methods. Our approach enhances few-shot and zero-shot performance, demonstrating strong generalization in challenging low-data and noisy scenarios common in clinical imaging.
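The spectral similarity measure named in the title, the spectral angle, is the angle between two spectra treated as vectors: θ = arccos(a·b / (‖a‖‖b‖)). The sketch below computes it between one reference spectrum and every pixel of a hyperspectral cube. How SAMSA 2.0 turns such a map into a prompt for SAM is not described in the abstract, so this covers only the angle itself; taking the reference from a user-selected pixel, the `(H, W, B)` cube layout, and the name `spectral_angle_map` are assumptions made here.

```python
import numpy as np


def spectral_angle_map(cube, reference, eps=1e-12):
    """Spectral angle (radians) between each pixel spectrum and a reference.

    cube: array of shape (H, W, B) with B spectral bands.
    reference: array of shape (B,), e.g. the spectrum at a clicked pixel.
    Returns an (H, W) map; 0 means identical spectral shape.
    """
    cube = np.asarray(cube, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dot = cube @ reference  # (H, W) dot products along the band axis
    norms = np.linalg.norm(cube, axis=-1) * np.linalg.norm(reference)
    # Guard against zero-norm spectra, then clip for numerical safety.
    cosine = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return np.arccos(cosine)
```

Because the angle depends only on the direction of the spectrum, a pixel that is twice as bright as the reference but has the same spectral shape still gets an angle of zero, and the computation works for any number of bands.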
CV · Jul 31, 2025
SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation
Alfie Roddan, Tobias Czempiel, Chi Xu et al.
Hyperspectral imaging (HSI) provides rich spectral information for medical imaging, yet encounters significant challenges due to data limitations and hardware variations. We introduce SAMSA, a novel interactive segmentation framework that combines an RGB foundation model with spectral analysis. SAMSA efficiently utilizes user clicks to guide both RGB segmentation and spectral similarity computations. The method addresses key limitations in HSI segmentation through a unique spectral feature fusion strategy that operates independently of spectral band count and resolution. Performance evaluation on publicly available datasets has shown 81.0% 1-click and 93.4% 5-click DICE on a neurosurgical and 81.1% 1-click and 89.2% 5-click DICE on an intraoperative porcine hyperspectral dataset. Experimental results demonstrate SAMSA's effectiveness in few-shot and zero-shot learning scenarios and using minimal training examples. Our approach enables seamless integration of datasets with different spectral characteristics, providing a flexible framework for hyperspectral medical image analysis.
CV · Feb 21, 2025
Confidence-Based Annotation Of Brain Tumours In Ultrasound
Alistair Weld, Luke Dixon, Alfie Roddan et al.
Purpose: An investigation of the challenge of annotating discrete segmentations of brain tumours in ultrasound, with a focus on the issue of aleatoric uncertainty along the tumour margin, particularly for diffuse tumours. A segmentation protocol and method is proposed that incorporates this margin-related uncertainty while minimising the interobserver variance through reduced subjectivity, thereby diminishing annotator epistemic uncertainty. Approach: A sparse confidence method for annotation is proposed, based on a protocol designed using computer vision and radiology theory. Results: Output annotations using the proposed method are compared with the corresponding professional discrete annotation variance between the observers. A linear relationship was measured within the tumour margin region, with a Pearson correlation of 0.8. The downstream application was explored, comparing training using confidence annotations as soft labels with using the best discrete annotations as hard labels. In all evaluation folds, the Brier score was superior for the soft-label trained network. Conclusion: A formal framework was constructed to demonstrate the infeasibility of discrete annotation of brain tumours in B-mode ultrasound. Subsequently, a method for sparse confidence-based annotation is proposed and evaluated. Keywords: Brain tumours, ultrasound, confidence, annotation.
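The Brier score used to compare the soft-label and hard-label networks is the mean squared difference between predicted probabilities and target values, with lower being better. The sketch below is a plain implementation of that definition for a flattened list of pixel predictions; which targets the paper evaluates against and how pixels are pooled across images are not stated in the abstract, and the function name `brier_score` is chosen here for illustration.

```python
def brier_score(probabilities, targets):
    """Mean squared difference between predictions and targets, both in [0, 1].

    Targets may be hard labels (0 or 1) or soft confidence values.
    """
    if len(probabilities) != len(targets):
        raise ValueError("probabilities and targets must have the same length")
    if not probabilities:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(probabilities, targets)]
    return sum(squared_errors) / len(squared_errors)
```

For example, predictions of 0.9 and 0.2 against targets of 1 and 0 give (0.01 + 0.04) / 2 = 0.025.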
IV · Jul 9, 2021
Self-Supervised Generative Adversarial Network for Depth Estimation in Laparoscopic Images
Baoru Huang, Jianqing Zheng, Anh Nguyen et al.
Dense depth estimation and 3D reconstruction of a surgical scene are crucial steps in computer assisted surgery. Recent work has shown that depth estimation from a stereo image pair could be solved with convolutional neural networks. However, most recent depth estimation models were trained on datasets with per-pixel ground truth. Such data is especially rare for laparoscopic imaging, making it hard to apply supervised depth estimation to real surgical applications. To overcome this limitation, we propose SADepth, a new self-supervised depth estimation method based on Generative Adversarial Networks. It consists of an encoder-decoder generator and a discriminator to incorporate geometry constraints during training. Multi-scale outputs from the generator help to solve the local minima caused by the photometric reprojection loss, while the adversarial learning improves the framework generation quality. Extensive experiments on two public datasets show that SADepth outperforms recent state-of-the-art unsupervised methods by a large margin, and reduces the gap between supervised and unsupervised depth estimation in laparoscopic images.
CV · Apr 22, 2021
H-Net: Unsupervised Attention-based Stereo Depth Estimation Leveraging Epipolar Geometry
Baoru Huang, Jian-Qing Zheng, Stamatia Giannarou et al.
Depth estimation from a stereo image pair has become one of the most explored applications in computer vision, with most of the previous methods relying on fully supervised learning settings. However, due to the difficulty in acquiring accurate and scalable ground truth data, the training of fully supervised methods is challenging. As an alternative, self-supervised methods are becoming more popular to mitigate this challenge. In this paper, we introduce H-Net, a deep-learning framework for unsupervised stereo depth estimation that leverages epipolar geometry to refine stereo matching. For the first time, a Siamese autoencoder architecture is used for depth estimation, which allows mutual information between the rectified stereo images to be extracted. To enforce the epipolar constraint, a mutual epipolar attention mechanism has been designed which gives more emphasis to correspondences of features that lie on the same epipolar line while learning mutual information between the input stereo pair. Stereo correspondences are further enhanced by incorporating semantic information into the proposed attention mechanism. More specifically, the optimal transport algorithm is used to suppress attention and eliminate outliers in areas not visible in both cameras. Extensive experiments on KITTI2015 and Cityscapes show that our method outperforms the state-of-the-art unsupervised stereo depth estimation methods while closing the gap with the fully supervised approaches.
CY · Oct 30, 2020
Surgical Data Science -- from Concepts toward Clinical Translation
Lena Maier-Hein, Matthias Eisenmann, Duygu Sarikaya et al.
Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of SDS, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.
RO · May 11, 2020
Autonomous Tissue Scanning under Free-Form Motion for Intraoperative Tissue Characterisation
Jian Zhan, Joao Cartucho, Stamatia Giannarou
In Minimally Invasive Surgery (MIS), tissue scanning with imaging probes is required for subsurface visualisation to characterise the state of the tissue. However, scanning of large tissue surfaces in the presence of deformation is a challenging task for the surgeon. Recently, robot-assisted local tissue scanning has been investigated for motion stabilisation of imaging probes to facilitate the capturing of good quality images and reduce the surgeon's cognitive load. Nonetheless, these approaches require the tissue surface to be static or deform with periodic motion. To eliminate these assumptions, we propose a visual servoing framework for autonomous tissue scanning, able to deal with free-form tissue deformation. The 3D structure of the surgical scene is recovered and a feature-based method is proposed to estimate the motion of the tissue in real-time. A desired scanning trajectory is manually defined on a reference frame and continuously updated using projective geometry to follow the tissue motion and control the movement of the robotic arm. The advantage of the proposed method is that it does not require the learning of the tissue motion prior to scanning and can deal with free-form deformation. We deployed this framework on the da Vinci surgical robot using the da Vinci Research Kit (dVRK) for Ultrasound tissue scanning. Since the framework does not rely on information from the Ultrasound data, it can be easily extended to other probe-based imaging modalities.
RO · May 16, 2017
Motion-Compensated Autonomous Scanning for Tumour Localisation using Intraoperative Ultrasound
Lin Zhang, Menglong Ye, Stamatia Giannarou et al.
Intraoperative ultrasound facilitates localisation of tumour boundaries during minimally invasive procedures. Autonomous ultrasound scanning systems have been recently proposed to improve scanning accuracy and reduce surgeons' cognitive load. However, current methods mainly consider static scanning environments typically with the probe pressing against the tissue surface. In this work, a motion-compensated autonomous ultrasound scanning system using the da Vinci Research Kit (dVRK) is proposed. An optimal scanning trajectory is generated considering both the tissue surface shape and the ultrasound transducer dimensions. A robust vision-based approach is proposed to learn the underlying tissue motion characteristics. The learned motion model is then incorporated into the visual servoing framework. The proposed system has been validated with both phantom and ex vivo experiments using the ground truth motion data for comparison.
CV · May 11, 2016
Real-time 3D Tracking of Articulated Tools for Robotic Surgery
Menglong Ye, Lin Zhang, Stamatia Giannarou et al.
In robotic surgery, tool tracking is important for providing safe tool-tissue interaction and facilitating surgical skills assessment. Despite recent advances in tool tracking, existing approaches are faced with major difficulties in real-time tracking of articulated tools. Most algorithms are tailored for offline processing with pre-recorded videos. In this paper, we propose a real-time 3D tracking method for articulated tools in robotic surgery. The proposed method is based on the CAD model of the tools as well as robot kinematics to generate online part-based templates for efficient 2D matching and 3D pose estimation. A robust verification approach is incorporated to reject outliers in 2D detections, which is then followed by fusing inliers with robot kinematic readings for 3D pose estimation of the tool. The proposed method has been validated with phantom data, as well as ex vivo and in vivo experiments. The results derived clearly demonstrate the performance advantage of the proposed method when compared to the state-of-the-art.