IVMar 4, 2022
3D endoscopic depth estimation using 3D surface-aware constraintsShang Zhao, Ce Wang, Qiyuan Wang et al.
Robotic-assisted surgery allows surgeons to conduct precise surgical operations with stereo vision and flexible motor control. However, the lack of 3D spatial perception limits situational awareness during procedures and hinders mastering surgical skills in the narrow abdominal space. Depth estimation, as a representative perception task, is typically defined as an image reconstruction problem. In this work, we show that depth estimation can be reformed from a 3D surface perspective. We propose a loss function for depth estimation that integrates the surface-aware constraints, leading to a faster and better convergence with the valid information from spatial information. In addition, camera parameters are incorporated into the training pipeline to increase the control and transparency of the depth estimation. We also integrate a specularity removal module to recover more buried image information. Quantitative experimental results on endoscopic datasets and user studies with medical professionals demonstrate the effectiveness of our method.
CVDec 24, 2022
MURPHY: Relations Matter in Surgical Workflow AnalysisShang Zhao, Yanzhe Liu, Qiyuan Wang et al.
Autonomous robotic surgery has advanced significantly based on analysis of visual and temporal cues in surgical workflow, but relational cues from domain knowledge remain under investigation. Complex relations in surgical annotations can be divided into intra- and inter-relations, both valuable to autonomous systems to comprehend surgical workflows. Intra- and inter-relations describe the relevance of various categories within a particular annotation type and the relevance of different annotation types, respectively. This paper aims to systematically investigate the importance of relational cues in surgery. First, we contribute the RLLS12M dataset, a large-scale collection of robotic left lateral sectionectomy (RLLS), by curating 50 videos of 50 patients operated by 5 surgeons and annotating a hierarchical workflow, which consists of 3 inter- and 6 intra-relations, 6 steps, 15 tasks, and 38 activities represented as the triplet of 11 instruments, 8 actions, and 16 objects, totaling 2,113,510 video frames and 12,681,060 annotation entities. Correspondingly, we propose a multi-relation purification hybrid network (MURPHY), which aptly incorporates novel relation modules to augment the feature representation by purifying relational features using the intra- and inter-relations embodied in annotations. The intra-relation module leverages a R-GCN to implant visual features in different graph relations, which are aggregated using a targeted relation purification with affinity information measuring label consistency and feature similarity. The inter-relation module is motivated by attention mechanisms to regularize the influence of relational features based on the hierarchy of annotation types from the domain knowledge. Extensive experimental results on the curated RLLS dataset confirm the effectiveness of our approach, demonstrating that relations matter in surgical workflow analysis.
CVMar 14, 2024
WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuityQiyuan Wang, Yanzhe Liu, Shang Zhao et al.
For robotic surgical videos, instrument presence annotations are typically recorded with video streams, which offering the potential to reduce the manually annotated costs for segmentation. However, weakly supervised surgical instrument segmentation with only instrument presence labels has been rarely explored in surgical domain due to the highly under-constrained challenges. Temporal properties can enhance representation learning by capturing sequential dependencies and patterns over time even in incomplete supervision situations. From this, we take the inherent temporal attributes of surgical video into account and extend a two-stage weakly supervised segmentation paradigm from different perspectives. Firstly, we make temporal equivariance constraint to enhance pixel-wise temporal consistency between adjacent features. Secondly, we constrain class-aware semantic continuity between global and local regions across temporal dimension. Finally, we generate temporal-enhanced pseudo masks from consecutive frames to suppress irrelevant regions. Extensive experiments are validated on two surgical video datasets, including one cholecystectomy surgery benchmark and one real robotic left lateral segment liver surgery dataset. We annotate instance-wise instrument labels with fixed time-steps which are double checked by a clinician with 3-years experience to evaluate segmentation results. Experimental results demonstrate the promising performances of our method, which consistently achieves comparable or favorable results with previous state-of-the-art approaches.