CV · AI · RO · Aug 6, 2024

BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications

arXiv:2408.03078v2 · 9 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This paper addresses the depth-perception and instrument-manipulation difficulties that surgeons face in endoscopic procedures, and represents an incremental improvement over existing methods.

The paper tackles the challenge of implementing monocular visual SLAM in endoscopic surgery by presenting BodySLAM, a deep learning-based framework that combines a novel pose estimation module (CycleVO), Zoe-based depth estimation, and a 3D reconstruction module. On three endoscopic datasets, pose estimation is competitive with the lowest inference time among the compared methods, and depth estimation significantly outperforms existing algorithms.

Endoscopic surgery relies on two-dimensional views, posing challenges for surgeons in depth perception and instrument manipulation. While Monocular Visual Simultaneous Localization and Mapping (MVSLAM) has emerged as a promising solution, its implementation in endoscopic procedures faces significant challenges due to hardware limitations, such as the use of a monocular camera and the absence of odometry sensors. This study presents BodySLAM, a robust deep learning-based MVSLAM approach that addresses these challenges through three key components: CycleVO, a novel unsupervised monocular pose estimation module; the integration of the state-of-the-art Zoe architecture for monocular depth estimation; and a 3D reconstruction module creating a coherent surgical map. The approach is rigorously evaluated using three publicly available datasets (Hamlyn, EndoSLAM, and SCARED) spanning laparoscopy, gastroscopy, and colonoscopy scenarios, and benchmarked against four state-of-the-art methods. Results demonstrate that CycleVO exhibited competitive performance with the lowest inference time among pose estimation methods, while maintaining robust generalization capabilities, whereas Zoe significantly outperformed existing algorithms for depth estimation in endoscopy. BodySLAM's strong performance across diverse endoscopic scenarios demonstrates its potential as a viable MVSLAM solution for endoscopic applications.
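To make the interplay of the three components concrete, the following is a minimal sketch of how per-frame depth maps and relative camera poses can be fused into a single point cloud. It is not BodySLAM's implementation: it assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, assumes that each relative pose is a 4x4 matrix mapping frame i+1 coordinates into frame i coordinates, and uses hypothetical function names (`backproject`, `transform`, `compose`, `build_map`) chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]


def backproject(depth: Matrix, fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Lift each pixel (u, v) with depth z to a 3D point in the camera frame."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without a valid depth estimate
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, float(z)))
    return points


def transform(points: List[Point], pose: Matrix) -> List[Point]:
    """Apply a 4x4 rigid transform (rotation plus translation) to each point."""
    return [
        tuple(pose[i][0] * x + pose[i][1] * y + pose[i][2] * z + pose[i][3] for i in range(3))
        for x, y, z in points
    ]


def compose(a: Matrix, b: Matrix) -> List[List[float]]:
    """4x4 matrix product a @ b, used to chain relative poses into a global pose."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def build_map(depths: Sequence[Matrix], relative_poses: Sequence[Matrix],
              intrinsics: Tuple[float, float, float, float]) -> List[Point]:
    """Fuse depth maps into one point cloud expressed in the first frame's coordinates.

    Assumed convention: relative_poses[i] maps frame i+1 coordinates to frame i.
    """
    pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    cloud = transform(backproject(depths[0], *intrinsics), pose)
    for depth, rel in zip(depths[1:], relative_poses):
        pose = compose(pose, rel)  # accumulate the camera trajectory
        cloud.extend(transform(backproject(depth, *intrinsics), pose))
    return cloud
```

In this sketch the pose estimates only enter through `compose`, so any drift in the estimated trajectory carries over directly into the fused map.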

Code Implementations: 1 repo