CV · Mar 7

SurgCUT3R: Surgical Scene-Aware Continuous Understanding of Temporal 3D Representation

arXiv:2603.06971v1
Predicted impact: top 44% in CV (last 90 days) · Originality: Incremental advance
AI Analysis

By overcoming data scarcity and long-sequence pose drift, this work offers a practical and effective solution for robust 3D reconstruction in surgical environments, a capability critical to advancing robotic-assisted surgery. It represents an incremental improvement within the surgical robotics domain.

This paper addresses the challenges of 3D surgical scene reconstruction from monocular endoscopic video, specifically the lack of supervised training data and performance degradation over long sequences. The authors propose SurgCUT3R, a framework that achieves a competitive balance between accuracy and efficiency, delivering near state-of-the-art pose estimation while being substantially faster than existing methods.

Reconstructing surgical scenes from monocular endoscopic video is critical for advancing robotic-assisted surgery. However, the application of state-of-the-art general-purpose reconstruction models is constrained by two key challenges: the lack of supervised training data and performance degradation over long video sequences. To overcome these limitations, we propose SurgCUT3R, a systematic framework that adapts unified 3D reconstruction models to the surgical domain. Our contributions are threefold. First, we develop a data generation pipeline that exploits public stereo surgical datasets to produce large-scale, metric-scale pseudo-ground-truth depth maps, effectively bridging the data gap. Second, we propose a hybrid supervision strategy that couples our pseudo-ground-truth with geometric self-correction to enhance robustness against inherent data imperfections. Third, we introduce a hierarchical inference framework that employs two specialized models, one for global stability and one for local accuracy, to effectively mitigate accumulated pose drift over long surgical videos. Experiments on the SCARED and StereoMIS datasets demonstrate that our method achieves a competitive balance between accuracy and efficiency, delivering pose estimation that is near state-of-the-art in accuracy yet substantially faster, and offering a practical and effective solution for robust reconstruction in surgical environments. Project page: https://chumo-xu.github.io/SurgCUT3R-ICRA26/.
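The abstract does not say exactly how the metric-scale pseudo-ground-truth depth is derived from the stereo datasets. One standard way to turn a rectified stereo pair into metric depth is the pinhole stereo relation Z = f * B / d, where f is the focal length in pixels, B is the stereo baseline in metres, and d is the disparity in pixels. The sketch below assumes a disparity map has already been produced by some stereo matcher (not specified here) together with calibrated rectification parameters; the function name `disparity_to_depth`, the `min_disp` threshold, and the NaN convention for invalid pixels are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert a rectified stereo disparity map (pixels) to metric depth (metres).

    Hypothetical helper illustrating the pinhole stereo relation Z = f * B / d.
    Pixels whose disparity is at or below `min_disp` (e.g. unmatched or
    occluded pixels) are marked invalid with NaN instead of producing inf.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.nan)   # start with every pixel invalid
    valid = disparity > min_disp              # boolean mask of usable pixels
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Illustrative values only: 1000 px focal length, 4 mm baseline.
disp = np.array([[40.0, 0.0],
                 [20.0, 80.0]])
print(disparity_to_depth(disp, focal_px=1000.0, baseline_m=0.004))
```

Because depth scales with the physical baseline, the output is in metres rather than up to an unknown scale, which is what makes stereo data attractive as a source of metric supervision for a monocular model. Masking low-disparity pixels keeps invalid matches out of any downstream depth loss.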
