CV · AI · Jan 18

Where It Moves, It Matters: Referring Surgical Instrument Segmentation via Motion

arXiv:2601.12224 · 1 citation · h-index: 21
Originality: Highly original
AI Analysis

This work addresses the underexplored task of language-driven surgical instrument segmentation, enabling robust interaction under occlusion, ambiguity, or unfamiliar terminology.

SurgRef introduces a motion-guided framework for referring surgical instrument segmentation, achieving state-of-the-art accuracy and generalization across procedures by grounding language descriptions in instrument motion rather than static visual cues.

Enabling intuitive, language-driven interaction with surgical scenes is a critical step toward intelligent operating rooms and autonomous surgical robotic assistance. However, the task of referring segmentation, localizing surgical instruments based on natural language descriptions, remains underexplored in surgical videos, with existing approaches struggling to generalize due to reliance on static visual cues and predefined instrument names. In this work, we introduce SurgRef, a novel motion-guided framework that grounds free-form language expressions in instrument motion, capturing how tools move and interact across time, rather than what they look like. This allows models to understand and segment instruments even under occlusion, ambiguity, or unfamiliar terminology. To train and evaluate SurgRef, we present Ref-IMotion, a diverse, multi-institutional video dataset with dense spatiotemporal masks and rich motion-centric expressions. SurgRef achieves state-of-the-art accuracy and generalization across surgical procedures, setting a new benchmark for robust, language-driven surgical video segmentation.
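The abstract's core idea is to pick out an instrument by how it moves rather than by how it looks. The sketch below is a toy illustration of that idea only. It is not SurgRef's architecture, which is not described here.

The sketch makes three assumptions:

- Each instrument comes with per-frame binary masks, similar in spirit to the dense spatiotemporal masks described for Ref-IMotion.
- Motion is summarized as the net displacement of the mask centroid.
- A hypothetical keyword-to-direction table (`DIRECTIONS`) stands in for a learned language encoder.

All function names are illustrative.

```python
import numpy as np

def centroid_track(masks):
    """masks: bool array (T, H, W) for one instrument.
    Returns a (T, 2) array of (row, col) centroids, NaN where the mask is empty."""
    track = np.full((masks.shape[0], 2), np.nan)
    for t in range(masks.shape[0]):
        rows, cols = np.nonzero(masks[t])
        if rows.size:
            track[t] = (rows.mean(), cols.mean())
    return track

def net_displacement(track):
    """Displacement from the first to the last frame where the instrument is visible."""
    visible = track[~np.isnan(track).any(axis=1)]
    if len(visible) < 2:
        return np.zeros(2)
    return visible[-1] - visible[0]

# Hypothetical stand-in for a language model: motion words -> unit vectors in (row, col) image coordinates.
DIRECTIONS = {
    "left": np.array([0.0, -1.0]),
    "right": np.array([0.0, 1.0]),
    "up": np.array([-1.0, 0.0]),
    "down": np.array([1.0, 0.0]),
}

def refer_by_motion(instrument_masks, direction_word):
    """Return the instrument id whose net motion best aligns with direction_word."""
    target = DIRECTIONS[direction_word]
    best_id, best_score = None, -np.inf
    for inst_id, masks in instrument_masks.items():
        score = float(net_displacement(centroid_track(masks)) @ target)
        if score > best_score:
            best_id, best_score = inst_id, score
    return best_id

def square_masks(corners, size=3, hw=(20, 20)):
    """Synthetic clip: one size x size square per frame, top-left corner given per frame."""
    masks = np.zeros((len(corners), *hw), dtype=bool)
    for t, (r, c) in enumerate(corners):
        masks[t, r:r + size, c:c + size] = True
    return masks

# Two visually identical instruments that differ only in motion.
clips = {
    "grasper": square_masks([(5, 2), (5, 6), (5, 10)]),     # moves right
    "scissors": square_masks([(12, 15), (12, 11), (12, 7)]),  # moves left
}
print(refer_by_motion(clips, "left"))  # → scissors
```

Both squares look the same, so no appearance cue can separate them. Only their trajectories can resolve "the one moving left." A real system would replace the keyword table with learned text-motion alignment and predict the masks rather than receive them.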
