CV · Nov 17, 2020

Multi-frame Feature Aggregation for Real-time Instrument Segmentation in Endoscopic Video

arXiv:2011.08752v2 · 26 citations
AI Analysis

This work addresses the need for real-time, accurate surgical instrument segmentation in robotic-assisted surgery. Its contribution is an incremental improvement over existing methods.

The authors tackle the problem of real-time surgical instrument segmentation in endoscopic video, which is crucial for robotic-assisted surgery but limited by high computational costs and challenging image conditions. They propose a Multi-frame Feature Aggregation (MFFA) module that aggregates features temporally and spatially, enabling the use of a lightweight encoder and achieving superior performance compared to deeper models on two public surgery datasets.

Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, the high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-assisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images such as various lighting conditions and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module to aggregate video frame features temporally and spatially in a recurrent mode. By distributing the computation load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computation costs at each time step. Moreover, public surgical videos usually are not labeled frame by frame, so we develop a method that can randomly synthesize a surgical frame sequence from a single labeled frame to assist network training. We demonstrate that our approach achieves superior performance to corresponding deeper segmentation models on two public surgery datasets.
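The abstract says a frame sequence is synthesized from a single labeled frame but does not say which transformations are used. As a minimal sketch, assuming small cumulative random translations stand in for camera motion, the idea can be illustrated as follows. The names `shift` and `synthesize_sequence` and the parameters `length` and `max_step` are hypothetical, chosen for illustration only. This is not the authors' procedure.

```python
import random


def shift(grid, dy, dx, fill=0):
    """Translate a 2D grid by (dy, dx), padding uncovered cells with `fill`."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source cell that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = grid[sy][sx]
    return out


def synthesize_sequence(frame, mask, length=4, max_step=2, seed=None):
    """Build a pseudo-video from one labeled frame (illustrative assumption).

    Offsets accumulate over time to mimic smooth motion. The same offset is
    applied to the frame and to its mask, so every synthetic frame keeps a
    pixel-aligned label.
    """
    rng = random.Random(seed)
    dy = dx = 0
    sequence = [(frame, mask)]  # the original labeled pair comes first
    for _ in range(length - 1):
        dy += rng.randint(-max_step, max_step)
        dx += rng.randint(-max_step, max_step)
        sequence.append((shift(frame, dy, dx), shift(mask, dy, dx)))
    return sequence


if __name__ == "__main__":
    frame = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
    mask = [[v % 2 for v in row] for row in frame]
    for f, m in synthesize_sequence(frame, mask, length=3, max_step=1, seed=0):
        print(f, m)
```

Nested lists keep the sketch dependency-free. A practical pipeline would more likely apply richer transformations (rotation, scaling, lighting changes) to image arrays, with the same requirement that the geometric ones are applied identically to the label.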
