CV · Nov 17, 2020

Multi-frame Feature Aggregation for Real-time Instrument Segmentation in Endoscopic Video

Shan Lin, Fangbo Qin, Haonan Peng, Randall A. Bly, Kris S. Moe, Blake Hannaford

arXiv:2011.08752v26.526 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for real-time, accurate surgical instrument segmentation in robotic-assisted surgery. It is an incremental improvement over existing methods.

The authors tackle the problem of real-time surgical instrument segmentation in endoscopic video, which is crucial for robotic-assisted surgery but limited by high computational costs and challenging image conditions. They propose a Multi-frame Feature Aggregation (MFFA) module that aggregates features temporally and spatially, enabling the use of a lightweight encoder and achieving superior performance compared to deeper models on two public surgery datasets.
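To make the idea of recurrent temporal aggregation concrete, here is a minimal sketch in NumPy. It is not the MFFA module itself, which is a learned network component that also aggregates spatially. The function name `aggregate_recurrent` and the fixed blending weight `alpha` are assumptions introduced purely for illustration. The sketch only shows how a running state can carry information from earlier frames so that each step needs just the current frame's features.

```python
import numpy as np

def aggregate_recurrent(frame_features, alpha=0.5):
    """Blend each frame's features with a running aggregated state.

    `alpha` is a hypothetical fixed weight for the current frame; a real
    module would learn how to combine current and past features.
    """
    state = None
    outputs = []
    for feat in frame_features:
        feat = np.asarray(feat, dtype=np.float64)
        # The first frame initialises the state; later frames are blended in.
        state = feat if state is None else alpha * feat + (1.0 - alpha) * state
        outputs.append(state)
    return outputs

# Toy feature maps for three consecutive frames.
frames = [np.full((2, 2), v) for v in (0.0, 1.0, 1.0)]
aggregated = aggregate_recurrent(frames, alpha=0.5)
```

Because the state is updated once per frame, the per-step cost stays constant regardless of how many past frames have contributed.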

Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, the high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-assisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images such as various lighting conditions and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module to aggregate video frame features temporally and spatially in a recurrent mode. By distributing the computation load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computation costs at each time step. Moreover, public surgical videos usually are not labeled frame by frame, so we develop a method that can randomly synthesize a surgical frame sequence from a single labeled frame to assist network training. We demonstrate that our approach achieves superior performance to corresponding deeper segmentation models on two public surgery datasets.
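The abstract does not specify how a frame sequence is synthesized from a single labeled frame. As a rough sketch under stated assumptions, one simple option is to apply small, accumulating random translations to the image and its mask together, so every synthetic frame keeps a valid label. The helpers `shift2d` and `synthesize_sequence`, and the parameters `length`, `max_step`, and `seed`, are hypothetical names chosen for this example and are not taken from the paper.

```python
import numpy as np

def shift2d(arr, dy, dx):
    """Translate an (H, W) or (H, W, C) array by (dy, dx), zero-filling exposed borders."""
    h, w = arr.shape[:2]
    # Clamp shifts so the slices below stay valid.
    dy = int(np.clip(dy, -(h - 1), h - 1))
    dx = int(np.clip(dx, -(w - 1), w - 1))
    out = np.zeros_like(arr)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = arr[src_y, src_x]
    return out

def synthesize_sequence(image, mask, length=4, max_step=2, seed=None):
    """Build a pseudo-video of (image, mask) pairs from one labeled frame.

    Assumption: motion is modeled as an accumulating random translation only.
    """
    rng = np.random.default_rng(seed)
    dy = dx = 0
    sequence = []
    for _ in range(length):
        # Apply the same shift to image and mask so the label stays aligned.
        sequence.append((shift2d(image, dy, dx), shift2d(mask, dy, dx)))
        dy += int(rng.integers(-max_step, max_step + 1))
        dx += int(rng.integers(-max_step, max_step + 1))
    return sequence

# Toy labeled frame: random RGB image with a square "instrument" mask.
image = np.random.default_rng(0).random((8, 8, 3))
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:5, 2:5] = 1
seq = synthesize_sequence(image, mask, length=4, max_step=2, seed=0)
```

The first pair in `seq` is the unmodified labeled frame. A fuller version might add rotation, scaling, or lighting changes, but those would be further assumptions beyond what the abstract states.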
