CVJul 20, 2018

Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features

arXiv:1807.07853v415 citations
Originality Incremental advance
AI Analysis

This work addresses efficient content representation for surgical video databases, but it is incremental as it builds on existing CNN and LSTM methods with a novel feature addition.

The paper tackled surgical phase recognition in laparoscopic surgery videos by focusing on short video shots without prior context, using CNN architectures and temporal modeling. The result was an accuracy increase from 69% to 86% by incorporating elapsed time and LSTM networks.

Recognizing the phases of a laparoscopic surgery (LS) operation form its video constitutes a fundamental step for efficient content representation, indexing and retrieval in surgical video databases. In the literature, most techniques focus on phase segmentation of the entire LS video using hand-crafted visual features, instrument usage signals, and recently convolutional neural networks (CNNs). In this paper we address the problem of phase recognition of short video shots (10s) of the operation, without utilizing information about the preceding/forthcoming video frames, their phase labels or the instruments used. We investigate four state-of-the-art CNN architectures (Alexnet, VGG19, GoogleNet, and ResNet101), for feature extraction via transfer learning. Visual saliency was employed for selecting the most informative region of the image as input to the CNN. Video shot representation was based on two temporal pooling mechanisms. Most importantly, we investigate the role of 'elapsed time' (from the beginning of the operation), and we show that inclusion of this feature can increase performance dramatically (69% vs. 75% mean accuracy). Finally, a long short-term memory (LSTM) network was trained for video shot classification based on the fusion of CNN features with 'elapsed time', increasing the accuracy to 86%. Our results highlight the prominent role of visual saliency, long-range temporal recursion and 'elapsed time' (a feature so far ignored), for surgical phase recognition.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes