CVJun 19, 2023

A spatio-temporal network for video semantic segmentation in surgical videos

arXiv:2306.11052v129 citationsh-index: 60
Originality Incremental advance
AI Analysis

This work addresses the need for reliable segmentation in surgical applications like intra-operative guidance, though it is incremental as it builds on existing segmentation encoders.

The paper tackled the problem of temporally inconsistent semantic segmentation in surgical videos by proposing a spatio-temporal decoder architecture, which improved segmentation performance and temporal consistency on the CholecSeg8k and a private robotic Partial Nephrectomy dataset.

Semantic segmentation in surgical videos has applications in intra-operative guidance, post-operative analytics and surgical education. Segmentation models need to provide accurate and consistent predictions since temporally inconsistent identification of anatomical structures can impair usability and hinder patient safety. Video information can alleviate these challenges leading to reliable models suitable for clinical use. We propose a novel architecture for modelling temporal relationships in videos. The proposed model includes a spatio-temporal decoder to enable video semantic segmentation by improving temporal consistency across frames. The encoder processes individual frames whilst the decoder processes a temporal batch of adjacent frames. The proposed decoder can be used on top of any segmentation encoder to improve temporal consistency. Model performance was evaluated on the CholecSeg8k dataset and a private dataset of robotic Partial Nephrectomy procedures. Segmentation performance was improved when the temporal decoder was applied across both datasets. The proposed model also displayed improvements in temporal consistency.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes