CV · Feb 21, 2023

Weakly Supervised Temporal Convolutional Networks for Fine-grained Surgical Activity Recognition

arXiv:2302.10834v2 · 9 citations · h-index: 68
Originality: Incremental advance
AI Analysis

This work aims to reduce the annotation effort required for surgical activity recognition, a capability that is crucial for intelligent intra-operative computer assistance. The contribution is incremental: it builds on existing weakly supervised and TCN methods.

The paper tackles fine-grained surgical activity recognition by using coarser, easier-to-annotate phase labels as weak supervision, which reduces reliance on manual step annotations. The authors evaluate the method on two datasets, 40 laparoscopic gastric bypass procedures and the 50 cataract surgeries of CATARACTS, and report that it is effective on both.

Automatic recognition of fine-grained surgical activities, called steps, is a challenging but crucial task for intelligent intra-operative computer assistance. The development of current vision-based activity recognition methods relies heavily on a high volume of manually annotated data. This data is difficult and time-consuming to generate and requires domain-specific knowledge. In this work, we propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition with fewer step-annotated videos. We introduce a step-phase dependency loss to exploit the weak supervision signal. We then employ a Single-Stage Temporal Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an end-to-end fashion from weakly annotated videos, for temporal activity segmentation and recognition. We extensively evaluate and show the effectiveness of the proposed method on a large video dataset consisting of 40 laparoscopic gastric bypass procedures and the public benchmark CATARACTS containing 50 cataract surgeries.
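The abstract does not give the form of the step-phase dependency loss, so the following is only a minimal sketch of one plausible way to turn phase labels into a training signal for a step classifier. It assumes each step belongs to exactly one phase (the hypothetical `step_to_phase` mapping), sums the predicted step probabilities within each phase, and penalizes frames whose annotated phase receives little probability mass. All function names and the mapping are illustrative and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of per-step logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phase_probs_from_steps(step_probs, step_to_phase, num_phases):
    """Sum step probabilities into the phase each step is assumed to belong to."""
    phase_probs = [0.0] * num_phases
    for step, p in enumerate(step_probs):
        phase_probs[step_to_phase[step]] += p
    return phase_probs


def dependency_loss(step_logits_per_frame, phase_labels,
                    step_to_phase, num_phases, eps=1e-8):
    """Mean negative log-probability of the annotated phase, per frame.

    Assumption: this marginalization is a stand-in for the paper's loss,
    whose exact definition is not stated in the abstract.
    """
    total = 0.0
    for logits, phase in zip(step_logits_per_frame, phase_labels):
        step_probs = softmax(logits)
        phase_probs = phase_probs_from_steps(step_probs, step_to_phase, num_phases)
        total += -math.log(phase_probs[phase] + eps)
    return total / len(phase_labels)


if __name__ == "__main__":
    # Hypothetical setup: steps 0 and 1 belong to phase 0, step 2 to phase 1.
    step_to_phase = [0, 0, 1]
    frame_logits = [[5.0, 0.0, 0.0]]  # the model strongly predicts step 0
    print(dependency_loss(frame_logits, [0], step_to_phase, 2))  # consistent phase
    print(dependency_loss(frame_logits, [1], step_to_phase, 2))  # inconsistent phase
```

In this sketch, a frame labeled only with its phase still pushes probability toward the steps of that phase without saying which step is correct. In a real pipeline the logits would come from the temporal model and this term would be combined with a standard step-level loss on the fully annotated videos.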

