Temporal coherence-based self-supervised learning for laparoscopic workflow analysis
This work addresses the data annotation bottleneck for computer-assisted surgery systems, offering an incremental improvement in surgical phase segmentation.
The paper tackles the problem of limited annotated data for surgical workflow analysis by proposing self-supervised pretraining methods that exploit temporal coherence in unlabeled laparoscopic videos. The pretrained networks achieve an F1 score of 84.6 on the Cholec80 dataset and improve on non-pretrained networks by up to 10 points.
To provide the right type of assistance at the right time, computer-assisted surgery systems need context awareness, which makes methods for surgical workflow analysis crucial. Currently, convolutional neural networks provide the best performance for video-based workflow analysis tasks. Training such networks requires large amounts of annotated data, but collecting enough of it is often costly, time-consuming, and not always feasible. In this paper, we address this problem by presenting and comparing different approaches for self-supervised pretraining of neural networks on unlabeled laparoscopic videos using temporal coherence. We evaluate our pretrained networks on Cholec80, a publicly available dataset for surgical phase segmentation, on which they reach a maximum F1 score of 84.6. Furthermore, pretraining increases the F1 score by up to 10 points compared to a non-pretrained neural network.
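As an illustration of what a temporal coherence objective can look like, the sketch below implements a generic contrastive loss over pairs of frame embeddings: pairs of temporally close frames are pulled together, and pairs of distant frames are pushed at least a margin apart. This is one common formulation and not necessarily the exact loss compared in the paper. The function names, the `margin` parameter, and the way pairs are labeled as close or distant are assumptions made for this example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def temporal_contrastive_loss(emb_a, emb_b, is_close, margin=1.0):
    """Mean contrastive loss over pairs of frame embeddings.

    emb_a, emb_b: lists of embedding vectors, one pair per index.
    is_close:     list of booleans; True if the two frames of a pair are
                  temporally close in the same video (hypothetical labeling,
                  derived from frame indices rather than manual annotation).
    margin:       minimum distance enforced between distant pairs (assumed value).
    """
    losses = []
    for u, v, close in zip(emb_a, emb_b, is_close):
        d = euclidean(u, v)
        if close:
            # Temporally close frames should have similar embeddings.
            losses.append(d ** 2)
        else:
            # Distant frames are penalized only if closer than the margin.
            losses.append(max(0.0, margin - d) ** 2)
    return sum(losses) / len(losses)


# Usage: two pairs of 2-D embeddings, one close pair and one distant pair.
loss = temporal_contrastive_loss(
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [3.0, 4.0]],
    [True, False],
)
print(loss)
```

Because the close/distant labels come from frame timestamps alone, a loss of this kind can be computed on unlabeled video, which is what makes it usable for pretraining before fine-tuning on annotated phases.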