CV · LG · Sep 11, 2025

Exploring Pre-training Across Domains for Few-Shot Surgical Skill Assessment

Dimitrios Anastasiou, Razvan Caramalau, Nazir Sirajudeen, Matthew Boal, Philip Edwards, Justin Collins, John Kelly, Ashwin Sridhar, Maxine Tran, Faiz Mumtaz, Nevil Pavithran, Nader Francis

arXiv:2509.09327v1 · 1 citation · h-index: 13 · Has Code · DEMI@MICCAI

Originality: Incremental advance

AI Analysis

This work addresses the scarcity of skill annotations in surgical computer vision by offering a scalable few-shot approach. Its contribution is an incremental advance in tailoring pre-training strategies to domain-specific applications.

The paper tackles automated surgical skill assessment (SSA) by exploring few-shot learning with self-supervised pre-training. It shows that domain-relevant pre-training datasets outperform larger but less aligned ones, reaching accuracies of 60.16%, 66.03%, and 73.65% in the 1-, 2-, and 5-shot settings, respectively. It also shows that adding procedure-specific data to pre-training alongside a domain-relevant external dataset improves performance by an average of +1.22% in accuracy and +2.28% in F1-score.
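The summary does not state which few-shot classifier the authors use, so the following is only a minimal sketch of one common K-shot evaluation protocol. It assumes clip-level feature vectors have already been extracted by a frozen self-supervised encoder and that OSATS scores have been binned into discrete skill classes. The nearest-prototype classifier, the class names, and every function name here (`sample_episode`, `predict`, `evaluate_few_shot`, etc.) are hypothetical illustrations rather than the authors' method; the sketch shows how accuracy and macro-F1 can be averaged over randomly drawn 1-, 2-, or 5-shot episodes.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Assumption: features come from a frozen pre-trained encoder; the
# nearest-prototype classifier below is a stand-in, not the paper's head.


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors (a class prototype)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_episode(
    features_by_class: Dict[str, List[Vector]], k_shot: int, n_query: int, rng: random.Random
) -> Tuple[Dict[str, List[Vector]], List[Tuple[Vector, str]]]:
    """Draw K labelled support clips and n_query held-out query clips per skill class."""
    support: Dict[str, List[Vector]] = {}
    query: List[Tuple[Vector, str]] = []
    for label, feats in features_by_class.items():
        picked = rng.sample(feats, k_shot + n_query)  # no clip is both support and query
        support[label] = picked[:k_shot]
        query.extend((f, label) for f in picked[k_shot:])
    return support, query


def predict(support: Dict[str, List[Vector]], query: List[Tuple[Vector, str]]) -> List[str]:
    """Assign each query to the class whose prototype (mean support feature) is nearest."""
    prototypes = {label: mean_vector(feats) for label, feats in support.items()}
    return [
        min(prototypes, key=lambda lbl: squared_distance(f, prototypes[lbl]))
        for f, _ in query
    ]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lbl in labels:
        tp = sum(t == lbl and p == lbl for t, p in zip(y_true, y_pred))
        fp = sum(t != lbl and p == lbl for t, p in zip(y_true, y_pred))
        fn = sum(t == lbl and p != lbl for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate_few_shot(features_by_class, k_shot, n_query=5, n_episodes=100, seed=0):
    """Average accuracy and macro-F1 over randomly sampled K-shot episodes."""
    rng = random.Random(seed)
    accs, f1s = [], []
    for _ in range(n_episodes):
        support, query = sample_episode(features_by_class, k_shot, n_query, rng)
        y_true = [label for _, label in query]
        y_pred = predict(support, query)
        accs.append(accuracy(y_true, y_pred))
        f1s.append(macro_f1(y_true, y_pred))
    return sum(accs) / n_episodes, sum(f1s) / n_episodes


# Hypothetical toy features: two well-separated skill classes, 10 clips each.
toy = {
    "novice": [[0.01 * i, 0.0] for i in range(10)],
    "expert": [[5.0 + 0.01 * i, 5.0] for i in range(10)],
}
for k in (1, 2, 5):
    # each line shows (1.0, 1.0): the toy classes are trivially separable
    print(k, evaluate_few_shot(toy, k_shot=k))
```

Averaging over many episodes matters because with only one or two support clips per class, a single draw can swing the prototype, and hence the reported accuracy, considerably.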

Automated surgical skill assessment (SSA) is a central task in surgical computer vision. Developing robust SSA models is challenging due to the scarcity of skill annotations, which are time-consuming to produce and require expert consensus. Few-shot learning (FSL) offers a scalable alternative, enabling model development with minimal supervision, though its success critically depends on effective pre-training. While pre-training has been widely studied for several surgical downstream tasks, it has remained largely unexplored in SSA. In this work, we formulate SSA as a few-shot task and investigate how self-supervised pre-training strategies affect downstream few-shot SSA performance. We annotate a publicly available robotic surgery dataset with Objective Structured Assessment of Technical Skill (OSATS) scores and evaluate various pre-training sources across three few-shot settings. We quantify domain similarity and analyze how the domain gap and the inclusion of procedure-specific data in pre-training influence transferability. Our results show that small but domain-relevant datasets can outperform large-scale, less-aligned ones, achieving accuracies of 60.16%, 66.03%, and 73.65% in the 1-, 2-, and 5-shot settings, respectively. Moreover, incorporating procedure-specific data into pre-training with a domain-relevant external dataset significantly boosts downstream performance, with an average gain of +1.22% in accuracy and +2.28% in F1-score; however, applying the same strategy with less similar but large-scale sources can instead lead to performance degradation. Code and models are available at https://github.com/anastadimi/ssa-fsl.
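The abstract says domain similarity is quantified but not how. As an illustration only, the sketch below scores each candidate pre-training source with a swapped-in measure: the squared maximum mean discrepancy (MMD), using a Gaussian kernel, between the source's feature embeddings and those of the target SSA data. Lower values indicate a smaller domain gap. The metric, the bandwidth `sigma`, and the source names are assumptions, not the paper's procedure.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Assumption: MMD with an RBF kernel is a stand-in domain-gap measure;
# the paper's own similarity metric is not specified in the abstract.


def rbf_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float) -> float:
    """Average kernel value over all (x, y) pairs."""
    return sum(rbf_kernel(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2(source: Sequence[Vector], target: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD; 0 means the two feature clouds coincide."""
    return (
        mean_kernel(source, source, sigma)
        + mean_kernel(target, target, sigma)
        - 2.0 * mean_kernel(source, target, sigma)
    )


def rank_sources(
    target: Sequence[Vector], sources: Dict[str, Sequence[Vector]], sigma: float = 1.0
) -> List[str]:
    """Order candidate pre-training sources from most to least similar to the target."""
    return sorted(sources, key=lambda name: mmd2(sources[name], target, sigma))


# Hypothetical embeddings: one source sits close to the target, one far away.
target = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
sources = {
    "near_source": [[x + 0.2, y] for x, y in target],
    "far_source": [[x + 3.0, y + 3.0] for x, y in target],
}
print(rank_sources(target, sources))  # ['near_source', 'far_source']
```

In practice `sigma` is often set with the median pairwise-distance heuristic, and because this estimator is quadratic in the number of samples, large pre-training corpora usually need to be subsampled before scoring.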

