CV · AI · Jun 9, 2025

SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis

arXiv:2506.07603v2 · 9 citations · h-index: 11
Originality: Synthesis-oriented
AI Analysis

This paper addresses the scarcity of data for surgical video understanding, which limits automated decision-making and skill assessment in healthcare. The contribution is incremental, as it builds on existing benchmarking approaches.

The authors tackle the lack of large-scale datasets for surgical video analysis by introducing SurgBench, a unified benchmarking framework with a pretraining dataset of 53 million frames (SurgBench-P) and an evaluation benchmark spanning 72 tasks (SurgBench-E). Pretraining on SurgBench-P improved the performance and generalization of video foundation models.

Surgical video understanding is pivotal for enabling automated intraoperative decision-making, skill assessment, and postoperative quality improvement. However, progress in developing surgical video foundation models (FMs) remains hindered by the scarcity of large-scale, diverse datasets for pretraining and systematic evaluation. In this paper, we introduce SurgBench, a unified surgical video benchmarking framework comprising a pretraining dataset, SurgBench-P, and an evaluation benchmark, SurgBench-E. SurgBench offers extensive coverage of diverse surgical scenarios, with SurgBench-P encompassing 53 million frames across 22 surgical procedures and 11 specialties, and SurgBench-E providing robust evaluation across six categories (phase classification, camera motion, tool recognition, disease diagnosis, action classification, and organ detection) spanning 72 fine-grained tasks. Extensive experiments reveal that existing video FMs struggle to generalize across varied surgical video analysis tasks, whereas pretraining on SurgBench-P yields substantial performance improvements and superior cross-domain generalization to unseen procedures and modalities. Our dataset and code are available upon request.
