CV · Feb 21

YOLOv10-Based Multi-Task Framework for Hand Localization and Laterality Classification in Surgical Videos

arXiv:2602.18959v1
Originality: Synthesis-oriented
AI Analysis

This work addresses rapid intraoperative decision support in trauma surgery, but its contribution is incremental: it applies existing YOLO methods to a specific domain.

The paper tackles real-time hand tracking in trauma surgery with a YOLOv10-based framework for simultaneous hand localization and laterality classification. It reports 67% left-hand and 71% right-hand classification accuracy and an mAP[0.5:0.95] of 0.33 while maintaining real-time inference.

Real-time hand tracking in trauma surgery is essential for supporting rapid and precise intraoperative decisions. We propose a YOLOv10-based framework that simultaneously localizes hands and classifies their laterality (left or right) in complex surgical scenes. The model is trained on the Trauma THOMPSON Challenge 2025 Task 2 dataset, consisting of first-person surgical videos with annotated hand bounding boxes. Extensive data augmentation and a multi-task detection design improve robustness against motion blur, lighting variations, and diverse hand appearances. Evaluation demonstrates accurate left-hand (67%) and right-hand (71%) classification, while distinguishing hands from the background remains challenging. The model achieves an $mAP_{[0.5:0.95]}$ of 0.33 and maintains real-time inference, highlighting its potential for intraoperative deployment. This work establishes a foundation for advanced hand-instrument interaction analysis in emergency surgical procedures.
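
For readers unfamiliar with the reported metric: mAP[0.5:0.95] is conventionally the mean of average precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention). The sketch below is illustrative only and is not the authors' evaluation code. The function names, the `(x1, y1, x2, y2)` box format, and the example boxes are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Thresholds at which this prediction would count as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


def mean_over_thresholds(ap_per_threshold):
    """Average per-threshold AP values (dict: threshold -> AP)."""
    return sum(ap_per_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


# Hypothetical hand box and a prediction shifted 10 px to the right.
gt = (0, 0, 100, 100)
pred = (10, 0, 110, 100)
print(round(iou(pred, gt), 3))
print(thresholds_passed(pred, gt))
```

A prediction that overlaps the ground truth well at IoU 0.5 can still fail the stricter thresholds, which is why a score averaged over the full range is typically lower than the same model's score at IoU 0.5 alone.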

