YOLOv10-Based Multi-Task Framework for Hand Localization and Laterality Classification in Surgical Videos
This work targets rapid intraoperative decision support in trauma surgery, but its contribution is incremental: it adapts existing YOLO methods to a specific domain.
The paper tackles real-time hand tracking in trauma surgery by proposing a YOLOv10-based framework for simultaneous hand localization and laterality classification. It reports 67% left-hand and 71% right-hand classification accuracy and an mAP over IoU thresholds 0.5 to 0.95 of 0.33, while maintaining real-time inference.
Real-time hand tracking in trauma surgery is essential for supporting rapid and precise intraoperative decisions. We propose a YOLOv10-based framework that simultaneously localizes hands and classifies their laterality (left or right) in complex surgical scenes. The model is trained on the Trauma THOMPSON Challenge 2025 Task 2 dataset, which consists of first-person surgical videos with annotated hand bounding boxes. Extensive data augmentation and a multi-task detection design improve robustness against motion blur, lighting variations, and diverse hand appearances. Evaluation shows classification accuracies of 67\% for left hands and 71\% for right hands, while distinguishing hands from the background remains challenging. The model achieves an $\mathrm{mAP}_{[0.5:0.95]}$ of 0.33 and maintains real-time inference, highlighting its potential for intraoperative deployment. This work establishes a foundation for advanced hand-instrument interaction analysis in emergency surgical procedures.
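To make the reported metric concrete, the following is a minimal sketch of the box-matching step behind an mAP averaged over IoU thresholds from 0.5 to 0.95. It assumes the common COCO-style convention of ten thresholds in steps of 0.05; the abstract does not state the evaluation code used, and the names `iou`, `IOU_THRESHOLDS`, and `thresholds_passed` are hypothetical. A full mAP computation would additionally rank detections by confidence and integrate precision over recall per class, which is omitted here.

```python
from typing import Sequence

# A box is (x_min, y_min, x_max, y_max) in pixels (assumed convention).
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def thresholds_passed(pred: Box, truth: Box) -> int:
    """Count the thresholds at which a predicted hand box matches the truth."""
    overlap = iou(pred, truth)
    return sum(overlap >= t for t in IOU_THRESHOLDS)
```

For example, a predicted box shifted by 2 pixels against a 10-by-10 ground-truth box has an IoU of 80/120, so it counts as a match only at the four loosest thresholds. This illustrates why a score averaged over the stricter thresholds can be low even when hands are roughly localized.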