LG AI CV · Nov 17, 2024

Multi-Modal Self-Supervised Learning for Surgical Feedback Effectiveness Assessment

Arushi Gupta, Rafal Kocielnik, Jiayun Wang, Firdavs Nasriddinov, Cherine Yang, Elyssa Wong, Anima Anandkumar, Andrew Hung

arXiv:2411.10919v17.93 citations · h-index: 20 · Has Code · ML4H@NeurIPS

Originality: Synthesis-oriented

AI Analysis

The work addresses the need for automated, scalable assessment of feedback effectiveness in surgical training. The contribution is incremental: it applies existing multi-modal learning techniques to a new domain-specific task.

The paper tackles the problem of predicting the effectiveness of feedback in surgical training by integrating transcribed verbal feedback with surgical video. The combined model achieves an AUROC of 0.70+/-0.02 and improves prediction accuracy by up to 6.6%.

During surgical training, real-time feedback from trainers to trainees is important for preventing errors and enhancing long-term skill acquisition. Accurately predicting the effectiveness of this feedback, specifically whether it leads to a change in trainee behavior, is crucial for developing methods for improving surgical training and education. However, relying on human annotations to assess feedback effectiveness is laborious and prone to biases, underscoring the need for an automated, scalable, and objective method. Creating such an automated system poses challenges, as it requires an understanding of both the verbal feedback delivered by the trainer and the visual context of the real-time surgical scene. To address this, we propose a method that integrates information from transcribed verbal feedback and corresponding surgical video to predict feedback effectiveness. Our findings show that both transcribed feedback and surgical video are individually predictive of trainee behavior changes, and their combination achieves an AUROC of 0.70+/-0.02, improving prediction accuracy by up to 6.6%. Additionally, we introduce self-supervised fine-tuning as a strategy for enhancing surgical video representation learning, which is scalable and further enhances prediction performance. Our results demonstrate the potential of multi-modal learning to advance the automated assessment of surgical feedback.
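To make the reported metric concrete, here is a minimal sketch of how an AUROC and a "mean +/- spread" summary could be computed, alongside the simplest possible way to combine two modalities. Everything in it is an assumption for illustration: the abstract does not say how the text and video representations are fused (concatenation is used here only as a stand-in), nor whether the +/-0.02 is a standard deviation over folds or seeds. The function names `auroc`, `fuse`, and `summarize` are hypothetical and do not come from the paper's code.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    (feedback that changed trainee behavior) receives a higher score
    than a randomly chosen negative one; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def fuse(text_vec, video_vec):
    """Assumed fusion step: concatenate per-modality feature vectors."""
    return list(text_vec) + list(video_vec)


def summarize(run_scores):
    """Mean and sample standard deviation over repeated runs (assumed protocol)."""
    return mean(run_scores), stdev(run_scores)


if __name__ == "__main__":
    # Toy labels and scores, not data from the paper.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(fuse([0.2, 0.5], [0.9]))
    print(summarize([0.68, 0.70, 0.72]))
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch; a rank-based implementation would be the usual choice on a full dataset.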
