CV · Jul 18, 2024

QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View

Trinh T. L. Vuong, Doanh C. Bui, Jin Tae Kwak

arXiv:2407.13216v1 · 2.0 · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses automation for life-saving medical procedures, but it is incremental as it builds on existing methods like knowledge distillation and co-attention networks.

The paper tackled automation tasks in life-saving intervention procedures, achieving 2nd rank in action recognition and anticipation and 1st rank in Visual Question Answering (VQA) in the T3 Challenge.

In this paper, we present our solutions for a spectrum of automation tasks in life-saving intervention procedures within the Trauma THOMPSON (T3) Challenge, encompassing action recognition, action anticipation, and Visual Question Answering (VQA). For action recognition and anticipation, we propose a pre-processing strategy that samples multiple inputs and stitches them into a single image, and we then incorporate momentum- and attention-based knowledge distillation to improve performance on both tasks. For training, we present an action dictionary-guided design, which consistently yields the most favorable results across our experiments. For VQA, we leverage object-level features and deploy co-attention networks to jointly model object and question features. Notably, we introduce a novel frame-question cross-attention mechanism at the network's core for enhanced performance. Our solutions rank 2nd in the action recognition and anticipation tasks and 1st in the VQA task.
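The sample-and-stitch pre-processing step mentioned in the abstract could look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not say how many frames are sampled, how they are spaced, or how they are arranged, so the uniform sampling, the 2x2 grid, and the function names `sample_indices` and `stitch_frames` are all hypothetical and not taken from the authors' code.

```python
import numpy as np


def sample_indices(num_frames: int, num_samples: int) -> np.ndarray:
    """Pick num_samples frame indices spread evenly over a clip (assumed strategy)."""
    return np.linspace(0, num_frames - 1, num_samples).round().astype(int)


def stitch_frames(clip: np.ndarray, rows: int = 2, cols: int = 2) -> np.ndarray:
    """Tile rows*cols sampled frames from a (T, H, W, C) clip into one image.

    The grid layout is an assumption made for illustration; the result has
    shape (rows*H, cols*W, C), with frames placed in row-major temporal order.
    """
    t, h, w, c = clip.shape
    frames = clip[sample_indices(t, rows * cols)]  # (rows*cols, H, W, C)
    grid = frames.reshape(rows, cols, h, w, c)
    # Reorder to (rows, H, cols, W, C) so that merging axes yields a tiled image.
    return grid.transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)


if __name__ == "__main__":
    # Dummy clip: 16 frames of 4x6 RGB, each frame filled with its own index.
    clip = np.stack([np.full((4, 6, 3), i, dtype=np.uint8) for i in range(16)])
    image = stitch_frames(clip)
    print(image.shape)
```

A single stitched image of this kind lets a standard image backbone see several time steps at once, which is one plausible reason for combining multiple inputs before the distillation stage.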
