ST(OR)²: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room
This work addresses the problem of resource-intensive data collection for surgical activity recognition, offering a more efficient method for developing AI tools in surgical robotics. The contribution is incremental, however, since it builds on existing object-centric approaches.
The paper tackles surgical activity recognition in the operating room by proposing a sample-efficient, object-based approach that focuses on geometric arrangements between clinicians and devices, showing superior performance in low-data regimes and on clip-level action classification benchmarks.
Surgical robotics holds much promise for improving patient safety and clinician experience in the Operating Room (OR). However, it also comes with new challenges, requiring strong team coordination and effective OR management. Automatic detection of surgical activities is a key requirement for developing AI-based intelligent tools to tackle these challenges. The current state-of-the-art surgical activity recognition methods, however, operate on image-based representations and depend on large-scale labeled datasets whose collection is time-consuming and resource-intensive. This work proposes a new sample-efficient and object-based approach for surgical activity recognition in the OR. Our method focuses on the geometric arrangements between clinicians and surgical devices, thus utilizing the significant object interaction dynamics in the OR. We conduct experiments in a low-data regime study for long video activity recognition. We also benchmark our method against other object-centric approaches on clip-level action classification and show superior performance.