CV · Nov 11, 2021

Open surgery tool classification and hand utilization using a multi-camera system

arXiv:2111.06098v1 · 8 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses tool tracking in open surgery for improved surgical workflow analysis, but it is incremental as it builds on existing detection and LSTM methods.

This paper tackled the problem of classifying open surgery tools and identifying which tool is held in each hand using a multi-camera system to prevent occlusions, achieving an accuracy of 0.93 and F1 score of 0.94 with their final architecture.

Purpose: The goal of this work is to use multi-camera video to classify open surgery tools as well as identify which tool is held in each hand. Multi-camera systems help prevent occlusions in open surgery video data. Furthermore, combining multiple views, such as a Top-view camera covering the full operative field and a Close-up camera focusing on hand motion and anatomy, may provide a more comprehensive view of the surgical workflow. However, multi-camera data fusion poses a new challenge: a tool may be visible in one camera and not the other. Thus, we defined the global ground truth as the tools being used regardless of their visibility. Therefore, tools that are out of the image should be remembered for extensive periods of time while the system responds quickly to changes visible in the video.

Methods: Participants (n=48) performed a simulated open bowel repair. A Top-view and a Close-up camera were used. YOLOv5 was used for tool and hand detection. A high frequency LSTM with a 1 second window at 30 frames per second (fps) and a low frequency LSTM with a 40 second window at 3 fps were used for spatial, temporal, and multi-camera integration.

Results: The accuracy and F1 of the six systems were: Top-view (0.88/0.88), Close-up (0.81/0.83), both cameras (0.9/0.9), high fps LSTM (0.92/0.93), low fps LSTM (0.9/0.91), and our final architecture, the Multi-camera classifier (0.93/0.94).

Conclusion: By combining a high fps system and a low fps system across the multi-camera array, we improved classification of the global ground truth.
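The two temporal windows named in the Methods (1 second at 30 fps and 40 seconds at 3 fps) can be illustrated with a small sketch that computes which frames of a 30 fps detection stream would feed each LSTM. This is not the authors' code: the function names `window_indices` and `two_rate_windows`, the choice to end both windows on the current frame, and the padding of early frames by repeating frame 0 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

SOURCE_FPS = 30  # camera frame rate stated in the abstract


def window_indices(end: int, seconds: float, fps: int,
                   source_fps: int = SOURCE_FPS) -> List[int]:
    """Frame indices of a window ending at frame `end` (inclusive), sampled at `fps`."""
    if source_fps % fps != 0:
        raise ValueError("fps must divide source_fps evenly")
    stride = source_fps // fps      # source frames between consecutive samples
    length = int(seconds * fps)     # number of samples in the window
    start = end - (length - 1) * stride
    # Assumption: indices before the start of the video are clamped to frame 0.
    return [max(0, start + i * stride) for i in range(length)]


def two_rate_windows(features: Sequence[T], end: int) -> Tuple[List[T], List[T]]:
    """Return (high-frequency, low-frequency) feature windows ending at frame `end`."""
    high = [features[i] for i in window_indices(end, seconds=1, fps=30)]
    low = [features[i] for i in window_indices(end, seconds=40, fps=3)]
    return high, low
```

Under these assumptions the high frequency window holds 30 consecutive frames, while the low frequency window holds 120 samples taken every 10th frame, which is how a 40 second history stays short enough to feed a recurrent model.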

