When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos
This work addresses reliability issues in zero-shot tracking for surgical video analysis, providing incremental insights for improving point-based methods in complex medical environments.
The paper analyzed the failure modes of point-based tracking using SAM2 in laparoscopic cholecystectomy videos, finding it competitive for surgical tools but underperforming for anatomical targets due to tissue similarity and ambiguous boundaries.
Video object segmentation (VOS) models such as SAM2 offer promising zero-shot tracking capabilities for surgical videos using minimal user input. Among the available input types, point-based tracking offers an efficient and low-cost alternative, yet its reliability and failure cases in complex surgical environments are not well understood. In this work, we systematically analyze the failure modes of point-based tracking in laparoscopic cholecystectomy videos. Focusing on three surgical targets, the gallbladder, grasper, and L-hook electrocautery, we compare the performance of point-based tracking with segmentation mask initialization. Our results show that point-based tracking is competitive for surgical tools but consistently underperforms for anatomical targets, where tissue similarity and ambiguous boundaries lead to failure. Through qualitative analysis, we reveal key factors influencing tracking outcomes and provide several actionable recommendations for selecting and placing tracking points to improve performance in surgical video analysis.