Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos
This work addresses the need for better surgical workflow analysis to enhance training and efficiency; it is an incremental contribution that proposes a novel method for a known bottleneck.
The paper tackles action segmentation in untrimmed surgical videos, a problem made challenging by variable surgeon approaches and ambiguous action boundaries. It proposes MSBATN, which improves segmentation by accurately identifying action boundaries and achieves state-of-the-art F1 scores at the 25% and 50% overlap thresholds.
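The F1 scores at 25% and 50% are segmental F1@k scores. The minimal pure-Python sketch below illustrates the commonly used definition, following the matching scheme of the widely used MS-TCN evaluation code as we understand it: frame labels are collapsed into segments, a predicted segment counts as a true positive if its IoU with a not-yet-matched ground-truth segment of the same class is at least k, and F1 is computed from the resulting counts. The function names, the optional `background` label and the toy sequences are illustrative assumptions, not taken from the MSBATN paper; papers usually report the score multiplied by 100.

```python
from itertools import groupby


def segments(labels, background=None):
    """Collapse frame-wise labels into (label, start, end) segments; end is exclusive."""
    segs, t = [], 0
    for label, run in groupby(labels):
        n = sum(1 for _ in run)
        if label != background:  # optionally ignore a background class
            segs.append((label, t, t + n))
        t += n
    return segs


def f1_at_k(pred, gt, k, background=None):
    """Segmental F1 at IoU threshold k (e.g. 0.25 or 0.50), in [0, 1]."""
    p_segs = segments(pred, background)
    g_segs = segments(gt, background)
    matched = [False] * len(g_segs)  # each ground-truth segment may be matched once
    tp = fp = 0
    for label, ps, pe in p_segs:
        # Find the same-class ground-truth segment with the highest IoU.
        best_iou, best_j = 0.0, -1
        for j, (gl, gs, ge) in enumerate(g_segs):
            if gl != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            if inter / union > best_iou:
                best_iou, best_j = inter / union, j
        if best_j >= 0 and best_iou >= k and not matched[best_j]:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1  # low overlap, wrong class, or duplicate match
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = ["a"] * 4 + ["b"] * 4
pred_good = ["a"] * 3 + ["b"] * 5                      # one boundary shifted by a frame
pred_over = ["a", "a", "b", "a", "b", "b", "b", "b"]  # 7/8 frames correct, but fragmented
print(f1_at_k(pred_good, gt, 0.50))  # 1.0
print(f1_at_k(pred_over, gt, 0.50))  # 0.666... (2/3)
```

The over-segmented prediction labels 7 of 8 frames correctly yet scores only 2/3, which is why segmental F1 is the usual metric for exposing the over-segmentation errors the abstract attributes to earlier models.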
Understanding actions within surgical workflows is critical for evaluating post-operative outcomes and enhancing surgical training and efficiency. Capturing and analyzing long sequences of actions in surgical settings is challenging because of the inherent variability in individual surgeons' approaches, which are shaped by their expertise and preferences. This variability makes it difficult to identify and segment distinct actions whose start and end boundaries are ambiguous. Traditional models such as MS-TCN rely on large receptive fields, which can cause over-segmentation, where a single action is fragmented into many short segments, or under-segmentation, where distinct actions are merged or incorrectly aligned. To address these challenges, we propose the Multi-Stage Boundary-Aware Transformer Network (MSBATN) with hierarchical sliding window attention to improve action segmentation. Our approach manages the complexity of varying action durations and subtle transitions by accurately identifying the start and end boundaries of actions in untrimmed surgical videos. MSBATN introduces a novel unified loss function that optimises action classification and boundary detection as interconnected tasks. Unlike conventional binary boundary detection methods, our boundary weighting mechanism leverages contextual information to identify action boundaries precisely. Extensive experiments on three challenging surgical datasets demonstrate that MSBATN achieves state-of-the-art performance, with superior F1 scores at the 25% and 50% thresholds and competitive results across other metrics.
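The abstract names hierarchical sliding window attention without describing its internals. As a rough illustration of the general idea only, the pure-Python sketch below implements local (windowed) self-attention and stacks it with growing windows. The window schedule (1, 2, 4, 8), the single head, the identity query/key/value projections and the residual additions are all assumptions made for illustration, and `sliding_window_attention` and `hierarchical_encoder` are hypothetical names; this is not MSBATN's actual architecture.

```python
import math


def sliding_window_attention(x, window):
    """Single-head scaled dot-product self-attention restricted to a local window.

    x is a list of T feature vectors (lists of floats). Frame t attends only to
    frames in [t - window, t + window]. Queries, keys and values are x itself;
    a real model would apply learned projections first (omitted here).
    """
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        idx = range(max(0, t - window), min(T, t + window + 1))
        scores = [sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(d) for s in idx]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(sc - m) for sc in scores]
        z = sum(weights)
        out.append([sum(w * x[s][i] for w, s in zip(weights, idx)) / z for i in range(d)])
    return out


def hierarchical_encoder(x, windows=(1, 2, 4, 8)):
    """Stack local-attention levels whose window doubles at each level (assumed schedule).

    With residual connections, the receptive field radius after all levels is
    sum(windows) frames (15 for the defaults): early levels see fine local
    transitions, later levels see wider temporal context.
    """
    h = x
    for w in windows:
        attn = sliding_window_attention(h, w)
        h = [[a + b for a, b in zip(hr, ar)] for hr, ar in zip(h, attn)]  # residual add
    return h


x = [[math.sin(0.3 * i), math.cos(0.7 * i)] for i in range(40)]
y = hierarchical_encoder(x)
print(len(y), len(y[0]))  # 40 2
```

For a fixed window, each level costs time linear in the sequence length, which matters for long untrimmed videos; growing the window level by level is one common way to recover long-range context without attending over the whole video at once.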