CV · Mar 16, 2023

MATIS: Masked-Attention Transformers for Surgical Instrument Segmentation

arXiv:2303.09514v4 · 48 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses instrument segmentation in surgical videos for medical applications and represents an incremental improvement over existing methods.

The paper tackles surgical instrument segmentation by proposing MATIS, a two-stage, transformer-based method that combines masked attention with video-level information. It achieves state-of-the-art results on the Endovis 2017 and 2018 benchmarks, with further performance gains from its temporal consistency module.

We propose Masked-Attention Transformers for Surgical Instrument Segmentation (MATIS), a two-stage, fully transformer-based method that leverages modern pixel-wise attention mechanisms for instrument segmentation. MATIS exploits the instance-level nature of the task by employing a masked attention module that generates and classifies a set of fine instrument region proposals. Our method incorporates long-term video-level information through video transformers to improve temporal consistency and enhance mask classification. We validate our approach on the two standard public benchmarks, Endovis 2017 and Endovis 2018. Our experiments demonstrate that MATIS' per-frame baseline outperforms previous state-of-the-art methods and that including our temporal consistency module boosts our model's performance further.
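To make the idea of "masked attention" concrete, here is a minimal, generic sketch of a single-head masked cross-attention step in the style popularised by Mask2Former. It is not MATIS's actual code: the function name `masked_cross_attention`, the tensor shapes, the foreground rule (mask logit > 0, i.e. sigmoid > 0.5), and the fallback for empty masks are all assumptions chosen for illustration. Each region query attends only to the pixels inside the mask it predicted in the previous layer, which keeps every proposal focused on its own instrument region.

```python
import numpy as np

def masked_cross_attention(queries, pixel_feats, mask_logits):
    """Illustrative single-head masked cross-attention (hypothetical helper).

    queries:     (Q, D) region-proposal query embeddings
    pixel_feats: (P, D) flattened per-pixel features, P = H * W
    mask_logits: (Q, P) mask predictions from the previous decoder layer
    Returns the updated queries (Q, D) and the attention weights (Q, P).
    """
    d = queries.shape[-1]
    scores = queries @ pixel_feats.T / np.sqrt(d)          # (Q, P) scaled dot products
    fg = mask_logits > 0.0                                  # sigmoid(x) > 0.5  <=>  x > 0
    # Assumed fallback: a query whose predicted mask is empty attends to all
    # pixels, so its row is never entirely -inf (which would yield NaNs).
    fg = fg | ~fg.any(axis=1, keepdims=True)
    scores = np.where(fg, scores, -np.inf)                  # block background pixels
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)                                # exp(-inf) == 0
    weights /= weights.sum(axis=1, keepdims=True)           # softmax over allowed pixels
    return weights @ pixel_feats, weights

# Usage: 3 queries, a 4x4 feature map (16 pixels), 8-dim features.
rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 8))
pixels = rng.normal(size=(16, 8))
mask_logits = rng.normal(size=(3, 16))
mask_logits[2] = -5.0                    # third query predicts an empty mask
updated, attn = masked_cross_attention(queries, pixels, mask_logits)
print(updated.shape)                     # (3, 8)
```

Restricting attention to the predicted foreground is what lets a set of queries behave like instance-level region proposals; in a full model, each updated query would then be decoded into a refined mask and a class score.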

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
