CV | AI | May 7, 2024

Structured Click Control in Transformer-based Interactive Segmentation

arXiv:2405.04009v1 | 2 citations | h-index: 2 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses robustness issues in interactive segmentation for users needing precise control, representing an incremental improvement over existing methods.

The paper tackles imprecise, non-robust responses in click-point-based interactive segmentation after multiple clicks. It proposes a structured click intent model that combines graph neural networks with dual cross-attention, and reports improved performance when the model is used as a general structure for Transformer-based methods.

Click-point-based interactive segmentation has received widespread attention due to its efficiency. However, it's hard for existing algorithms to obtain precise and robust responses after multiple clicks. In this case, the segmentation results tend to have little change or are even worse than before. To improve the robustness of the response, we propose a structured click intent model based on graph neural networks, which adaptively obtains graph nodes via the global similarity of user-clicked Transformer tokens. Then the graph nodes will be aggregated to obtain structured interaction features. Finally, the dual cross-attention will be used to inject structured interaction features into vision Transformer features, thereby enhancing the control of clicks over segmentation results. Extensive experiments demonstrated the proposed algorithm can serve as a general structure in improving Transformer-based interactive segmentation performance. The code and data will be released at https://github.com/hahamyt/scc.
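
The three steps named in the abstract (similarity-based node selection, node aggregation, dual cross-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the top-k selection rule, the parameter-free message-passing step, and the reading of "dual" as two cross-attention directions are all assumptions made for illustration; the released code may differ on every point.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def select_nodes(tokens, click_idx, k):
    """Assumed rule: graph nodes are the k tokens most cosine-similar to the clicked token."""
    unit = l2_normalize(tokens)
    sim = unit @ unit[click_idx]           # similarity to every token, shape (N,)
    return np.argsort(-sim)[:k]            # indices of the k most similar tokens

def aggregate_nodes(nodes):
    """Assumed aggregation: one unweighted message-passing step, then mean pooling."""
    unit = l2_normalize(nodes)
    adj = softmax(unit @ unit.T, axis=-1)  # row-normalized adjacency, shape (k, k)
    return (adj @ nodes).mean(axis=0)      # one feature vector per click, shape (D,)

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return attn @ keys_values

def structured_click_control(tokens, click_indices, k=8):
    # One structured interaction feature per click, shape (C, D).
    inter = np.stack([
        aggregate_nodes(tokens[select_nodes(tokens, c, k)])
        for c in click_indices
    ])
    # Direction 1 (assumed): click features gather image context.
    inter = inter + cross_attention(inter, tokens)
    # Direction 2 (assumed): image tokens receive the click features.
    return tokens + cross_attention(tokens, inter)

# Hypothetical input: 196 ViT tokens of width 64, two clicks.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((196, 64))
out = structured_click_control(tokens, click_indices=[5, 120], k=8)
print(out.shape)  # (196, 64)
```

The output keeps the shape of the input token grid, so a block like this could in principle sit between Transformer layers or before a segmentation head. A real model would add learned projections, multiple heads, and normalization.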

Code Implementations: 1 repo