Graph-Segmenter: Graph Transformer with Boundary-aware Attention for Semantic Segmentation
This work addresses a specific bottleneck in transformer-based semantic segmentation, the limited modeling of relations between windows, and offers incremental improvements over existing transformer-based methods.
The paper tackles insufficient relation modeling between windows in transformer-based semantic segmentation. It proposes Graph-Segmenter, which combines a Graph Transformer that models global (between-window) and local (within-window) relations with a Boundary-aware Attention module that adjusts object boundaries, and reports state-of-the-art performance on the Cityscapes, ADE-20k, and PASCAL Context datasets.
Transformer-based semantic segmentation approaches, which divide the image into regions using sliding windows and model the relations inside each window, have achieved outstanding success. However, because relation modeling between windows was not the primary emphasis of previous work, it has not been fully exploited. To address this issue, we propose Graph-Segmenter, an effective network comprising a Graph Transformer and a Boundary-aware Attention module. It simultaneously models the deeper relations between windows in a global view and among the pixels inside each window in a local view, and it performs substantial boundary adjustment at low cost. Specifically, we treat every window, and every pixel inside a window, as a node to construct graphs for both views, and devise the Graph Transformer. The Boundary-aware Attention module refines the edge information of target objects by modeling the relationships among the pixels on object edges. Extensive experiments on three widely used semantic segmentation datasets (Cityscapes, ADE-20k, and PASCAL Context) demonstrate that the proposed network, a Graph Transformer with Boundary-aware Attention, achieves state-of-the-art segmentation performance.
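The two graph views described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`graph_attention`, `window_partition`, `global_local_relations`), the use of mean pooling to form window nodes, the dot-product similarity as a soft adjacency, and the additive fusion of the two views are all assumptions made for illustration. The Boundary-aware Attention module is not covered here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(nodes):
    """Treat each feature vector as a graph node.

    Returns the updated node features and a soft adjacency matrix built
    from scaled dot-product similarity (an assumed choice of edge weight).
    """
    dim = len(nodes[0])
    scale = math.sqrt(dim)
    adj = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in nodes])
        for q in nodes
    ]
    out = [
        [sum(w * n[c] for w, n in zip(row, nodes)) for c in range(dim)]
        for row in adj
    ]
    return out, adj


def window_partition(feat, win):
    """Split an H x W grid of feature vectors into non-overlapping win x win windows."""
    h, w = len(feat), len(feat[0])
    assert h % win == 0 and w % win == 0, "grid must divide evenly into windows"
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append(
                [feat[top + i][left + j] for i in range(win) for j in range(win)]
            )
    return windows


def global_local_relations(feat, win):
    """Combine a local (pixel-level) graph and a global (window-level) graph."""
    windows = window_partition(feat, win)
    dim = len(feat[0][0])

    # Local view: the pixels inside each window are the nodes of one graph.
    local = [graph_attention(pixels)[0] for pixels in windows]

    # Global view: one node per window, here the mean of its pixels (assumption).
    window_nodes = [
        [sum(p[c] for p in pixels) / len(pixels) for c in range(dim)]
        for pixels in windows
    ]
    global_out, global_adj = graph_attention(window_nodes)

    # Fusion: add each window's global context to all of its pixels (assumption).
    fused = [
        [[lv + gv for lv, gv in zip(pix, g)] for pix in loc]
        for loc, g in zip(local, global_out)
    ]
    return fused, global_adj


# Toy 4 x 4 feature map with 2 channels, split into four 2 x 2 windows.
feat = [[[float(r + c), float(r - c)] for c in range(4)] for r in range(4)]
fused, window_adj = global_local_relations(feat, 2)
```

In this sketch, `window_adj` is a 4 x 4 row-normalized matrix whose entries play the role of edge weights between window nodes, which is the between-window relation that purely window-local attention leaves unmodeled.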