CV · Dec 23, 2024

STeInFormer: Spatial-Temporal Interaction Transformer Architecture for Remote Sensing Change Detection

Xiaowen Ma, Zhenkai Wu, Mengting Ma, Mengjiao Zhao, Fan Yang, Zhenhong Du, Wei Zhang

arXiv:2412.17247v1 · 3.7 · h-index: 10 · Has Code

Originality: Highly original

AI Analysis

This work addresses the lack of spatial-temporal interaction in remote sensing change detection, offering a novel backbone network for this domain-specific task.

The authors tackle remote sensing change detection by introducing STeInFormer, a spatial-temporal interaction Transformer architecture. It outperforms state-of-the-art methods on three datasets and offers a better efficiency-accuracy trade-off.

Convolutional neural networks and attention mechanisms have greatly benefited remote sensing change detection (RSCD) because of their outstanding discriminative ability. Existing RSCD methods often follow a paradigm in which a non-interactive Siamese neural network extracts multi-temporal features and change detection heads handle feature fusion and change representation. However, this paradigm does not account for the temporal and spatial characteristics of RSCD, and the resulting lack of spatial-temporal interaction hinders high-quality feature extraction. To address this problem, we present STeInFormer, a spatial-temporal interaction Transformer architecture for multi-temporal feature extraction, which is the first general backbone network specifically designed for RSCD. In addition, we propose a parameter-free multi-frequency token mixer to integrate frequency-domain features that provide spectral information for RSCD. Experimental results on three datasets validate the effectiveness of the proposed method, which outperforms state-of-the-art methods and achieves the most satisfactory efficiency-accuracy trade-off. Code is available at https://github.com/xwmaxwma/rschange.
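The abstract does not spell out how the parameter-free multi-frequency token mixer works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "parameter-free" means mixing with fixed (non-learned) bases and that "multi-frequency" means different channel groups use different frequencies. The DCT-II basis, the channel grouping, and the names `dct_basis` and `multi_frequency_mix` are all hypothetical choices made for illustration.

```python
import math


def dct_basis(k, n_tokens):
    """Orthonormal DCT-II basis vector of frequency k over n_tokens positions."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / n_tokens)
    return [scale * math.cos(math.pi * (n + 0.5) * k / n_tokens)
            for n in range(n_tokens)]


def multi_frequency_mix(tokens, frequencies):
    """Mix tokens along the sequence axis with fixed DCT bases (no learned weights).

    tokens: list of N tokens, each a list of C floats.
    frequencies: one DCT frequency index per channel group (assumed grouping;
                 C must be divisible by the number of groups).
    Each channel keeps only its projection onto its group's basis vector.
    """
    n_tokens = len(tokens)
    n_channels = len(tokens[0])
    if n_channels % len(frequencies) != 0:
        raise ValueError("channel count must be divisible by number of groups")
    group_size = n_channels // len(frequencies)

    out = [[0.0] * n_channels for _ in range(n_tokens)]
    for c in range(n_channels):
        basis = dct_basis(frequencies[c // group_size], n_tokens)
        # Frequency coefficient: inner product of the channel with the basis.
        coeff = sum(tokens[n][c] * basis[n] for n in range(n_tokens))
        # Write the single-frequency component back to every token position.
        for n in range(n_tokens):
            out[n][c] = coeff * basis[n]
    return out


# Usage: four tokens, two channels; channel 0 uses frequency 0, channel 1 frequency 1.
tokens = [[1.0, 1.0] for _ in range(4)]
mixed = multi_frequency_mix(tokens, frequencies=[0, 1])
```

In this sketch a constant signal passes through the frequency-0 group unchanged and is suppressed by the frequency-1 group, which shows how fixed bases can expose different spectral components without adding any trainable parameters.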
