Transformers Fusion across Disjoint Samples for Hyperspectral Image Classification
This is an incremental improvement for hyperspectral image classification researchers, enhancing accuracy and robustness through model fusion and disjoint sampling.
The paper tackles hyperspectral image classification by fusing a 3D Swin Transformer and a Spatial-spectral Transformer through their attentional mechanisms, and reports better performance than traditional methods and the individual transformers on benchmark datasets.
The 3D Swin Transformer (3D-ST), known for its hierarchical attention and window-based processing, excels at capturing intricate spatial relationships within images. The Spatial-spectral Transformer (SST), meanwhile, specializes in modeling long-range dependencies through self-attention. Building on these complementary strengths, this paper introduces an attentional fusion of the two transformers to improve the classification of Hyperspectral Images (HSIs). What sets the approach apart is its integration of the attentional mechanisms from both architectures, which refines the modeling of spatial and spectral information and yields more accurate classification results. Experiments on benchmark HSI datasets underscore the importance of using disjoint training, validation, and test samples. The results show that the fusion approach outperforms traditional methods and the individual transformers. Incorporating disjoint samples also enhances the robustness and reliability of the proposed methodology, underlining its potential for advancing hyperspectral image classification.
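To make the idea of disjoint training, validation, and test samples concrete, the following is a minimal sketch of a per-class split over the labelled pixels of a ground-truth map. It is not the paper's sampling code: the function name `disjoint_split`, the 25%/25%/50% fractions, the toy label map, and the convention that label 0 marks unlabelled background are all assumptions made for illustration.

```python
import numpy as np

def disjoint_split(labels, train_frac=0.25, val_frac=0.25, seed=0):
    """Split labelled pixel indices into disjoint train/val/test sets.

    `labels` is a 2D ground-truth map. The fractions and the use of 0 as
    "unlabelled" are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    flat = labels.ravel()
    train, val, test = [], [], []
    for cls in np.unique(flat):
        if cls == 0:  # assumption: 0 marks unlabelled background
            continue
        # Shuffle the pixels of this class once, then cut the shuffled
        # order into three consecutive, non-overlapping slices.
        idx = rng.permutation(np.flatnonzero(flat == cls))
        n_train = int(round(train_frac * idx.size))
        n_val = int(round(val_frac * idx.size))
        train.append(idx[:n_train])
        val.append(idx[n_train:n_train + n_val])
        test.append(idx[n_train + n_val:])
    return tuple(np.concatenate(part) for part in (train, val, test))

# Toy 20x20 ground-truth map with classes 1..3 and background 0.
toy_labels = np.random.default_rng(42).integers(0, 4, size=(20, 20))
train_idx, val_idx, test_idx = disjoint_split(toy_labels)

# No pixel index appears in more than one subset.
overlap = (
    np.intersect1d(train_idx, val_idx).size
    + np.intersect1d(train_idx, test_idx).size
    + np.intersect1d(val_idx, test_idx).size
)
print("overlapping pixels:", overlap)
```

Because each class is shuffled once and then sliced, every labelled pixel lands in exactly one subset, and the split stays roughly stratified by class. The returned values are flat indices into the label map; `np.unravel_index(train_idx, toy_labels.shape)` recovers row and column coordinates.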