CV · LG · IV · Jan 26, 2022

Dual-Tasks Siamese Transformer Framework for Building Damage Assessment

arXiv:2201.10953v2 · 61 citations
AI Analysis

This work provides a novel method for fine-grained damage assessment to aid humanitarian relief, though it is incremental as it adapts Transformer architectures to a new domain.

The paper tackles building damage assessment from remote sensing images by proposing DamFormer, a Transformer-based architecture that addresses CNNs' limitations in modeling non-local pixel relationships, achieving promising results on the xBD dataset.

Accurate and fine-grained information about the extent of damage to buildings is essential for humanitarian relief and disaster response. However, Convolutional Neural Networks (CNNs), the most commonly used architecture in remote sensing interpretation tasks, have limited ability to model non-local relationships between pixels. Recently, the Transformer architecture, first proposed for modeling long-range dependencies in natural language processing, has shown promising results in computer vision tasks. Considering these advances in computer vision, in this paper we present the first attempt at designing a Transformer-based damage assessment architecture (DamFormer). In DamFormer, a siamese Transformer encoder is first constructed to extract non-local and representative deep features from input multitemporal image pairs. Then, a multitemporal fusion module is designed to fuse information for downstream tasks. Finally, a lightweight dual-tasks decoder aggregates multi-level features for the final prediction. To the best of our knowledge, this is the first time that such a deep Transformer-based network has been proposed for multitemporal remote sensing interpretation tasks. Experimental results on the large-scale damage assessment dataset xBD demonstrate the potential of the Transformer-based architecture.
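
The three stages named in the abstract (shared-weight siamese encoder, multitemporal fusion, dual-task prediction) can be illustrated with a minimal NumPy sketch. This is not the DamFormer implementation: the class names, the single self-attention layer, fusion by concatenation plus a linear projection, per-patch rather than per-pixel outputs, localization from the pre-event features, and four damage classes are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patchify(image, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)

class SelfAttentionEncoder:
    """One self-attention layer: a stand-in for a deep Transformer encoder."""
    def __init__(self, dim):
        scale = dim ** -0.5
        self.wq = rng.normal(0.0, scale, (dim, dim))
        self.wk = rng.normal(0.0, scale, (dim, dim))
        self.wv = rng.normal(0.0, scale, (dim, dim))

    def __call__(self, tokens):  # tokens: (n_tokens, dim)
        q, k, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        # Every token attends to every other token (non-local mixing).
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        return tokens + attn @ v  # residual connection

class DualTaskSiameseSketch:
    """Hypothetical toy model; layer choices are assumptions, not the paper's."""
    def __init__(self, patch=8, channels=3, dim=32, n_damage=4):
        in_dim = patch * patch * channels
        self.patch = patch
        self.embed = rng.normal(0.0, in_dim ** -0.5, (in_dim, dim))
        self.encoder = SelfAttentionEncoder(dim)  # one encoder, shared by both branches
        self.fuse = rng.normal(0.0, (2 * dim) ** -0.5, (2 * dim, dim))
        self.loc_head = rng.normal(0.0, dim ** -0.5, (dim, 2))         # building / background
        self.dmg_head = rng.normal(0.0, dim ** -0.5, (dim, n_damage))  # damage levels

    def encode(self, image):
        return self.encoder(patchify(image, self.patch) @ self.embed)

    def __call__(self, pre, post):
        f_pre, f_post = self.encode(pre), self.encode(post)  # siamese: same weights
        fused = np.concatenate([f_pre, f_post], axis=-1) @ self.fuse  # multitemporal fusion
        loc = softmax(f_pre @ self.loc_head)  # task 1: building localization
        dmg = softmax(fused @ self.dmg_head)  # task 2: damage classification
        return loc, dmg

pre = rng.random((64, 64, 3))   # synthetic pre-disaster image
post = rng.random((64, 64, 3))  # synthetic post-disaster image
model = DualTaskSiameseSketch()
loc, dmg = model(pre, post)
print(loc.shape, dmg.shape)  # (64, 2) (64, 4): one prediction per 8x8 patch
```

Because both images pass through the same `encoder` object, their features live in a shared space, which is what makes the later fusion step meaningful. A real model would stack many such layers, decode multi-level features back to pixel resolution, and be trained on labeled image pairs.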
