CV · LG · IV · Jul 24, 2021

Two Headed Dragons: Multimodal Fusion and Cross Modal Transactions

arXiv:2107.11585v1 · 12 citations
Originality: Incremental advance
AI Analysis

The paper addresses a cumbersome challenge for remote sensing researchers and practitioners, but the contribution appears incremental, since it adds a new approach on top of existing fusion methods.

The paper tackles the problem of fusing highly disparate remote sensing modalities like hyperspectral (HSI) and LiDAR for recognition and classification tasks, proposing a novel transformer-based fusion method that achieves competitive results on datasets such as Houston and MUUFL Gulfport.

As the field of remote sensing evolves, we witness the accumulation of information from several modalities, such as multispectral (MS), hyperspectral (HSI), and LiDAR. Each of these modalities possesses its own distinct characteristics, and when combined synergistically they perform very well in recognition and classification tasks. However, fusing multiple modalities in remote sensing is cumbersome because the domains are highly disparate. Furthermore, existing methods do not facilitate cross-modal interactions. To this end, we propose a novel transformer-based fusion method for the HSI and LiDAR modalities. The model is composed of stacked autoencoders that harness the cross key-value pairs for HSI and LiDAR, thus establishing communication between the two modalities, while simultaneously using CNNs to extract the spectral and spatial information from HSI and LiDAR. We test our model on the Houston (Data Fusion Contest - 2013) and MUUFL Gulfport datasets and achieve competitive results.
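The "cross key-value pairs" idea in the abstract can be illustrated with a minimal sketch of cross-modal attention, in which each modality's queries attend over the other modality's keys and values. This is not the authors' implementation: the function names (`cross_attention`, `cross_modal_exchange`), the token counts, the dimensions, and the random projection matrices are all assumptions chosen for illustration, and the CNN feature extractors and stacked autoencoder layers mentioned in the abstract are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Scaled dot-product attention with queries from one modality and
    keys/values from the other (hypothetical helper, not from the paper)."""
    q = query_tokens @ w_q      # (n_query, d_head)
    k = context_tokens @ w_k    # (n_context, d_head)
    v = context_tokens @ w_v    # (n_context, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def cross_modal_exchange(hsi_tokens, lidar_tokens, params):
    # HSI queries read from LiDAR keys/values, and vice versa.
    hsi_out, hsi_w = cross_attention(hsi_tokens, lidar_tokens, *params["hsi"])
    lidar_out, lidar_w = cross_attention(lidar_tokens, hsi_tokens, *params["lidar"])
    return hsi_out, lidar_out, hsi_w, lidar_w


# Toy inputs: 5 HSI tokens and 3 LiDAR tokens, each already embedded
# to a shared width d_model (assumed; e.g. by per-modality CNNs).
rng = np.random.default_rng(0)
d_model, d_head = 16, 8
hsi_tokens = rng.normal(size=(5, d_model))
lidar_tokens = rng.normal(size=(3, d_model))
params = {
    name: tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
    for name in ("hsi", "lidar")
}

hsi_out, lidar_out, hsi_w, lidar_w = cross_modal_exchange(
    hsi_tokens, lidar_tokens, params
)
print(hsi_out.shape, lidar_out.shape)
```

In this sketch each output token of one modality is a weighted mixture of the other modality's value vectors, which is one simple way to let the two streams communicate before classification.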

