Two Headed Dragons: Multimodal Fusion and Cross Modal Transactions
The paper addresses a cumbersome challenge in remote sensing that matters to both researchers and practitioners, but it appears incremental: the new approach builds on existing fusion methods.
The paper tackles the problem of fusing highly disparate remote sensing modalities like hyperspectral (HSI) and LiDAR for recognition and classification tasks, proposing a novel transformer-based fusion method that achieves competitive results on datasets such as Houston and MUUFL Gulfport.
As the field of remote sensing evolves, information is accumulating from several modalities, such as multispectral (MS), hyperspectral (HSI), and LiDAR. Each of these modalities possesses its own distinct characteristics, and when combined synergistically they perform very well in recognition and classification tasks. However, fusing multiple modalities in remote sensing is cumbersome because their domains are highly disparate. Furthermore, existing methods do not facilitate cross-modal interactions. To this end, we propose a novel transformer-based fusion method for HSI and LiDAR modalities. The model is composed of stacked autoencoders that harness cross key-value pairs for HSI and LiDAR, thus establishing communication between the two modalities, while simultaneously using CNNs to extract spectral and spatial information from HSI and LiDAR. We test our model on the Houston (Data Fusion Contest - 2013) and MUUFL Gulfport datasets and achieve competitive results.
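The idea of "cross key-value pairs" can be illustrated with generic scaled dot-product cross-attention, in which the queries of one modality attend to the keys and values of the other. The sketch below is not the authors' architecture: the abstract gives no layer details, so the token counts, the embedding size `d_model`, the random projection matrices, and every function name here are assumptions chosen only for illustration. It uses the Python standard library alone so that it runs without extra dependencies.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Queries come from one modality; keys and values come from the other."""
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model = 8  # assumed embedding size

# Assumed inputs: 5 HSI tokens and 3 LiDAR tokens, e.g. produced by CNN encoders.
hsi_tokens = random_matrix(5, d_model, rng)
lidar_tokens = random_matrix(3, d_model, rng)

# Separate (random, untrained) projections for each direction of the exchange.
hsi_proj = [random_matrix(d_model, d_model, rng) for _ in range(3)]
lidar_proj = [random_matrix(d_model, d_model, rng) for _ in range(3)]

# HSI queries read LiDAR keys/values, and LiDAR queries read HSI keys/values.
hsi_out, hsi_weights = cross_attention(hsi_tokens, lidar_tokens, *hsi_proj)
lidar_out, lidar_weights = cross_attention(lidar_tokens, hsi_tokens, *lidar_proj)
```

Each output keeps one row per query token, and each row of attention weights is a distribution over the other modality's tokens, which is what lets information flow between the two branches. A trained model would learn the projection matrices rather than sample them.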