Explaining Multimodal Data Fusion: Occlusion Analysis for Wilderness Mapping
This work addresses the need for interpretability in multimodal fusion for domain-specific applications such as earth observation, but the contribution is incremental: it applies an existing explainability method to a new scenario.
The study tackles the problem of understanding how different data modalities influence model decisions in multimodal data fusion for wilderness mapping, and shows that auxiliary data such as land cover and nighttime light significantly benefit the task.
Jointly harnessing complementary features of multimodal input data in a common latent space has long been known to be beneficial. However, the influence of each modality on the model's decision remains unclear. This study proposes a deep learning framework for the modality-level interpretation of multimodal earth observation data in an end-to-end fashion. Leveraging an explainable machine learning method, namely Occlusion Sensitivity, the proposed framework investigates the influence of modalities under an early-fusion scenario in which the modalities are fused before the learning process. We show that the task of wilderness mapping benefits substantially from auxiliary data such as land cover and nighttime light data.
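To make the idea of modality-level occlusion under early fusion concrete, the following is a minimal sketch, not the authors' implementation. It assumes the modalities are stacked as channels of a single input array, that occluding a modality means replacing its channels with a constant baseline (zero here), and that sensitivity is measured as the drop in the model's score. The function name `modality_occlusion_sensitivity`, the channel layout, and the linear `toy_predict` model are all hypothetical and chosen only for illustration.

```python
import numpy as np


def modality_occlusion_sensitivity(predict, x, modality_channels, baseline=0.0):
    """Score drop caused by occluding each modality of an early-fused input.

    predict: callable mapping a (C, H, W) array to a scalar score (hypothetical model).
    x: early-fused input of shape (C, H, W), modalities stacked along the channel axis.
    modality_channels: dict mapping a modality name to its channel indices (assumed layout).
    baseline: constant used to overwrite occluded channels (an assumption; other
              baselines such as the channel mean are equally plausible).
    """
    reference = float(predict(x))
    scores = {}
    for name, channels in modality_channels.items():
        occluded = x.copy()
        occluded[channels] = baseline  # blank out every channel of this modality
        scores[name] = reference - float(predict(occluded))
    return scores


def toy_predict(x):
    """Stand-in for a trained network: a fixed linear weighting of channel means."""
    weights = np.array([0.1, 0.1, 0.1, 0.5, 0.2])
    return float(weights @ x.mean(axis=(1, 2)))


# Hypothetical layout: 3 optical channels, 1 land cover channel, 1 nighttime light channel.
x = np.ones((5, 4, 4))
modalities = {"optical": [0, 1, 2], "land_cover": [3], "night_light": [4]}
scores = modality_occlusion_sensitivity(toy_predict, x, modalities)
ranking = sorted(scores, key=scores.get, reverse=True)
```

A larger score drop suggests that the model relies more heavily on that modality for the given input. In this toy setup the ranking only reflects the hand-picked weights, so it says nothing about the paper's actual findings.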