Multimodal Object Detection via Probabilistic a priori Information Integration
This work addresses multimodal object detection in remote sensing, where modalities are not strictly aligned, offering a solution for scenarios where only one modality contains the target object.
The paper tackles low-quality multimodal data with misaligned modalities in remote sensing object detection by converting contextual binary information into probability maps and combining the modalities in an early fusion architecture, validated through extensive experiments on the DOTA dataset.
Multimodal object detection has shown promise in remote sensing. However, multimodal data are frequently of low quality: the modalities lack strict cell-to-cell alignment, leading to mismatches between them. In this paper, we investigate multimodal object detection where only one modality contains the target object and the others provide crucial contextual information. We propose to resolve the alignment problem by converting the contextual binary information into probability maps. We then propose an early fusion architecture that we validate with extensive experiments on the DOTA dataset.
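To make the two steps in the abstract concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not say how the binary information becomes a probability map, so a simple box filter is assumed here as a stand-in. The function names `binary_to_probability` and `early_fusion`, the `radius` parameter, and the toy 8x8 arrays are all hypothetical. The sketch shows why a soft map could tolerate small misalignment: a shift of a few cells changes values gradually instead of flipping them between 0 and 1.

```python
import numpy as np


def binary_to_probability(mask: np.ndarray, radius: int = 2) -> np.ndarray:
    """Soften a binary context mask into a map with values in [0, 1].

    Assumption: a box filter stands in for the paper's (unspecified) conversion.
    Each output cell is the fraction of "on" cells in its (2r+1) x (2r+1) window.
    """
    mask = mask.astype(np.float64)
    k = 2 * radius + 1
    # Replicate border cells so the window is defined at the image edges.
    padded = np.pad(mask, radius, mode="edge")
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out / (k * k)


def early_fusion(image: np.ndarray, prob_map: np.ndarray) -> np.ndarray:
    """Stack the probability map onto the image as one extra input channel."""
    if image.shape[:2] != prob_map.shape:
        raise ValueError("image and probability map must share height and width")
    return np.concatenate([image, prob_map[..., None]], axis=-1)


# Toy data: a blank 3-channel image and a binary context mask with a 4x4 block.
image = np.zeros((8, 8, 3))
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1

prob = binary_to_probability(mask, radius=1)
fused = early_fusion(image, prob)  # shape (8, 8, 4), ready for a detector's first layer
```

In this sketch, early fusion means the detector sees the contextual channel from its first layer onward, as opposed to merging features or predictions later. The detector itself is left out, since the abstract does not describe it.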