CV
Jun 22, 2023

RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection

arXiv:2306.12621v1
h-index: 28
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of improving object detection in multimedia applications by enhancing fusion in dual-branch networks, though the advance is incremental because it builds on existing encoder-decoder architectures.

The paper tackles the fusion of multi-modal (RGB-X) data for object detection by proposing RXFOOD, a plug-in module that uses a unified attention mechanism to integrate features across scales and modalities. The authors report clear effectiveness on tasks such as RGB-NIR and RGB-D salient object detection.

The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy for the limited application scenarios of traditional RGB cameras. RGB-X tasks, which rely on an RGB input and another type of data input to resolve specific problems, have become a popular research topic in multimedia. A crucial question in two-branch RGB-X deep neural networks is how to fuse information across modalities. Given the large amount of information inside RGB-X networks, previous works typically apply naive fusion (e.g., average or max fusion) or focus only on feature fusion at the same scale(s). In this paper, we propose a novel method called RXFOOD that fuses features across different scales within the same modality branch and from different modality branches simultaneously in a unified attention mechanism. An Energy Exchange Module is designed for the interaction of each feature map's energy matrix, which reflects the inter-relationship of different positions and different channels inside a feature map. The RXFOOD method can be easily incorporated into any dual-branch encoder-decoder network as a plug-in module, and helps the original backbone network better focus on important positions and channels for object of interest detection. Experimental results on RGB-NIR salient object detection, RGB-D salient object detection, and RGB-Frequency image manipulation detection demonstrate the clear effectiveness of the proposed RXFOOD.
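
To make the contrast in the abstract concrete, the following NumPy sketch shows the naive baselines (average and max fusion) next to one possible reading of an "energy matrix". It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the energy matrix or the Energy Exchange Module, so the Gram-matrix form (as used in common self-attention designs), the averaging "exchange" step, and all function names here are hypothetical.

```python
import numpy as np

def naive_fusion(rgb_feat, x_feat, mode="average"):
    """Baseline element-wise fusion of two same-shape feature maps."""
    if mode == "average":
        return (rgb_feat + x_feat) / 2.0
    if mode == "max":
        return np.maximum(rgb_feat, x_feat)
    raise ValueError(f"unknown fusion mode: {mode}")

def energy_matrices(feat):
    """Assumed definition: Gram matrices of a (C, H, W) feature map.

    Returns a (C, C) channel energy and an (H*W, H*W) position energy.
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    channel_energy = flat @ flat.T    # channel-to-channel similarity
    position_energy = flat.T @ flat   # position-to-position similarity
    return channel_energy, position_energy

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def exchanged_channel_attention(rgb_feat, x_feat):
    """Hypothetical exchange: both branches share an averaged channel energy.

    The shared energy is turned into attention weights that re-weight the
    channels of each branch, with a residual connection back to the input.
    """
    c, h, w = rgb_feat.shape
    e_rgb, _ = energy_matrices(rgb_feat)
    e_x, _ = energy_matrices(x_feat)
    shared = 0.5 * (e_rgb + e_x)      # assumption: simple averaging as the exchange
    attn = softmax(shared, axis=-1)   # (C, C), each row sums to 1
    out_rgb = (attn @ rgb_feat.reshape(c, -1)).reshape(c, h, w) + rgb_feat
    out_x = (attn @ x_feat.reshape(c, -1)).reshape(c, h, w) + x_feat
    return out_rgb, out_x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.standard_normal((4, 3, 5))   # toy RGB-branch feature map
    nir = rng.standard_normal((4, 3, 5))   # toy X-branch (e.g., NIR) feature map
    fused = naive_fusion(rgb, nir, mode="max")
    out_rgb, out_nir = exchanged_channel_attention(rgb, nir)
    print(fused.shape, out_rgb.shape, out_nir.shape)
```

The naive baselines treat every position and channel identically, whereas the attention variant lets statistics from one branch influence how the other branch is re-weighted, which is the general idea the abstract describes. Cross-scale interaction is omitted here because the abstract gives no detail on how it is performed.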

