CV | Apr 29, 2024

MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection

arXiv:2404.18849v2 | 6 citations | h-index: 19 | Has Code | WACV
Originality: Incremental advance
AI Analysis

The work targets efficient multimodal object detection for applications such as autonomous driving and surveillance by reducing memory footprint, though its advance in modality balance is incremental.

The paper tackles training a single shared vision encoder for object detection on both RGB and infrared modalities when only one modality is observed at inference. It introduces MiPa, a training technique that mixes patches from both modalities to counter modality imbalance, and reports competitive results on RGB/IR benchmarks.

In real-world scenarios, using multiple modalities like visible (RGB) and infrared (IR) can greatly improve the performance of a predictive task such as object detection (OD). Multimodal learning is a common way to leverage these modalities, where multiple modality-specific encoders and a fusion module are used to improve performance. In this paper, we tackle a different way to employ RGB and IR modalities, where only one modality or the other is observed by a single shared vision encoder. This realistic setting requires a lower memory footprint and is more suitable for applications such as autonomous driving and surveillance, which commonly rely on RGB and IR data. However, when learning a single encoder on multiple modalities, one modality can dominate the other, producing uneven recognition results. This work investigates how to efficiently leverage RGB and IR modalities to train a common transformer-based OD vision encoder, while countering the effects of modality imbalance. For this, we introduce a novel training technique to Mix Patches (MiPa) from the two modalities, in conjunction with a patch-wise modality agnostic module, for learning a common representation of both modalities. Our experiments show that MiPa can learn a representation to reach competitive results on traditional RGB/IR benchmarks while only requiring a single modality during inference. Our code is available at: https://github.com/heitorrapela/MiPa.
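The patch-mixing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes spatially aligned RGB and IR images of equal size, a fixed per-patch sampling probability `rho`, and a hypothetical function name `mix_patches`. It uses plain Python lists to stay dependency-free, and it does not cover the patch-wise modality agnostic module.

```python
import random


def mix_patches(rgb, ir, patch_size=2, rho=0.5, rng=None):
    """Return (mixed, mask); each patch of `mixed` is copied from rgb or ir.

    rgb, ir: spatially aligned images as H x W nested lists of pixel values.
    rho: probability of taking a patch from rgb (an assumed, fixed ratio).
    mask[i][j] is True when patch (i, j) was taken from rgb.
    """
    h, w = len(rgb), len(rgb[0])
    same_shape = (
        len(ir) == h
        and all(len(row) == w for row in rgb)
        and all(len(row) == w for row in ir)
    )
    if not same_shape:
        raise ValueError("rgb and ir must have the same height and width")
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch_size")

    rng = rng or random.Random()
    # One Bernoulli draw per patch decides which modality fills that patch.
    mask = [
        [rng.random() < rho for _ in range(w // patch_size)]
        for _ in range(h // patch_size)
    ]
    # Each pixel looks up the decision of the patch that contains it.
    mixed = [
        [
            rgb[y][x] if mask[y // patch_size][x // patch_size] else ir[y][x]
            for x in range(w)
        ]
        for y in range(h)
    ]
    return mixed, mask


# Toy 4x4 images: "R" marks RGB pixels, "I" marks infrared pixels.
rgb = [["R"] * 4 for _ in range(4)]
ir = [["I"] * 4 for _ in range(4)]
mixed, mask = mix_patches(rgb, ir, patch_size=2, rho=0.5, rng=random.Random(0))
```

In this sketch, setting `rho` to 1.0 or 0.0 reduces to single-modality input, which mirrors the inference setting described above where only one modality is observed.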

Code Implementations: 1 repo
