CV · AI · Dec 28, 2023

EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion

arXiv:2312.16933v1 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses multi-modal fusion for vision tasks such as object detection and segmentation. It offers a practical solution for applications that require high temporal resolution, but it is an incremental advance because it builds on existing RGB-based models.

The paper tackles the challenge of integrating event cameras with RGB cameras for vision tasks. It proposes EvPlug, a plug-and-play fusion module that is trained on unlabeled event-image pairs and enhances existing RGB-based models. The result is improved robustness in high-dynamic-range and fast-motion scenes without altering the original model's structure.
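The summary describes a plug-in that learns from unlabeled data while the original model stays untouched. The pure-Python toy below is a minimal sketch of that idea under loudly stated assumptions; it is not EvPlug's actual architecture or training recipe. The "RGB model" is a hypothetical frozen backbone (`rgb_backbone`) and head (`rgb_head`), the plug-in (`EventPlugIn`) is a single residual weight `alpha` that adds hypothetical event features to the backbone features, and the supervision target is the frozen model's own prediction on a reference image (for example, a well-exposed frame), which stands in for whatever pairing scheme the paper actually uses.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = List[float]


# Hypothetical frozen RGB-based model: nothing below ever modifies it.
def rgb_backbone(image: Vector) -> Vector:
    """Toy feature extractor: scales every pixel by 2."""
    return [2.0 * v for v in image]


def rgb_head(features: Vector) -> float:
    """Toy task head: reduces the feature vector to a single score."""
    return sum(features)


def rgb_model(image: Vector) -> float:
    return rgb_head(rgb_backbone(image))


# Hypothetical plug-in: residual fusion with one trainable weight.
@dataclass
class EventPlugIn:
    # alpha = 0.0 makes the plugged model reproduce rgb_model exactly.
    alpha: float = 0.0

    def fuse(self, image_features: Vector, event_features: Vector) -> Vector:
        return [f + self.alpha * e for f, e in zip(image_features, event_features)]


def plugged_model(plugin: EventPlugIn, image: Vector, event_features: Vector) -> float:
    # The plug-in sits between the frozen backbone and the frozen head.
    return rgb_head(plugin.fuse(rgb_backbone(image), event_features))


def train_plugin(plugin: EventPlugIn,
                 pairs: Sequence[Tuple[Vector, Vector, Vector]],
                 lr: float = 0.1, epochs: int = 200) -> EventPlugIn:
    """Gradient descent on alpha only, using no human labels.

    Each pair is (degraded_image, event_features, reference_image); the target
    is the frozen model's own prediction on the reference image (an assumption
    of this sketch, not a detail stated in the paper).
    """
    for _ in range(epochs):
        for degraded, events, reference in pairs:
            pseudo_label = rgb_model(reference)
            pred = plugged_model(plugin, degraded, events)
            # pred = sum(2 * x) + alpha * sum(e), so
            # d/d(alpha) of (pred - pseudo_label)^2 = 2 * (pred - pseudo_label) * sum(e).
            grad = 2.0 * (pred - pseudo_label) * sum(events)
            plugin.alpha -= lr * grad
    return plugin


if __name__ == "__main__":
    # Underexposed frame, toy event features, and a well-exposed reference.
    pairs = [([0.1, 0.1], [0.4, 0.4], [0.5, 0.5])]
    plugin = train_plugin(EventPlugIn(), pairs)
    print(round(plugin.alpha, 4))  # prints 2.0
```

Initializing `alpha` at zero means the plugged model starts out numerically identical to the original one, which is one simple way to make a module "plug-and-play"; only the plug-in's parameter is ever updated, so the RGB model's structure and weights stay exactly as they were.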

Event cameras and RGB cameras have complementary imaging characteristics: the former offer high dynamic range (HDR) and high temporal resolution, while the latter provide rich texture and color information. This makes integrating event cameras into mid- and high-level RGB-based vision tasks highly promising. However, doing so raises challenges in multi-modal fusion, data annotation, and model architecture design. In this paper, we propose EvPlug, which learns a plug-and-play event and image fusion module under the supervision of an existing RGB-based model. The learned fusion module integrates event streams with image features as a plug-in, making the RGB-based model robust to HDR and fast-motion scenes while enabling high-temporal-resolution inference. Our method requires only unlabeled event-image pairs (no pixel-wise alignment is needed) and alters neither the structure nor the weights of the RGB-based model. We demonstrate the superiority of EvPlug on several vision tasks, including object detection, semantic segmentation, and 3D hand pose estimation.
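The abstract says the plug-in integrates event streams with image features but does not specify how the events are encoded. A common choice in event-based vision is a spatio-temporal voxel grid, in which each event's polarity is split linearly between the two temporal bins nearest to its timestamp. The pure-Python sketch below illustrates that encoding; the `(x, y, timestamp, polarity)` tuple format and the `events_to_voxel_grid` helper are hypothetical and are not EvPlug's documented input pipeline.

```python
from typing import List, Sequence, Tuple

# (x, y, timestamp, polarity), with polarity in {-1, +1}; an assumed format.
Event = Tuple[int, int, float, int]
Grid = List[List[List[float]]]  # indexed as grid[bin][y][x]


def events_to_voxel_grid(events: Sequence[Event], num_bins: int,
                         height: int, width: int) -> Grid:
    """Accumulate event polarities into num_bins temporal slices.

    Each event's polarity is split linearly between the two temporal bins
    nearest to its normalised timestamp, so the total polarity is preserved
    (up to floating-point rounding).
    """
    grid: Grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t_start = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    span = t_end - t_start
    for x, y, t, p in events:
        # Map the timestamp onto the continuous bin axis [0, num_bins - 1].
        tn = (num_bins - 1) * (t - t_start) / span if span > 0 else 0.0
        lower = min(int(tn), num_bins - 1)  # tn >= 0, so int() acts as floor
        frac = tn - lower
        grid[lower][y][x] += p * (1.0 - frac)
        if lower + 1 < num_bins:
            grid[lower + 1][y][x] += p * frac
    return grid


if __name__ == "__main__":
    events = [(0, 0, 0.0, 1), (0, 1, 0.25, 1), (1, 0, 0.5, -1), (1, 1, 1.0, 1)]
    voxels = events_to_voxel_grid(events, num_bins=3, height=2, width=2)
    for b, plane in enumerate(voxels):
        print(b, plane)
```

Keeping several temporal bins, rather than collapsing all events into a single frame, retains some of the timing information that gives event cameras their high temporal resolution.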

