ASSD
Feb 18, 2022

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

arXiv:2202.09124v1
21 citations
AI Analysis

This work addresses event detection in complex real-world settings for applications like surveillance or human activity monitoring, but it is incremental as it builds on existing sensor fusion and Transformer techniques.

The paper tackled multi-view and multi-modal event detection in real environments using distributed cameras and microphones with weak labels, proposing a Transformer-based multi-sensor fusion method (MultiTrans) that improved event detection performance and outperformed comparative methods in experiments on a newly collected dataset.

We tackle a challenging task: multi-view and multi-modal event detection, which detects events in a wide-ranging real environment by using data from distributed cameras and microphones together with their weak labels. In this task, distributed sensors are used complementarily to capture events that are difficult to capture with a single sensor, such as a series of actions by people moving through an intricate room, or communication between people located far apart in a room. For sensors to cooperate effectively in such a situation, the system should be able to exchange information among sensors and combine the information that is useful for identifying events in a complementary manner. For such a mechanism, we propose a Transformer-based multi-sensor fusion (MultiTrans), which combines multi-sensor data on the basis of the relationships between features of different viewpoints and modalities. In experiments on a dataset newly collected for this task, our proposed method using MultiTrans improved event detection performance and outperformed the comparative methods.
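
The idea of combining sensor data "on the basis of the relationships between features" can be illustrated with a single self-attention step over per-sensor feature vectors. The following is a minimal sketch, not the authors' MultiTrans architecture: the function names, the toy embeddings, the use of raw features as queries, keys, and values (no learned projections), and the final mean pooling are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(sensor_features):
    """Fuse per-sensor feature vectors with one self-attention step.

    sensor_features: list of equal-length float lists, one per sensor.
    Assumption: queries, keys, and values are the raw features; a real
    Transformer layer would apply learned linear projections first.
    Returns (fused_vector, attention_weights).
    """
    dim = len(sensor_features[0])
    scale = math.sqrt(dim)
    weights, attended = [], []
    for query in sensor_features:
        # Scaled dot-product similarity between this sensor and every sensor.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in sensor_features
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all sensors' features, as seen from this sensor.
        attended.append([
            sum(w * value[d] for w, value in zip(row, sensor_features))
            for d in range(dim)
        ])
    # Assumption: mean-pool the attended tokens into one clip-level vector.
    fused = [sum(tok[d] for tok in attended) / len(attended) for d in range(dim)]
    return fused, weights


# Hypothetical toy embeddings: two cameras with similar views, one microphone.
features = [
    [1.0, 0.0, 0.5, 0.0],  # camera 1
    [0.9, 0.1, 0.4, 0.0],  # camera 2
    [0.0, 1.0, 0.0, 0.7],  # microphone
]
fused, weights = attention_fuse(features)
```

In this toy setup, each row of `weights` shows how strongly one sensor draws on the others. Camera 1 weights camera 2 more heavily than the microphone because their features are more similar. A clip-level classifier trained with weak labels could then operate on `fused`, though that step is outside this sketch.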

Code Implementations: 1 repo