AS, CV, SD · Mar 15, 2020

A proto-object based audiovisual saliency map

arXiv:2003.06779v1
AI Analysis

This incremental work addresses scene analysis for applications like surveillance and robotics by integrating multisensory information.

The study tackled the problem of analyzing dynamic natural scenes by developing a proto-object based audiovisual saliency map (AVSM) using vision and audio data, and demonstrated that it captures more valid salient events than unisensory maps and aligns with human judgment.

The natural environment and our interaction with it are essentially multisensory: we may deploy visual, tactile and/or auditory senses to perceive, learn and interact with our environment. Our objective in this study is to develop a scene analysis algorithm using multisensory information, specifically vision and audio. We develop a proto-object based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes. A specialized audiovisual camera with a $360^\circ$ field of view, capable of locating sound direction, is used to collect spatiotemporally aligned audiovisual data. We demonstrate that the performance of the proto-object based audiovisual saliency map in detecting and localizing salient objects/events is in agreement with human judgment. In addition, the proto-object based AVSM, which we compute as a linear combination of visual and auditory feature conspicuity maps, captures more valid salient events than unisensory saliency maps. Such an algorithm can be useful in surveillance, robotic navigation, video compression and related applications.
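The abstract describes the AVSM as a linear combination of visual and auditory feature conspicuity maps. The following is a minimal sketch of that idea only, not the authors' implementation: the function names (`normalize`, `combine`, `most_salient`), the min-max normalization step, and the equal default weights are all assumptions made for illustration, and the proto-object feature extraction itself is not shown.

```python
from typing import List, Tuple

Map = List[List[float]]  # a 2-D conspicuity map stored as rows of floats


def normalize(m: Map) -> Map:
    """Rescale a map to [0, 1] (assumed step; a constant map becomes all zeros)."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def combine(visual: Map, auditory: Map,
            w_visual: float = 0.5, w_auditory: float = 0.5) -> Map:
    """Weighted sum of two spatially aligned maps (weights are hypothetical)."""
    same_shape = len(visual) == len(auditory) and all(
        len(rv) == len(ra) for rv, ra in zip(visual, auditory))
    if not same_shape:
        raise ValueError("maps must have the same shape")
    v, a = normalize(visual), normalize(auditory)
    return [[w_visual * x + w_auditory * y for x, y in zip(rv, ra)]
            for rv, ra in zip(v, a)]


def most_salient(m: Map) -> Tuple[int, int]:
    """Return the (row, column) of the largest value in the map."""
    cells = ((r, c) for r in range(len(m)) for c in range(len(m[r])))
    return max(cells, key=lambda rc: m[rc[0]][rc[1]])


# Toy data: the visual peak is at (1, 1), the only sound source is at (0, 1).
visual = [[0.0, 1.0], [0.0, 4.0]]
auditory = [[0.0, 3.0], [0.0, 0.0]]
avsm = combine(visual, auditory)
```

In this toy case the visual-only peak sits at `(1, 1)`, while the combined map peaks at `(0, 1)`, where a moderate visual response coincides with the sound source. This shows how an audiovisual map can flag an event that a unisensory map would rank lower.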

