CV · SD · AS · Dec 14, 2023

Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation

arXiv:2312.08673v3 · 12 citations · h-index: 29 · AAAI
Originality: Incremental advance
AI Analysis

This work addresses user safety in AR devices by enabling detection of oncoming vehicles beyond the camera's view, though it is an incremental improvement in multimodal segmentation.

The paper tackles semantic segmentation of objects outside the camera's field-of-view in AR devices. It proposes SBV, an audio-visual method that uses teacher-student distillation to supplement visual data with auditory information, and reports improved performance over existing models across varying conditions.

Augmented Reality (AR) devices, emerging as prominent mobile interaction platforms, face challenges in user safety, particularly concerning oncoming vehicles. While some solutions leverage onboard camera arrays, these cameras often have a limited field-of-view (FoV) with front or downward perspectives. To address this, we propose a new out-of-view semantic segmentation task and Segment Beyond View (SBV), a novel audio-visual semantic segmentation method. SBV supplements the visual modality, which misses the information beyond the FoV, with auditory information using a teacher-student distillation model (Omni2Ego). The model consists of a vision teacher utilising panoramic information, an auditory teacher with 8-channel audio, and an audio-visual student that takes views with limited FoV and binaural audio as input and produces semantic segmentation for objects outside the FoV. SBV outperforms existing models in comparative evaluations and shows consistent performance across varying FoV ranges and in monaural audio settings.
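
The two-teacher, one-student arrangement described in the abstract can be illustrated with a generic knowledge-distillation loss. The sketch below is not the Omni2Ego objective from the paper; it assumes a standard temperature-softened KL formulation applied to the class scores of a single pixel. The function names, the weights `w_vision` and `w_audio`, and the temperature value are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw class scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the ground-truth class index."""
    return -math.log(probs[label] + eps)


def distillation_loss(student_logits, vision_teacher_logits,
                      audio_teacher_logits, label,
                      temperature=2.0, w_vision=0.5, w_audio=0.5):
    """Supervised loss plus soft-target terms from two teachers.

    Assumption: each teacher contributes a temperature-softened KL term,
    as in generic knowledge distillation. The paper's actual loss may differ.
    """
    student_soft = softmax(student_logits, temperature)
    vision_soft = softmax(vision_teacher_logits, temperature)
    audio_soft = softmax(audio_teacher_logits, temperature)

    kd_vision = kl_divergence(vision_soft, student_soft)
    kd_audio = kl_divergence(audio_soft, student_soft)
    supervised = cross_entropy(softmax(student_logits), label)

    # The T^2 factor keeps soft-target gradients on a comparable scale.
    return supervised + temperature ** 2 * (w_vision * kd_vision + w_audio * kd_audio)


if __name__ == "__main__":
    # Per-pixel scores over three hypothetical classes.
    student = [2.0, 0.5, -1.0]
    vision_teacher = [3.0, 0.0, -2.0]
    audio_teacher = [1.5, 1.0, -0.5]
    print(distillation_loss(student, vision_teacher, audio_teacher, label=0))
```

In a segmentation setting this per-pixel quantity would be averaged over every pixel of the output map. When the student already matches both teachers, the two KL terms vanish and only the supervised term remains.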

