CV · Jul 24, 2021

ASOD60K: An Audio-Induced Salient Object Detection Dataset for Panoramic Videos

arXiv:2107.11629v4 · 8 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses audio-induced salient object detection, with applications such as augmented reality, but it is incremental: it extends existing saliency detection to panoramic videos and contributes a new dataset.

The authors address the segmentation of salient objects in panoramic videos by introducing a new task, PV-SOD, and collecting ASOD60K, the first large-scale dataset for it, with 4K-resolution frames and a six-level annotation hierarchy. They benchmark 11 representative approaches on the dataset and derive findings intended to advance research in this area.

Exploring what humans pay attention to in dynamic panoramic scenes is useful for many fundamental applications, including augmented reality (AR) in retail, AR-powered recruitment, and visual language navigation. With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-/object-level saliency detection tasks, we focus on audio-induced salient object detection (SOD), where the salient objects are labeled with the guidance of audio-induced eye movements. To support this task, we collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy, distinguishing it by its richness, diversity, and quality. Specifically, each sequence is marked with its super-class and sub-class, with objects of each sub-class being further annotated with human eye fixations, bounding boxes, object-/instance-level masks, and associated attributes (e.g., geometrical distortion). These coarse-to-fine annotations enable detailed analysis for PV-SOD modelling, e.g., determining the major challenges for existing SOD models, and predicting scanpaths to study the long-term eye fixation behaviors of humans. We systematically benchmark 11 representative approaches on ASOD60K and derive several interesting findings. We hope this study can serve as a good starting point for advancing SOD research towards panoramic videos. The dataset and benchmark will be made publicly available at https://github.com/PanoAsh/ASOD60K.
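To make the coarse-to-fine annotation structure described in the abstract more concrete, here is a minimal Python sketch of how one annotated sequence might be represented in memory. It is an illustration only: every class name, field name, coordinate convention, and file path below is hypothetical and is not taken from the ASOD60K repository, whose actual file layout may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ObjectAnnotation:
    """One salient object instance in a frame (hypothetical layout)."""
    instance_id: int
    # Assumed convention: (x, y, width, height) in pixels.
    bbox: Tuple[int, int, int, int]
    # Hypothetical path to the instance-level mask image.
    mask_path: str
    # Attribute tags, e.g. "geometrical distortion".
    attributes: List[str] = field(default_factory=list)


@dataclass
class FrameAnnotation:
    """Per-frame annotations: eye fixations plus annotated objects."""
    frame_index: int
    # Assumed convention: (x, y) fixation points in pixel coordinates.
    fixations: List[Tuple[float, float]] = field(default_factory=list)
    objects: List[ObjectAnnotation] = field(default_factory=list)


@dataclass
class SequenceAnnotation:
    """Sequence-level labels (super-/sub-class) and per-frame data."""
    super_class: str
    sub_class: str
    frames: List[FrameAnnotation] = field(default_factory=list)


def frames_with_attribute(seq: SequenceAnnotation, attribute: str) -> List[int]:
    """Return indices of frames containing an object tagged with `attribute`."""
    return [
        f.frame_index
        for f in seq.frames
        if any(attribute in obj.attributes for obj in f.objects)
    ]


# Usage with made-up values (not real dataset entries).
seq = SequenceAnnotation(
    super_class="example-super-class",
    sub_class="example-sub-class",
    frames=[
        FrameAnnotation(
            frame_index=0,
            fixations=[(1920.0, 960.0)],
            objects=[
                ObjectAnnotation(
                    instance_id=1,
                    bbox=(1800, 900, 240, 120),
                    mask_path="masks/frame_0000_inst_1.png",
                    attributes=["geometrical distortion"],
                )
            ],
        ),
        FrameAnnotation(
            frame_index=1,
            objects=[
                ObjectAnnotation(
                    instance_id=1,
                    bbox=(1810, 905, 240, 120),
                    mask_path="masks/frame_0001_inst_1.png",
                )
            ],
        ),
    ],
)

distorted_frames = frames_with_attribute(seq, "geometrical distortion")
```

Grouping frames by attribute in this way is one simple route to the attribute-based analysis the abstract mentions, such as checking how models behave on frames with geometrical distortion.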

Code Implementations: 1 repo