CV · Aug 7, 2023

Distortion-aware Transformer in 360° Salient Object Detection

arXiv:2308.03359v1 · 17 citations · h-index: 37
Originality: Incremental advance
AI Analysis

The paper addresses a key bottleneck for VR/AR applications by improving salient object detection on 360° data; the advance is incremental because it builds on existing Transformer methods.

The paper tackles the problem of distortions in 360° salient object detection caused by equirectangular projection, proposing a Transformer-based model (DATFormer) that introduces distortion-adaptive modules and a learnable relation matrix, achieving state-of-the-art performance on three public datasets.

With the emergence of VR and AR, 360° data attracts increasing attention from the computer vision and multimedia communities. Typically, 360° data is projected into 2D ERP (equirectangular projection) images for feature extraction. However, existing methods cannot handle the distortions that result from the projection, hindering the development of tasks based on 360° data. Therefore, in this paper, we propose a Transformer-based model called DATFormer to address the distortion problem. We tackle this issue from two perspectives. Firstly, we introduce two distortion-adaptive modules. The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally. The second is a Distortion-Adaptive Attention Block, which reduces local distortions on multi-scale features. Secondly, to exploit the unique characteristics of 360° data, we present a learnable relation matrix and use it as part of the positional embedding to further improve performance. Extensive experiments are conducted on three public datasets, and the results show that our model outperforms existing 2D SOD (salient object detection) and 360° SOD methods.
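To make the distortion problem concrete, the following minimal sketch computes how much an equirectangular image stretches content horizontally at each row. It relies only on the standard property of ERP that a circle of latitude φ is mapped to the full image width, giving a stretch factor of 1/cos(φ). The function names are hypothetical and chosen for illustration; this is not DATFormer's Distortion Mapping Module or any other part of the authors' code.

```python
import math


def erp_row_latitudes(height):
    """Latitude in radians at the center of each ERP row, top (north) to bottom (south)."""
    return [math.pi / 2 - (row + 0.5) * math.pi / height for row in range(height)]


def erp_horizontal_stretch(height):
    """Horizontal stretch factor 1 / cos(latitude) for each ERP row.

    Rows near the equator have a factor close to 1; rows near the poles
    are stretched strongly. Row centers never sit exactly on a pole,
    so the cosine is never zero.
    """
    return [1.0 / math.cos(lat) for lat in erp_row_latitudes(height)]


# Illustrative use on a tiny 8-row image (hypothetical size).
stretch = erp_horizontal_stretch(8)
for row, factor in enumerate(stretch):
    print(f"row {row}: stretch x{factor:.2f}")
```

The factor is symmetric about the equator and grows toward the top and bottom rows, which is why objects near the poles of a 360° scene look heavily deformed in the 2D ERP image.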

Code Implementations: 1 repo
