CV | Jul 2, 2024

Open Panoramic Segmentation

Junwei Zheng, Ruiping Liu, Yufan Chen, Kunyu Peng, Chengzhi Wu, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

arXiv:2407.02685v2 | 17.822 citations | h-index: 40 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses two obstacles in panoramic scene understanding: the cost of dense annotation and the limited applicability of closed-vocabulary models. It is an incremental advance, as it builds on existing open-vocabulary segmentation methods.

The paper tackles the problem of training panoramic image segmentation models without expensive dense annotations. It introduces Open Panoramic Segmentation (OPS), a task in which models are trained on pinhole images in an open-vocabulary setting and evaluated zero-shot on panoramic images. The proposed OOOPS model, combined with RERP augmentation, surpasses other open-vocabulary segmentation approaches, with gains of +2.2% on outdoor WildPASS and +2.4% mIoU on indoor Stanford2D3D.
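The reported gains are stated in mIoU (mean Intersection over Union). As a reminder of what that metric measures, the following is a minimal sketch, not the paper's evaluation code. The function name `mean_iou`, the flat label lists, and the `ignore_index=255` convention are assumptions made for illustration, and benchmark protocols usually accumulate counts over a whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in pred or target.

    pred and target are flat sequences of integer class labels of equal
    length. Pixels whose target equals ignore_index are skipped
    (hypothetical convention, not taken from the paper).
    """
    inter = [0] * num_classes  # per-class intersection counts
    union = [0] * num_classes  # per-class union counts
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch adds one pixel to the union of both classes.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Averaging per class rather than per pixel means that small or rare classes weigh as much as large ones.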

Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial information crucial for scene understanding. However, obtaining enough densely annotated panoramas for training is costly, and models trained in a closed-vocabulary setting are restricted in their applications. To tackle this problem, we define a new task termed Open Panoramic Segmentation (OPS), in which models are trained on FoV-restricted pinhole images in the source domain in an open-vocabulary setting and evaluated on FoV-open panoramic images in the target domain, enabling zero-shot open panoramic semantic segmentation. Moreover, we propose a model named OOOPS with a Deformable Adapter Network (DAN), which significantly improves zero-shot panoramic semantic segmentation performance. To further enhance distortion-aware modeling from the pinhole source domain, we propose a novel data augmentation method called Random Equirectangular Projection (RERP), which is specifically designed to address object deformations in advance. OOOPS with RERP surpasses other state-of-the-art open-vocabulary semantic segmentation approaches on three panoramic datasets, WildPASS, Stanford2D3D, and Matterport3D, demonstrating its effectiveness on the OPS task, with gains of +2.2% on outdoor WildPASS and +2.4% mIoU on indoor Stanford2D3D. The source code is publicly available at https://junweizheng93.github.io/publications/OPS/OPS.html.
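The abstract does not describe how RERP works, so the sketch below is not the authors' method. It only illustrates the standard geometry behind the object deformations the abstract mentions: resampling a pinhole image onto an equirectangular (longitude/latitude) grid. The function name `pinhole_to_equirect`, the nearest-neighbour sampling, the forward-facing camera, and all parameters are assumptions made for illustration.

```python
import math


def pinhole_to_equirect(img, out_w, out_h, fov_deg=90.0, fill=None):
    """Resample a pinhole image onto a full 360x180 equirectangular grid.

    img is a list of rows (list of lists). The camera is assumed to look
    along longitude 0, latitude 0, with horizontal field of view fov_deg.
    Output pixels the camera cannot see are set to fill.
    """
    h, w = len(img), len(img[0])
    # Focal length in pixels, derived from the horizontal field of view.
    f = (w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    out = [[fill] * out_w for _ in range(out_h)]
    for v in range(out_h):
        lat = math.pi / 2.0 - (v + 0.5) / out_h * math.pi
        for u in range(out_w):
            lon = (u + 0.5) / out_w * 2.0 * math.pi - math.pi
            # Unit viewing direction for this panorama pixel.
            x = math.cos(lat) * math.sin(lon)
            y = math.sin(lat)
            z = math.cos(lat) * math.cos(lon)
            if z <= 1e-9:
                continue  # direction points away from the image plane
            # Perspective projection onto the pinhole image plane.
            col = w / 2.0 + f * x / z
            row = h / 2.0 - f * y / z
            if 0.0 <= col < w and 0.0 <= row < h:
                out[v][u] = img[math.floor(row)][math.floor(col)]
    return out


if __name__ == "__main__":
    img = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
    pano = pinhole_to_equirect(img, out_w=9, out_h=5)
    for row in pano:
        print(row)
```

Straight lines in the pinhole image become curved in the output, and the stretching grows toward the poles. This is the kind of distortion a model trained only on pinhole images does not otherwise encounter.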
