CV · Oct 13, 2023

Equirectangular image construction method for standard CNNs for Semantic Segmentation

arXiv:2310.09122v1 · h-index: 11
Originality: Synthesis-oriented
AI Analysis

The paper addresses a domain-specific challenge in processing 360° spherical images for computer vision applications, offering incremental improvements over existing methods.

The paper tackles semantic segmentation of equirectangular images, in which object shapes are distorted and translation invariance does not hold. It proposes converting perspective images into equirectangular images using inverse transformations, and reports an average IoU of 43.76%.

360° spherical images offer a wide field of view and are typically projected onto a plane for processing; the result is known as an equirectangular image. Object shapes in equirectangular images can be distorted, and the images lack translation invariance. In addition, there are few publicly available labeled datasets of equirectangular images, which makes it difficult for standard CNN models to process such images effectively. To tackle this problem, we propose a methodology for converting a perspective image into an equirectangular image. The inverse transformation of the spherical center projection and the equidistant cylindrical projection are employed. This enables standard CNNs to learn the distortion features at different positions in the equirectangular image and thereby gain the ability to semantically segment the equirectangular image. The parameter φ, which determines the projection position of the perspective image, has been analyzed using various datasets and models, such as UNet, UNet++, SegNet, PSPNet, and DeepLab v3+. The experiments demonstrate that the optimal value of φ for effective semantic segmentation of equirectangular images with standard CNNs is 6π/16. Compared with three other types of methods (supervised learning, unsupervised learning and data augmentation), the method proposed in this paper achieves the best average IoU, 43.76%. This value is 23.85%, 10.7% and 17.23% higher than those of the other three methods, respectively.
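The conversion described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the "spherical center projection" corresponds to the standard gnomonic projection, and it treats φ as the latitude of the point where the perspective image plane touches the sphere. The function name, the parameters `center_lat`, `center_lon` and `fov_deg`, and the 90° default field of view are all hypothetical. Nearest-neighbour sampling is used so that the same function could also warp label maps without blending class ids.

```python
import numpy as np


def perspective_to_equirect(persp, out_h, out_w, fov_deg=90.0,
                            center_lat=0.0, center_lon=0.0, fill=0):
    """Paste a perspective image onto an equirectangular canvas.

    Assumption: the perspective image lies on a plane tangent to the unit
    sphere at (center_lat, center_lon), with horizontal field of view fov_deg.
    Returns the canvas and a boolean mask of the pixels that were filled.
    """
    in_h, in_w = persp.shape[:2]

    # Equidistant cylindrical grid: longitude and latitude of each output
    # pixel centre (longitude in [-pi, pi), latitude from +pi/2 down to -pi/2).
    lon = (np.arange(out_w) + 0.5) / out_w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(out_h) + 0.5) / out_h * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Gnomonic projection of each sphere point onto the tangent plane.
    dlon = lon - center_lon
    cos_c = (np.sin(center_lat) * np.sin(lat)
             + np.cos(center_lat) * np.cos(lat) * np.cos(dlon))
    visible = cos_c > 1e-6                  # only the hemisphere facing the plane
    safe = np.where(visible, cos_c, 1.0)    # avoid division by zero elsewhere
    x = np.cos(lat) * np.sin(dlon) / safe
    y = (np.cos(center_lat) * np.sin(lat)
         - np.sin(center_lat) * np.cos(lat) * np.cos(dlon)) / safe

    # Tangent-plane coordinates -> source pixel indices.
    half_w = np.tan(np.radians(fov_deg) / 2.0)
    half_h = half_w * in_h / in_w
    cols = np.floor((x / half_w + 1.0) / 2.0 * in_w).astype(int)
    rows = np.floor((1.0 - y / half_h) / 2.0 * in_h).astype(int)
    inside = (visible & (cols >= 0) & (cols < in_w)
              & (rows >= 0) & (rows < in_h))

    # Inverse warping: every covered output pixel copies its source pixel.
    out = np.full((out_h, out_w) + persp.shape[2:], fill, dtype=persp.dtype)
    out[inside] = persp[rows[inside], cols[inside]]
    return out, inside
```

Calling `perspective_to_equirect(image, 512, 1024, center_lat=6 * np.pi / 16)` would place the image at the latitude matching the value of φ reported above, under this sketch's reading of φ. Placing the same image at a higher latitude spreads it over more equirectangular pixels, which is the position-dependent distortion the training data is meant to expose.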

