CV · Dec 9, 2024

SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception

arXiv:2412.06968v1 · 6 citations · h-index: 6 · CVPR
Originality: Highly original
AI Analysis

This paper addresses the challenge of accurate 360° perception for applications such as VR/AR and autonomous systems, offering a transformer-based solution that improves on previous methods.

The paper tackles the problem of omnidirectional 360° perception by proposing SphereUFormer, a transformer-based architecture that operates directly in the spherical domain to avoid distortions from equirectangular projection. It outperforms state-of-the-art methods on benchmarks for depth estimation and semantic segmentation.

This paper proposes a novel method for omnidirectional 360° perception. Most common previous methods relied on equirectangular projection. This representation is easily applicable to 2D operation layers but introduces distortions into the image. Other methods attempted to remove the distortions by maintaining a sphere representation but relied on complicated convolution kernels that failed to show competitive results. In this work, we introduce a transformer-based architecture that, by incorporating a novel "Spherical Local Self-Attention" and other spherically-oriented modules, successfully operates in the spherical domain and outperforms the state-of-the-art in 360° perception benchmarks for depth estimation and semantic segmentation.
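
The distortion that the abstract attributes to equirectangular projection can be made concrete with a small calculation. The sketch below is not taken from the paper; it uses only standard spherical geometry, and the helper name `pixel_solid_angle` and the 512 x 1024 image size are illustrative assumptions. It computes how much of the sphere a single pixel covers in a given row, showing that pixels near the poles cover far less area than pixels at the equator.

```python
import math

def pixel_solid_angle(row, height, width):
    """Solid angle (in steradians) covered by one pixel in `row` of an
    equirectangular image of size height x width. Row 0 touches the north pole.
    Hypothetical helper for illustration only."""
    lat_top = math.pi / 2 - math.pi * row / height
    lat_bottom = math.pi / 2 - math.pi * (row + 1) / height
    d_lon = 2 * math.pi / width
    # Area of a latitude/longitude cell on the unit sphere.
    return d_lon * (math.sin(lat_top) - math.sin(lat_bottom))

height, width = 512, 1024  # assumed image size
equator = pixel_solid_angle(height // 2, height, width)
pole = pixel_solid_angle(0, height, width)

# Summing every pixel should recover the full sphere, 4 * pi steradians.
total = sum(pixel_solid_angle(r, height, width) * width for r in range(height))

print(f"equator pixel / pole pixel area ratio: {equator / pole:.1f}")
print(f"total solid angle: {total:.6f} (4*pi = {4 * math.pi:.6f})")
```

Because every row has the same number of pixels, the rows near the poles oversample a very small region of the sphere. This is the non-uniformity that sphere-native representations aim to avoid.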

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
