Tangent Images for Mitigating Spherical Distortion
This work addresses spherical distortion in 360° imagery for computer vision researchers and practitioners. It offers a scalable and transferable representation that also benefits traditional pipelines such as structure-from-motion and SLAM. The contribution is incremental in that it builds on existing icosahedral approximations and standard CNNs.
The paper tackles spherical image distortion in 360° computer vision by proposing tangent images, a representation that renders a spherical image to a set of locally-planar grids tangent to a subdivided icosahedron, which reduces distortion and makes high-resolution inputs tractable. The results show that standard CNNs trained on tangent images compare favorably to specialized spherical convolutional kernels, scale efficiently to higher resolutions, and allow networks trained on perspective images to be transferred to spherical data without fine-tuning and with limited performance drop-off.
In this work, we propose "tangent images," a spherical image representation that facilitates transferable and scalable $360^\circ$ computer vision. Inspired by techniques in cartography and computer graphics, we render a spherical image to a set of distortion-mitigated, locally-planar image grids tangent to a subdivided icosahedron. By varying the resolution of these grids independently of the subdivision level, we can effectively represent high-resolution spherical images while still benefiting from the low-distortion icosahedral spherical approximation. We show that training standard convolutional neural networks on tangent images compares favorably to the many specialized spherical convolutional kernels that have been developed, while also scaling efficiently to handle significantly higher spherical resolutions. Furthermore, because our approach does not require specialized kernels, we show that we can transfer networks trained on perspective images to spherical data without fine-tuning and with limited performance drop-off. Finally, we demonstrate that tangent images can be used to improve the quality of sparse feature detection on spherical images, illustrating their usefulness for traditional computer vision tasks like structure-from-motion and SLAM.
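The rendering step described above can be illustrated with a minimal sketch. It assumes the gnomonic projection, the standard cartographic projection of a sphere onto a tangent plane; the abstract does not name the projection, so treat this as one plausible choice, not the paper's implementation. The function names, the `half_extent` parameter, and the grid layout are hypothetical. The face count relies only on the fact that each subdivision splits every triangle of the icosahedron into four.

```python
import numpy as np

def gnomonic_forward(lon, lat, lon0, lat0):
    """Project (lon, lat), in radians, onto the plane tangent at (lon0, lat0)."""
    dlon = lon - lon0
    # Cosine of the angular distance from the tangent point.
    cos_c = np.sin(lat0) * np.sin(lat) + np.cos(lat0) * np.cos(lat) * np.cos(dlon)
    x = np.cos(lat) * np.sin(dlon) / cos_c
    y = (np.cos(lat0) * np.sin(lat) - np.sin(lat0) * np.cos(lat) * np.cos(dlon)) / cos_c
    return x, y

def gnomonic_inverse(x, y, lon0, lat0):
    """Map tangent-plane coordinates (x, y) back to (lon, lat) in radians."""
    # With c = arctan(rho), sin(c)/rho == cos(c), so no division by rho is needed.
    cos_c = 1.0 / np.sqrt(1.0 + x**2 + y**2)
    lat = np.arcsin(cos_c * (np.sin(lat0) + y * np.cos(lat0)))
    lon = lon0 + np.arctan2(x, np.cos(lat0) - y * np.sin(lat0))
    return lon, lat

def tangent_grid_lonlat(lon0, lat0, size, half_extent):
    """Spherical coordinates of a size x size sample grid on one tangent plane.

    half_extent (hypothetical parameter) is the half-width of the grid in
    tangent-plane units; size can be chosen independently of the subdivision level.
    """
    ticks = np.linspace(-half_extent, half_extent, size)
    x, y = np.meshgrid(ticks, ticks)
    return gnomonic_inverse(x, y, lon0, lat0)

def num_faces(subdivision_level):
    """Triangular faces of an icosahedron after the given number of 4-way subdivisions."""
    return 20 * 4 ** subdivision_level
```

In use, one would call `tangent_grid_lonlat` once per face, with `(lon0, lat0)` set to that face's center, and sample the spherical image at the returned coordinates to fill the planar grid. Increasing `size` while keeping the subdivision level fixed raises the resolution of each tangent image without changing the number of tangent planes.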