Spherical View Synthesis for Self-Supervised 360 Depth Estimation
This work addresses depth perception for omnidirectional vision. Its contribution is incremental in that it adapts existing view synthesis methods to the spherical domain.
The paper tackles the problem of learning monocular 360-degree depth estimation by exploring spherical view synthesis as a self-supervised objective. It demonstrates feasibility with results for horizontal, vertical, and trinocular baselines, and its comparison against direct supervision indicates that view synthesis may not be the best route to high-quality depth perception.
Learning-based approaches for depth perception are limited by the availability of clean training data. This has led to the utilization of view synthesis as an indirect objective for learning depth estimation using efficient data acquisition procedures. Nonetheless, most research focuses on pinhole-based monocular vision, with few works presenting results for omnidirectional input. In this work, we explore spherical view synthesis for learning monocular 360 depth in a self-supervised manner and demonstrate its feasibility. Under a purely geometrically derived formulation, we present results for horizontal and vertical baselines, as well as for the trinocular case. Further, we show how to better exploit the expressiveness of traditional CNNs when applied to the equirectangular domain in an efficient manner. Finally, given the availability of ground truth depth data, our work is uniquely positioned to compare view synthesis against direct supervision in a consistent and fair manner. The results indicate that alternative research directions might be better suited to enable higher quality depth perception. Our data, models and code are publicly available at https://vcl3d.github.io/SphericalViewSynthesis/.
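As a rough illustration of what a geometric formulation of spherical view synthesis can look like, the sketch below maps each equirectangular pixel to longitude and latitude, lifts it to a 3D point using a depth map, and computes the angles at which that point would be seen from a camera displaced by a baseline. It is not taken from the paper or its released code: the function names `equirect_angles` and `reproject`, the axis convention (x right, y up, z forward), and the pixel-centre sampling are all assumptions made for this example.

```python
import numpy as np


def equirect_angles(height, width):
    """Longitude in (-pi, pi) and latitude in (-pi/2, pi/2) at pixel centres."""
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    # meshgrid returns two (height, width) arrays: longitude grid, latitude grid.
    return np.meshgrid(lon, lat)


def reproject(depth, baseline):
    """Angles at which each pixel's 3D point is seen after moving the camera.

    depth    : (height, width) radial distances along each viewing ray
    baseline : (bx, by, bz) camera displacement, assumed x right, y up, z forward
    """
    height, width = depth.shape
    lon, lat = equirect_angles(height, width)

    # Spherical to Cartesian coordinates under the assumed axis convention.
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)

    # Express the points relative to the displaced camera centre.
    points = np.stack([x, y, z], axis=-1) - np.asarray(baseline, dtype=float)
    radius = np.linalg.norm(points, axis=-1)

    # Cartesian back to spherical angles in the displaced view.
    new_lon = np.arctan2(points[..., 0], points[..., 2])
    new_lat = np.arcsin(np.clip(points[..., 1] / radius, -1.0, 1.0))
    return new_lon, new_lat
```

Under these assumptions, a purely vertical displacement `(0, b, 0)` leaves every longitude unchanged and shifts only the latitudes, whereas a horizontal displacement `(b, 0, 0)` changes both angles by amounts that depend on where the pixel lies on the sphere. A full view synthesis pipeline would additionally resample the image at the reprojected angles and compare it with the captured second view, which this sketch omits.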