Instance-wise Occlusion and Depth Orders in Natural Scenes
This work addresses the problem of inferring instance-wise occlusion and depth orders in natural scenes, contributing a new dataset and two models for geometric order and depth prediction.
The authors tackle the problem of understanding geometric relationships between instances in an image by introducing a new dataset, InstaOrder, with 2.9M annotations across 101K natural scenes. They also develop InstaOrderNet, a geometric order prediction network reported to outperform state-of-the-art approaches, and InstaDepthNet, a dense depth prediction network that uses an auxiliary geometric order loss to improve the accuracy of MiDaS.
In this paper, we introduce a new dataset, named InstaOrder, that can be used to understand the geometric relationships of instances in an image. The dataset consists of 2.9M annotations of geometric orderings for class-labeled instances in 101K natural scenes. The scenes were annotated by 3,659 crowd-workers with (1) occlusion order, which identifies the occluder and the occludee, and (2) depth order, which describes ordinal relations based on relative distance from the camera. The dataset provides joint annotations of both kinds of ordering for the same instances, and we discover that the occlusion order and depth order are complementary. We also introduce a geometric order prediction network called InstaOrderNet, which is superior to state-of-the-art approaches. Moreover, we propose a dense depth prediction network called InstaDepthNet that uses an auxiliary geometric order loss to boost the accuracy of the state-of-the-art depth prediction approach, MiDaS [56].
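To make the two kinds of ordering concrete, the following is a minimal sketch of how a pairwise annotation and an auxiliary depth-order penalty might be represented. It is an illustration under stated assumptions, not the InstaOrder file format or the loss used by InstaDepthNet: the names `Occlusion`, `DepthOrder`, `PairAnnotation`, `mean_depth`, and `order_loss` are hypothetical, the label sets are assumed, the depth map is assumed to store values where smaller means closer to the camera, and the penalty is a generic softplus ranking term chosen for simplicity.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Occlusion(Enum):
    """Hypothetical occlusion labels for an instance pair (A, B)."""
    NONE = "none"            # neither instance occludes the other
    A_OCCLUDES_B = "a>b"     # A is the occluder, B the occludee
    B_OCCLUDES_A = "b>a"     # B is the occluder, A the occludee
    MUTUAL = "mutual"        # assumed label: each occludes part of the other


class DepthOrder(Enum):
    """Hypothetical ordinal depth labels for an instance pair (A, B)."""
    A_CLOSER = -1
    EQUAL = 0
    B_CLOSER = 1


@dataclass(frozen=True)
class PairAnnotation:
    """Joint occlusion and depth order for the same two instances."""
    instance_a: int
    instance_b: int
    occlusion: Occlusion
    depth: DepthOrder


def mean_depth(depth_map, mask):
    """Average the depth values under a binary instance mask."""
    values = [d for d_row, m_row in zip(depth_map, mask)
              for d, m in zip(d_row, m_row) if m]
    if not values:
        raise ValueError("mask selects no pixels")
    return sum(values) / len(values)


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def order_loss(depth_a, depth_b, order):
    """Generic ranking penalty (an assumption, not the paper's loss).

    Small when the predicted mean depths agree with the annotated order,
    large when they contradict it. Smaller depth is assumed to mean closer.
    """
    diff = depth_a - depth_b
    if order is DepthOrder.EQUAL:
        return diff * diff
    sign = 1.0 if order is DepthOrder.A_CLOSER else -1.0
    return _softplus(sign * diff)


# Usage: a 2x2 predicted depth map with one-pixel masks for A and B.
depth_map = [[1.0, 2.0], [3.0, 4.0]]
mask_a = [[1, 0], [0, 0]]
mask_b = [[0, 0], [0, 1]]
ann = PairAnnotation(0, 1, Occlusion.A_OCCLUDES_B, DepthOrder.A_CLOSER)
loss = order_loss(mean_depth(depth_map, mask_a),
                  mean_depth(depth_map, mask_b), ann.depth)
```

In a training setup, a term like this would be added to the dense depth loss with some weight, so that annotated orderings act as extra supervision; the actual formulation and weighting used by the authors are not stated in the abstract.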