CV · Sep 1, 2023

RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion

arXiv:2309.00655v4 · 13 citations
Originality: Incremental advance
AI Analysis

The paper addresses depth completion for computer vision applications such as robotics and AR, but the advance is incremental: it builds on existing image-guided frameworks with architectural refinements.

The paper tackles the problem of depth completion from sparse depth maps by proposing RigNet++, which uses repetitive network designs and semantic guidance from SAM to improve performance. The method achieves state-of-the-art results on six benchmark datasets including KITTI, NYUv2, and a new TOFDC dataset collected from smartphone sensors.

Depth completion aims to recover dense depth maps from sparse ones, where color images are often used to facilitate this task. Recent depth methods primarily focus on image guided learning frameworks. However, blurry guidance in the image and unclear structure in the depth still impede their performance. To tackle these challenges, we explore a repetitive design in our image guided network to gradually and sufficiently recover depth values. Specifically, the repetition is embodied in both the image guidance branch and depth generation branch. In the former branch, we design a dense repetitive hourglass network (DRHN) to extract discriminative image features of complex environments, which can provide powerful contextual instruction for depth prediction. In the latter branch, we present a repetitive guidance (RG) module based on dynamic convolution, in which an efficient convolution factorization is proposed to reduce the complexity while modeling high-frequency structures progressively. Furthermore, in the semantic guidance branch, we utilize the well-known large vision model, i.e., segment anything (SAM), to supply RG with semantic prior. In addition, we propose a region-aware spatial propagation network (RASPN) for further depth refinement based on the semantic prior constraint. Finally, we collect a new dataset termed TOFDC for the depth completion task, which is acquired by the time-of-flight (TOF) sensor and the color camera on smartphones. Extensive experiments demonstrate that our method achieves state-of-the-art performance on KITTI, NYUv2, Matterport3D, 3D60, VKITTI, and our TOFDC.
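The abstract does not spell out how RASPN works, so the following is only a rough illustration of the general idea of spatial propagation constrained by a segmentation prior, not the authors' method. It assumes a hypothetical integer label map `segments` (for example, derived from SAM masks), a fixed uniform neighbor weight in place of learned affinities, and a 4-neighborhood. Each step replaces a pixel's depth with a convex combination of itself and those neighbors that share its segment label, then re-imposes the sparse measurements.

```python
import numpy as np

# 4-neighborhood offsets (dy, dx); a learned model would predict affinities instead.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def shift(a, dy, dx, fill):
    """Return b with b[y, x] = a[y + dy, x + dx]; out-of-bounds entries get `fill`."""
    h, w = a.shape
    padded = np.pad(a, 1, mode="constant", constant_values=fill)
    return padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]


def propagate(depth, segments, sparse, steps=10, weight=0.2):
    """Region-gated propagation sketch (illustrative assumption, not RASPN itself).

    depth:    (H, W) coarse dense depth prediction
    segments: (H, W) non-negative integer region labels (hypothetical semantic prior)
    sparse:   (H, W) sparse depth measurements, 0 where no measurement exists
    weight:   uniform per-neighbor weight; 4 * weight must not exceed 1
    """
    valid = sparse > 0
    out = np.where(valid, sparse, depth).astype(np.float64)
    for _ in range(steps):
        acc = np.zeros_like(out)
        wsum = np.zeros_like(out)
        for dy, dx in OFFSETS:
            nb_depth = shift(out, dy, dx, 0.0)
            nb_seg = shift(segments, dy, dx, -1)  # -1 never matches a real label
            gate = (nb_seg == segments) * weight  # only same-region neighbors count
            acc += gate * nb_depth
            wsum += gate
        out = (1.0 - wsum) * out + acc        # convex combination of self and neighbors
        out = np.where(valid, sparse, out)    # keep measured depths fixed
    return out
```

Because neighbors from a different region receive zero weight, depth values do not bleed across segment boundaries, which is one plausible way a semantic prior could help preserve object edges during refinement.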

