CV IV
Mar 16, 2023

Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception from Monocular Video

arXiv:2303.09248v2
6 citations
h-index: 9
Originality: Incremental advance
AI Analysis

This work improves real-time 3D visual perception for applications like robotics or autonomous systems, but it appears incremental as it refines existing volumetric methods.

The paper tackles the problem of real-time 3D scene reconstruction from monocular video by addressing the lack of local geometric detail in volumetric approaches, achieving state-of-the-art efficiency in 3D perception on multiple datasets.

We present a novel, real-time-capable learning method that jointly perceives a 3D scene's geometric structure and semantic labels. Recent approaches to real-time 3D scene reconstruction mostly adopt a volumetric scheme, where a Truncated Signed Distance Function (TSDF) is directly regressed. However, these volumetric approaches tend to focus on the global coherence of their reconstructions, which leads to a lack of local geometric detail. To overcome this issue, we propose to leverage the latent geometric prior knowledge in 2D image features, through explicit depth prediction and anchored feature generation, to refine the occupancy learning in the TSDF volume. In addition, we find that this cross-dimensional feature refinement methodology can also be adopted for the semantic segmentation task by utilizing semantic priors. Hence, we propose an end-to-end cross-dimensional refinement neural network (CDRNet) to extract both a 3D mesh and 3D semantic labels in real time. Experimental results show that this method achieves state-of-the-art 3D perception efficiency on multiple datasets, which indicates the great potential of our method for industrial applications.
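For readers unfamiliar with the TSDF representation the abstract refers to, the following is a minimal sketch of how a truncated signed distance value could be computed for voxels along a single camera ray. It is not CDRNet's implementation, which regresses the TSDF with a neural network. The function name `truncated_sdf`, the sign convention (positive in front of the surface), and all numeric values are assumptions chosen for illustration.

```python
def truncated_sdf(voxel_depth: float, surface_depth: float, truncation: float) -> float:
    """Signed distance from a voxel to the observed surface along a camera ray,
    scaled by the truncation distance and clamped to [-1, 1].

    Assumed convention: positive in front of the surface, negative behind it.
    """
    if truncation <= 0:
        raise ValueError("truncation must be positive")
    signed_distance = surface_depth - voxel_depth
    return max(-1.0, min(1.0, signed_distance / truncation))


# Hypothetical example: voxels sampled along one ray, surface observed at 1.0 m.
surface_depth = 1.0
truncation = 0.2
voxel_depths = [0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4]
tsdf_values = [truncated_sdf(d, surface_depth, truncation) for d in voxel_depths]

for depth, value in zip(voxel_depths, tsdf_values):
    print(f"voxel at {depth:.1f} m -> TSDF {value:+.2f}")
```

Values saturate at +1 or -1 away from the surface and cross zero at the surface itself. A mesh is typically extracted from that zero crossing, which is why errors in the near-surface values show up as lost local geometric detail.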

