CV | Sep 3, 2024

EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video

Zhen Zhou, Yunkai Ma, Junfeng Fan, Shaolin Zhang, Fengshui Jing, Min Tan

arXiv:2409.01807v2 | 6.58 citations | h-index: 28 | Has Code

Originality: Highly original

AI Analysis

This work addresses a fundamental perceptual task in robotic scene understanding, offering practical improvements in inference speed and reconstruction accuracy.

The paper tackles the problem of inefficient panoptic 3D reconstruction from monocular video by proposing EPRecon, which achieves real-time inference and superior reconstruction quality on the ScanNetV2 dataset compared to state-of-the-art methods.

Panoptic 3D reconstruction from a monocular video is a fundamental perceptual task in robotic scene understanding. However, existing efforts suffer from inefficiency in terms of inference speed and accuracy, limiting their practical applicability. We present EPRecon, an efficient real-time panoptic 3D reconstruction framework. Current volumetric-based reconstruction methods usually utilize multi-view depth map fusion to obtain scene depth priors, which is time-consuming and poses challenges to real-time scene reconstruction. To address this issue, we propose a lightweight module to directly estimate scene depth priors in a 3D volume for reconstruction quality improvement by generating occupancy probabilities of all voxels. In addition, compared with existing panoptic segmentation methods, EPRecon extracts panoptic features from both voxel features and corresponding image features, obtaining more detailed and comprehensive instance-level semantic information and achieving more accurate segmentation results. Experimental results on the ScanNetV2 dataset demonstrate the superiority of EPRecon over current state-of-the-art methods in terms of both panoptic 3D reconstruction quality and real-time inference. Code is available at https://github.com/zhen6618/EPRecon.
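The two ideas in the abstract (per-voxel occupancy probabilities used as a depth prior, and panoptic features drawn from both voxel features and the corresponding image features) can be illustrated with a small sketch. This is not the authors' implementation, which is in the linked repository. Every name here (`occupancy_mask`, `project`, `sample_image_feature`, `fuse`), the pinhole camera model, the nearest-neighbour sampling, the 0.5 threshold, and the choice to fuse by concatenation are assumptions made for illustration only.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def occupancy_mask(logits, threshold=0.5):
    """Turn per-voxel logits into occupancy probabilities and a keep-mask.
    The threshold is an illustrative assumption, not a value from the paper."""
    probs = [sigmoid(v) for v in logits]
    keep = [p > threshold for p in probs]
    return probs, keep

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        return None  # point is behind the camera
    return (fx * x / z + cx, fy * y / z + cy)

def sample_image_feature(feature_map, uv):
    """Nearest-neighbour lookup in an H x W grid of feature vectors."""
    if uv is None:
        return None
    u, v = int(round(uv[0])), int(round(uv[1]))
    h, w = len(feature_map), len(feature_map[0])
    if 0 <= v < h and 0 <= u < w:
        return feature_map[v][u]
    return None  # projection falls outside the image

def fuse(voxel_centers, voxel_feats, logits, feature_map, intrinsics,
         threshold=0.5):
    """Concatenate voxel and image features for voxels predicted occupied."""
    fx, fy, cx, cy = intrinsics
    probs, keep = occupancy_mask(logits, threshold)
    channels = len(feature_map[0][0])
    fused = {}
    for i, (center, feat) in enumerate(zip(voxel_centers, voxel_feats)):
        if not keep[i]:
            continue  # skip voxels the occupancy prior marks as empty
        img = sample_image_feature(feature_map,
                                   project(center, fx, fy, cx, cy))
        if img is None:
            img = [0.0] * channels  # zero-fill when no pixel is visible
        fused[i] = list(feat) + list(img)
    return probs, fused
```

In this sketch the occupancy mask prunes empty voxels before any image lookup, which is one plausible reason a direct occupancy estimate could be cheaper than fusing multi-view depth maps; the paper's actual module design may differ.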
