CV · May 24

VEOcc: Voxel-Centric Online Semantic Occupancy Prediction For Embodied Scene Understanding

Ruoyu Wang, Yong Liu, Sheng Tao, Yuhang Lin, Yukai Ma

arXiv:2605.25059 · 58.8

Predicted impact: top 67% in CV · last 90 days

Originality: Incremental advance

AI Analysis

For autonomous exploration systems, VEOcc provides a more efficient and accurate online mapping solution that works without predefined scene priors.

VEOcc introduces a voxel-centric framework for online 3D occupancy prediction that eliminates the need for initial scale estimation, achieving state-of-the-art performance on Occ-ScanNet and EmbodiedOcc-ScanNet benchmarks with robust out-of-distribution generalization.

Crucial for autonomous exploration, online 3D occupancy prediction and mapping incrementally constructs dense spatial representations on the fly. However, recent Gaussian-centric methods struggle with structural boundary fidelity and rely heavily on predefined scene-size priors, fundamentally limiting their operational efficiency. In this work, we present VEOcc, a voxel-centric framework formulated as a recursive perception-and-assimilation paradigm. By eliminating the need for initial scale estimation, VEOcc enables highly streamlined, open-ended map expansion. Furthermore, to robustly aggregate noisy temporal observations within the discrete voxel space, we propose a Spatio-Temporal-Aware Online Update Strategy. It integrates Cross-Temporal Logit Aggregation (TLA) for temporal consistency, Reliability-Aware Confidence Modulation (RCM) for spatial uncertainty calibration, and Confidence-Driven Incremental State Update (CSU) for robust global state assimilation. Extensive experiments on Occ-ScanNet and EmbodiedOcc-ScanNet demonstrate that VEOcc establishes new state-of-the-art performance in both local and embodied settings. Notably, zero-shot evaluations on self-collected video sequences further confirm its robust out-of-distribution generalization capability in completely unseen real-world environments. Ultimately, our framework provides an accurate and highly efficient solution for autonomous exploration. Code and supplementary visualizations are available on our project page: https://wryzju.github.io/VEOcc/.
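
The abstract does not say how TLA, RCM, or CSU are computed, so the following is only a generic illustration of the underlying idea, not the authors' method. It is a minimal sketch of confidence-weighted logit fusion in a hash-keyed sparse voxel map, a structure that grows as new regions are observed and therefore needs no predefined scene extent. Every name here (`SparseVoxelMap`, `VoxelState`, `update`, `label`, `voxel_size`) and the running-mean fusion rule itself are assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class VoxelState:
    """Fused state of one voxel (hypothetical structure, not from the paper)."""
    logits: list          # fused per-class logits
    weight: float = 0.0   # accumulated observation confidence


class SparseVoxelMap:
    """Sparse voxel map keyed by integer grid coordinates.

    Voxels are allocated on first observation, so the map can expand
    in any direction without knowing the scene size in advance.
    """

    def __init__(self, voxel_size: float, num_classes: int):
        self.voxel_size = voxel_size
        self.num_classes = num_classes
        self.voxels = {}  # (i, j, k) -> VoxelState

    def key(self, point):
        # Quantize a metric 3D point to its integer voxel index.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def update(self, point, logits, confidence: float):
        """Fuse one observation using a confidence-weighted running mean.

        Assumed rule: new logits are blended with the stored logits in
        proportion to their confidence, so unreliable observations move
        the voxel state less than reliable ones.
        """
        if confidence <= 0.0:
            return  # ignore observations that carry no confidence
        k = self.key(point)
        state = self.voxels.get(k)
        if state is None:
            state = VoxelState([0.0] * self.num_classes)
            self.voxels[k] = state
        total = state.weight + confidence
        state.logits = [
            (state.weight * old + confidence * new) / total
            for old, new in zip(state.logits, logits)
        ]
        state.weight = total

    def label(self, point):
        """Return the arg-max class of the voxel, or None if unobserved."""
        state = self.voxels.get(self.key(point))
        if state is None:
            return None
        return max(range(self.num_classes), key=lambda i: state.logits[i])


# Usage: two observations of the same voxel, then one far-away voxel.
vmap = SparseVoxelMap(voxel_size=0.1, num_classes=2)
vmap.update((0.05, 0.05, 0.05), [2.0, 0.0], confidence=0.9)
vmap.update((0.06, 0.04, 0.05), [0.0, 1.0], confidence=0.1)
vmap.update((-3.21, 7.03, 0.0), [0.0, 3.0], confidence=0.5)
```

In this sketch the low-confidence second observation barely shifts the first voxel's state, and the third observation simply allocates a new voxel well outside the region seen so far. A real system would obtain the per-observation confidence from the network rather than pass it in by hand.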

