CV · RO · Aug 7, 2024

VPOcc: Exploiting Vanishing Point for 3D Semantic Occupancy Prediction

arXiv:2408.03551v2 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses a key challenge for robots and autonomous vehicles, understanding 3D scenes from 2D images, though it appears incremental because it builds on existing methods with targeted enhancements.

The paper tackles the 2D-3D discrepancy in camera-based 3D semantic occupancy prediction by proposing VPOcc, a framework that uses a vanishing point to mitigate this issue at both the pixel and feature levels, achieving improvements in IoU and mIoU on the SemanticKITTI and SSCBench-KITTI360 datasets.
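For readers unfamiliar with the two metrics: in semantic scene completion, IoU usually scores geometry only (occupied vs. empty voxels), while mIoU averages the per-class IoU over the semantic classes. The sketch below is a minimal, per-sample illustration under stated assumptions, not the official benchmark code: it assumes label 0 means "empty", labels 1..num_classes-1 are semantic classes, and 255 marks voxels to ignore. Official evaluation typically accumulates counts over the whole split before dividing.

```python
import numpy as np

def ssc_metrics(pred, gt, num_classes, ignore_index=255):
    """Per-sample completion IoU and semantic mIoU over voxel labels.

    Assumptions (illustrative, not from the paper): class 0 is 'empty',
    classes 1..num_classes-1 are semantic, and voxels whose ground-truth
    label equals ignore_index are excluded.
    """
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]

    # Geometry: occupied (any non-empty class) vs. empty, semantics ignored.
    p_occ, g_occ = pred != 0, gt != 0
    union = np.logical_or(p_occ, g_occ).sum()
    iou = np.logical_and(p_occ, g_occ).sum() / union if union > 0 else float("nan")

    # Semantics: IoU per non-empty class, skipping classes absent from both.
    per_class = []
    for c in range(1, num_classes):
        p_c, g_c = pred == c, gt == c
        u = np.logical_or(p_c, g_c).sum()
        if u > 0:
            per_class.append(np.logical_and(p_c, g_c).sum() / u)
    miou = float(np.mean(per_class)) if per_class else float("nan")
    return float(iou), miou

# Toy example: six voxels, the last one ignored.
iou, miou = ssc_metrics(np.array([0, 1, 2, 2, 0, 1]),
                        np.array([0, 1, 1, 2, 2, 255]), num_classes=3)
print(iou, miou)  # IoU = 0.75, mIoU = (1/2 + 1/3) / 2 ≈ 0.417
```

Skipping classes with an empty union keeps a single sample from producing NaN; a dataset-level evaluation would instead sum intersections and unions per class across all samples.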

Understanding 3D scenes semantically and spatially is crucial for the safe navigation of robots and autonomous vehicles, aiding obstacle avoidance and accurate trajectory planning. Camera-based 3D semantic occupancy prediction, which infers complete voxel grids from 2D images, is gaining importance in robot vision for its resource efficiency compared to 3D sensors. However, this task inherently suffers from a 2D-3D discrepancy, where objects of the same size in 3D space appear at different scales in a 2D image depending on their distance from the camera due to perspective projection. To tackle this issue, we propose a novel framework called VPOcc that leverages a vanishing point (VP) to mitigate the 2D-3D discrepancy at both the pixel and feature levels. As a pixel-level solution, we introduce a VPZoomer module, which warps images by counteracting the perspective effect using a VP-based homography transformation. In addition, as a feature-level solution, we propose a VP-guided cross-attention (VPCA) module that performs perspective-aware feature aggregation, utilizing 2D image features that are more suitable for 3D space. Lastly, we integrate two feature volumes extracted from the original and warped images to compensate for each other through a spatial volume fusion (SVF) module. By effectively incorporating VP into the network, our framework achieves improvements in both IoU and mIoU metrics on SemanticKITTI and SSCBench-KITTI360 datasets. Additional details are available at https://vision3d-lab.github.io/vpocc/.
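The 2D-3D discrepancy described above follows directly from the pinhole camera model: an object of metric height H at depth Z appears roughly f·H/Z pixels tall, and points receding along a 3D direction d converge on that direction's vanishing point, the image of K·d. The sketch below illustrates only these two facts, assuming an ideal pinhole camera with hypothetical intrinsics (not KITTI calibration values). It does not reproduce the paper's VPZoomer homography or VPCA module; the helper names are illustrative.

```python
import numpy as np

def project(K, points_cam):
    """Pinhole projection of Nx3 camera-frame points (Z forward) to pixels."""
    pts = np.asarray(points_cam, dtype=float)
    uvw = pts @ K.T  # rows are [fx*X + cx*Z, fy*Y + cy*Z, Z]
    return uvw[:, :2] / uvw[:, 2:3]

def vanishing_point(K, direction):
    """Image of the point at infinity along a 3D direction (needs d_z != 0)."""
    v = K @ np.asarray(direction, dtype=float)
    return v[:2] / v[2]

def apparent_height(K, height_m, depth_m):
    """Pixel height of an upright object of given metric height and depth."""
    return K[1, 1] * height_m / depth_m

# Hypothetical intrinsics: fx = fy = 700 px, principal point (600, 180).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])

# Same 1.5 m object: 4x farther away means 4x smaller in the image.
print(apparent_height(K, 1.5, 10.0))  # 105.0
print(apparent_height(K, 1.5, 40.0))  # 26.25

# Points on a line parallel to the optical axis drift toward the VP.
lane = [[2.0, 1.5, z] for z in (10.0, 20.0, 40.0, 1000.0)]
print(project(K, lane))
print(vanishing_point(K, [0.0, 0.0, 1.0]))  # [600. 180.]
```

For the forward direction (0, 0, 1) the vanishing point coincides with the principal point, which is why distant content clusters in a small image region around it; per the abstract, VPOcc's warping and attention are built to counteract exactly this compression.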
