CVMar 8, 2025

Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations

arXiv:2503.06222v21 citationsh-index: 9Knowledge-Based Systems
Originality Highly original
AI Analysis

It addresses a critical issue for autonomous driving by improving robustness in dynamic scenes, though it is incremental as it builds on existing methods with a novel fusion approach.

The paper tackles the problem of dynamic objects degrading accuracy in vision-based 3D semantic scene completion from 2D images, proposing CDScene which decouples dynamic and static features and fuses them adaptively, achieving state-of-the-art results on benchmarks like SemanticKITTI.

The vision-based semantic scene completion task aims to predict dense geometric and semantic 3D scene representations from 2D images. However, the presence of dynamic objects in the scene seriously affects the accuracy of the model inferring 3D structures from 2D images. Existing methods simply stack multiple frames of image input to increase dense scene semantic information, but ignore the fact that dynamic objects and non-texture areas violate multi-view consistency and matching reliability. To address these issues, we propose a novel method, CDScene: Vision-based Robust Semantic Scene Completion via Capturing Dynamic Representations. First, we leverage a multimodal large-scale model to extract 2D explicit semantics and align them into 3D space. Second, we exploit the characteristics of monocular and stereo depth to decouple scene information into dynamic and static features. The dynamic features contain structural relationships around dynamic objects, and the static features contain dense contextual spatial information. Finally, we design a dynamic-static adaptive fusion module to effectively extract and aggregate complementary features, achieving robust and accurate semantic scene completion in autonomous driving scenarios. Extensive experimental results on the SemanticKITTI, SSCBench-KITTI360, and SemanticKITTI-C datasets demonstrate the superiority and robustness of CDScene over existing state-of-the-art methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes