CV | Mar 8, 2025

VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion

arXiv:2503.06219v1 | 11 citations | h-index: 9 | AAAI
Originality: Highly original
AI Analysis

The paper addresses the need for dense geometric and semantic perception in autonomous driving and proposes a new method for a known bottleneck.

The paper tackles camera-based 3D semantic scene completion for autonomous driving, a task that suffers from geometric ambiguity and limited semantic modeling. It proposes VLScene, a method that uses vision-language guidance distillation to strengthen semantic priors and spatial context. VLScene ranks first on both reported benchmarks, with mIoU scores of 17.52 on SemanticKITTI and 19.10 on SSCBench-KITTI-360.

Camera-based 3D semantic scene completion (SSC) provides dense geometric and semantic perception for autonomous driving. However, images provide limited information, making the model susceptible to geometric ambiguity caused by occlusion and perspective distortion. Existing methods often lack explicit semantic modeling between objects, which limits their perception of 3D semantic context. To address these challenges, we propose a novel method, VLScene: Vision-Language Guidance Distillation for Camera-based 3D Semantic Scene Completion. The key insight is to use a vision-language model to introduce high-level semantic priors that provide the object spatial context required for 3D scene understanding. Specifically, we design a vision-language guidance distillation process to enhance image features, which effectively captures semantic knowledge from the surrounding environment and improves spatial context reasoning. In addition, we introduce a geometric-semantic sparse awareness mechanism to propagate geometric structures in the neighborhood and enhance semantic information through contextual sparse interactions. Experimental results demonstrate that VLScene ranks first on two challenging benchmarks, SemanticKITTI and SSCBench-KITTI-360, with mIoU scores of 17.52 and 19.10, respectively.
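
The scores above are reported as mIoU (mean intersection-over-union), the standard metric for semantic voxel predictions. The following is a minimal, generic sketch of that metric, not the paper's evaluation code. The function name mean_iou, the flat label lists, and the ignore_label value of 255 are assumptions made for illustration, and each benchmark defines its own protocol for which classes are averaged.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_label: int = 255,  # assumed marker for unlabeled voxels
) -> float:
    """Return the mean per-class IoU, in percent, over flattened voxel labels.

    Predictions are assumed to lie in [0, num_classes). Classes that appear
    in neither the prediction nor the target are left out of the average.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip voxels without ground truth
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch counts toward the union of both classes involved.
            union[p] += 1
            union[t] += 1

    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes and four voxels.
    print(mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3))
```

In the toy example, class 0 is matched exactly while classes 1 and 2 each share one voxel out of a union of two, so the per-class IoUs are 1, 1/2, and 1/2 before averaging.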
