CV, RO · Jul 3, 2024

Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion

arXiv:2407.03425v1 · 1 citation · h-index: 3
AI Analysis

This work addresses the need for context-aware, occlusion-robust perception in autonomous robots operating in urban environments. It offers a label-free, real-time solution that builds incrementally on existing representation learning methods.

The paper tackles semantic scene completion for autonomous robots. It proposes LSMap, which lifts masks from visual foundation models to predict a continuous, open-set semantic and elevation-aware representation in bird's eye view (BEV) from a single RGBD image, without human labels. After finetuning, LSMap outperforms existing models trained from scratch on semantic and elevation scene completion tasks.

Autonomous mobile robots deployed in urban environments must be context-aware, i.e., able to distinguish between different semantic entities, and robust to occlusions. Current approaches such as semantic scene completion (SSC) require pre-enumerating the set of classes and costly human annotations, while representation learning methods relax these assumptions but are not robust to occlusions and learn representations tailored towards auxiliary tasks. To address these limitations, we propose LSMap, a method that lifts masks from visual foundation models to predict a continuous, open-set semantic and elevation-aware representation in bird's eye view (BEV) for the entire scene, including regions underneath dynamic entities and in occluded areas. Our model requires only a single RGBD image, does not require human labels, and operates in real time. We quantitatively demonstrate that, with finetuning, our approach outperforms existing models trained from scratch on semantic and elevation scene completion tasks. Furthermore, we show that our pre-trained representation outperforms existing visual foundation models at unsupervised semantic scene completion. We evaluate our approach using CODa, a large-scale, real-world urban robot dataset. Supplementary visualizations, code, data, and pre-trained models will be publicly available soon.
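The generic "lift, splat" step named in the title (back-project pixels into 3D using depth, then pool them into a BEV grid) can be illustrated with a minimal sketch. It assumes a pinhole camera with intrinsics `K`, a level camera mounted at a known height, and per-pixel feature vectors standing in for foundation-model mask embeddings. The helpers `lift`, `cam_to_robot`, and `splat_to_bev` and all of their parameters are hypothetical illustrations, not LSMap's implementation. Unlike LSMap, this sketch only fills BEV cells that observed points land in; it does not complete occluded regions or areas underneath dynamic entities.

```python
import numpy as np


def lift(depth, K):
    """Back-project an H x W depth map (metres) into camera-frame 3D points.

    Returns an (H*W, 3) array in row-major pixel order plus a boolean mask of
    pixels with positive depth. Assumed camera frame: x right, y down, z forward.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # column and row index per pixel
    z = depth.ravel().astype(float)
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1), z > 0


def cam_to_robot(points_cam, cam_height):
    """Hypothetical fixed mount: level camera cam_height metres above the ground.

    Robot frame: x forward, y left, z up (z = 0 at ground level).
    """
    return np.stack(
        [points_cam[:, 2], -points_cam[:, 0], cam_height - points_cam[:, 1]], axis=1
    )


def splat_to_bev(points, feats, cell=0.5, grid=(4, 4), x_min=0.0, y_min=-1.0):
    """Average point features per BEV cell and record the max height per cell.

    points: (N, 3) robot-frame points; feats: (N, D) features for the same points.
    Returns (gx, gy, D) mean features, (gx, gy) max elevation (-inf where empty),
    and (gx, gy) point counts.
    """
    n_cells = grid[0] * grid[1]
    ix = np.floor((points[:, 0] - x_min) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_min) / cell).astype(int)
    inside = (ix >= 0) & (ix < grid[0]) & (iy >= 0) & (iy < grid[1])
    flat = ix[inside] * grid[1] + iy[inside]  # row-major cell index

    feat_sum = np.zeros((n_cells, feats.shape[1]))
    np.add.at(feat_sum, flat, feats[inside])  # unbuffered scatter-add of features
    counts = np.bincount(flat, minlength=n_cells)
    mean_feat = feat_sum / np.maximum(counts, 1)[:, None]  # empty cells stay 0

    elev = np.full(n_cells, -np.inf)
    np.maximum.at(elev, flat, points[inside, 2])  # scatter-max of point heights
    return mean_feat.reshape(grid[0], grid[1], -1), elev.reshape(grid), counts.reshape(grid)


if __name__ == "__main__":
    # Toy 2 x 2 depth image, every pixel 1 m away; 1-D features 0..3 per pixel.
    depth = np.ones((2, 2))
    K = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
    pts, ok = lift(depth, K)
    feats = np.arange(4.0).reshape(4, 1)
    bev, elev, counts = splat_to_bev(
        cam_to_robot(pts[ok], 1.0), feats[ok], cell=1.0, grid=(2, 2), x_min=0.0, y_min=-1.0
    )
    print(bev[..., 0])  # cells (1, 0) and (1, 1) hold mean features 2.0 and 1.0
    print(elev)
```

Mean pooling is only one simple choice for the splat; a learned model could instead predict features for every cell, which is what allows completion of unobserved regions.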
