CV | RO | Apr 3, 2025

MinkOcc: Towards real-time label-efficient semantic occupancy prediction

arXiv:2504.02270v1 | 2 citations | h-index: 4 | IROS
Originality: Incremental advance
AI Analysis

This paper addresses label efficiency for 3D semantic occupancy prediction in autonomous driving. It appears incremental because it builds on existing sensor-fusion and semi-supervised techniques.

The paper tackles the cost of 3D annotation for semantic occupancy prediction by introducing MinkOcc, a multi-modal framework with a semi-supervised training procedure that uses vision foundation models, reducing manual labeling by 90% while maintaining competitive accuracy.

Developing 3D semantic occupancy prediction models often relies on dense 3D annotations for supervised learning, a process that is both labor- and resource-intensive, underscoring the need for label-efficient or even label-free approaches. To address this, we introduce MinkOcc, a multi-modal 3D semantic occupancy prediction framework for cameras and LiDARs that uses a two-step semi-supervised training procedure. First, a small dataset of explicit 3D annotations warm-starts the training process; then, supervision continues with simpler-to-annotate accumulated LiDAR sweeps and images, semantically labelled through vision foundation models. MinkOcc effectively utilizes these sensor-rich supervisory cues and reduces reliance on manual labeling by 90% while maintaining competitive accuracy. In addition, the proposed model fuses LiDAR and camera data through early fusion and leverages sparse convolution networks for real-time prediction. With its efficiency in both supervision and computation, we aim to extend MinkOcc beyond curated datasets, enabling broader real-world deployment of 3D semantic occupancy prediction in autonomous driving.
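
The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 m voxel size, the majority-vote rule, and the reading of "90% less manual labeling" as a 10% annotated split are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def voxelize_pseudo_labels(points, labels, voxel_size=0.5):
    """Turn semantically labelled 3D points into sparse voxel labels.

    Hypothetical helper: each point votes for its label in the voxel it
    falls into, and the voxel keeps the most frequent label. Only occupied
    voxels are stored, mirroring a sparse representation.
    """
    votes = defaultdict(Counter)
    for (x, y, z), label in zip(points, labels):
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        votes[key][label] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


def two_stage_schedule(frame_ids, annotated_fraction=0.1):
    """Split frames into a warm-start set and a pseudo-labelled set.

    Assumption: the warm-start stage uses a small manually annotated
    subset, and the remaining frames are supervised by pseudo-labels.
    """
    n_annotated = max(1, round(len(frame_ids) * annotated_fraction))
    return {
        "warm_start": frame_ids[:n_annotated],
        "pseudo_label": frame_ids[n_annotated:],
    }


# Toy accumulated sweep: three points share one voxel, one lies elsewhere.
points = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (0.4, 0.4, 0.4), (1.2, 0.0, -0.1)]
labels = ["road", "road", "car", "car"]
voxel_labels = voxelize_pseudo_labels(points, labels)
schedule = two_stage_schedule(list(range(20)))
```

In this sketch, stage one would train on `schedule["warm_start"]` with dense manual annotations, and stage two would continue on `schedule["pseudo_label"]` using targets like `voxel_labels`, built from accumulated LiDAR points whose labels come from an image segmentation model.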

