CV · LG · Apr 22, 2024

OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

arXiv:2404.14027v3 · 22 citations · h-index: 36 · Has Code · 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Originality: Incremental advance
AI Analysis

This work addresses the challenge of training BEV segmentation networks with limited labeled data. The advance is rated incremental because it builds on existing pretraining and distillation techniques.

The paper aims to improve Bird's-Eye-View (BEV) semantic segmentation for camera-only systems. It introduces OccFeat, a self-supervised pretraining method that combines occupancy prediction with feature distillation and improves performance, especially in low-data scenarios.

We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to the model in the 3D space through distillation from a self-supervised pretrained image foundation model. Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results affirm the efficacy of integrating feature distillation with 3D occupancy prediction in our pretraining approach. Repository: https://github.com/valeoai/Occfeat
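The abstract describes two pretraining tasks, occupancy prediction and feature distillation, without giving their loss formulations. As a minimal sketch of how such a combined objective could look, the NumPy code below assumes a binary cross-entropy term on a voxel occupancy grid plus a cosine-distance term that pulls predicted voxel features toward target features from an image foundation model, evaluated on occupied voxels only. The function names, tensor shapes, the choice of cosine distance, and the `weight` parameter are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np

def occupancy_loss(logits, occupied):
    """Binary cross-entropy between occupancy logits and a 0/1 voxel grid."""
    x = logits
    y = occupied.astype(np.float64)
    # Numerically stable BCE-with-logits: max(x, 0) - x*y + log(1 + exp(-|x|))
    per_voxel = np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return float(np.mean(per_voxel))

def distillation_loss(pred_feats, target_feats, occupied, eps=1e-8):
    """Mean cosine distance between predicted and target features,
    computed on occupied voxels only (an assumption of this sketch)."""
    mask = occupied.astype(bool)
    if not mask.any():
        return 0.0
    p = pred_feats[mask]    # shape (N, C)
    t = target_feats[mask]  # shape (N, C)
    norms = np.linalg.norm(p, axis=-1) * np.linalg.norm(t, axis=-1)
    cos = np.sum(p * t, axis=-1) / (norms + eps)
    return float(np.mean(1.0 - cos))

def pretraining_loss(logits, pred_feats, target_feats, occupied, weight=1.0):
    """Sum of the two terms; `weight` is a hypothetical balancing factor."""
    occ = occupancy_loss(logits, occupied)
    dist = distillation_loss(pred_feats, target_feats, occupied)
    return occ + weight * dist
```

Here `logits` and `occupied` share a grid shape such as `(X, Y, Z)`, while the feature arrays carry an extra channel axis, `(X, Y, Z, C)`. Restricting the distillation term to occupied voxels reflects the idea that semantic features are only meaningful where the scene contains geometry.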

Code Implementations: 1 repo
