OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks
This work addresses the challenge of training BEV segmentation networks with limited labeled data. The contribution is incremental, as it builds on existing pretraining and distillation techniques.
The paper aims to improve Bird's-Eye-View (BEV) semantic segmentation for camera-only systems. It introduces OccFeat, a self-supervised pretraining method that combines occupancy prediction with feature distillation and improves segmentation performance, especially in low-data scenarios.
We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to the model in the 3D space through distillation from a self-supervised pretrained image foundation model. Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results affirm the efficacy of integrating feature distillation with 3D occupancy prediction in our pretraining approach. Repository: https://github.com/valeoai/Occfeat
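The two pretraining tasks described above can be illustrated with a minimal sketch of a combined objective. This is not taken from the OccFeat repository. It assumes a binary cross-entropy loss for occupancy, a cosine-distance loss for distillation applied only to occupied voxels, teacher features that have already been lifted from the image to the voxels, and a hypothetical `weight` that balances the two terms. All function names are illustrative.

```python
import math


def occupancy_bce(logits, targets):
    """Mean binary cross-entropy between per-voxel logits and 0/1 occupancy targets."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z * t + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def cosine_distance(u, v, eps=1e-8):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / max(norm_u * norm_v, eps)


def distillation_loss(pred_feats, teacher_feats, occupancy):
    """Mean cosine distance to the teacher features, over occupied voxels only (assumption)."""
    dists = [
        cosine_distance(p, t)
        for p, t, occ in zip(pred_feats, teacher_feats, occupancy)
        if occ == 1
    ]
    return sum(dists) / len(dists) if dists else 0.0


def pretraining_loss(logits, pred_feats, occupancy, teacher_feats, weight=1.0):
    """Occupancy term plus weighted distillation term (the weighting is an assumption)."""
    geometry = occupancy_bce(logits, occupancy)
    semantics = distillation_loss(pred_feats, teacher_feats, occupancy)
    return geometry + weight * semantics


# Toy example with two voxels: the first occupied, the second empty.
loss = pretraining_loss(
    logits=[0.0, 0.0],
    pred_feats=[[1.0, 0.0], [0.0, 1.0]],
    occupancy=[1, 0],
    teacher_feats=[[1.0, 0.0], [1.0, 0.0]],
)
print(loss)
```

In this sketch the occupancy term supplies the class-agnostic geometric signal, while the distillation term attaches semantic information to the voxels that are actually occupied, mirroring the division of roles described in the abstract.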