CV · May 7, 2024

ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation

arXiv:2405.04121v1 · 3 citations · h-index: 6 · ICME
AI Analysis

This paper addresses inefficient knowledge transfer in LiDAR semantic segmentation for autonomous driving, proposing a novel method for a known bottleneck.

The paper tackles the weak teacher challenge in cross-modal knowledge transfer for LiDAR semantic segmentation by proposing ELiTe, which transfers knowledge from vision foundation models to lightweight student models using patch-to-point distillation and pseudo-label generation, achieving state-of-the-art results on SemanticKITTI with fewer parameters.
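The core of patch-to-point distillation can be sketched as follows. This sketch rests on several assumptions: LiDAR points are already in the camera frame, the camera is a pinhole model with intrinsics `fx, fy, cx, cy`, and the VFM teacher yields one feature vector per `patch_size × patch_size` image patch, with the image dimensions assumed to be multiples of `patch_size`. Each visible point is projected into the image, the teacher feature of its patch is looked up, and the student's point feature is pulled toward it with a cosine-distance loss. The function names and the single-stage cosine loss are hypothetical stand-ins, not ELiTe's actual multi-stage objective.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Feature = Sequence[float]


def project_to_patch(point: Point, fx: float, fy: float, cx: float, cy: float,
                     width: int, height: int, patch_size: int) -> Optional[Tuple[int, int]]:
    """Pinhole-project a camera-frame point (x right, y down, z forward) and
    return the (row, col) of the image patch it lands in, or None if unseen.
    Assumes width and height are multiples of patch_size."""
    x, y, z = point
    if z <= 0.0:  # behind the camera: no image pixel
        return None
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    if not (0.0 <= u < width and 0.0 <= v < height):
        return None  # projects outside the image
    return int(v // patch_size), int(u // patch_size)


def cosine_similarity(a: Feature, b: Feature) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b + 1e-12)  # epsilon avoids division by zero


def patch_to_point_loss(points: Sequence[Point], student_feats: Sequence[Feature],
                        teacher_patches: List[List[Feature]], fx: float, fy: float,
                        cx: float, cy: float, width: int, height: int,
                        patch_size: int = 16) -> float:
    """Hypothetical single-stage loss: mean (1 - cosine similarity) between each
    visible point's student feature and the teacher feature of its patch.
    Returns 0.0 when no point projects into the image."""
    losses = []
    for point, s_feat in zip(points, student_feats):
        idx = project_to_patch(point, fx, fy, cx, cy, width, height, patch_size)
        if idx is None:
            continue  # no image patch to distill from
        row, col = idx
        losses.append(1.0 - cosine_similarity(s_feat, teacher_patches[row][col]))
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch, points behind the camera or outside the image are simply skipped, since they have no corresponding image patch to act as a teacher signal.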

Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation. Despite its potential, it suffers from the weak teacher challenge, which arises from repetitive, non-diverse car camera images and sparse, inaccurate ground-truth labels. To address this, we propose the Efficient Image-to-LiDAR Knowledge Transfer (ELiTe) paradigm. ELiTe introduces Patch-to-Point Multi-Stage Knowledge Distillation, which transfers comprehensive knowledge from a Vision Foundation Model (VFM) extensively trained on diverse open-world images. This enables effective cross-modal knowledge transfer to a lightweight student model. ELiTe employs Parameter-Efficient Fine-Tuning to strengthen the VFM teacher and expedite large-scale model training at minimal cost. Additionally, we introduce a Segment Anything Model (SAM) based Pseudo-Label Generation approach to enhance low-quality image labels, facilitating robust semantic representations. Efficient knowledge transfer in ELiTe yields state-of-the-art results on the SemanticKITTI benchmark, outperforming real-time inference models. Our approach achieves this with significantly fewer parameters, confirming its effectiveness and efficiency.
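The idea of using Segment Anything masks to repair sparse, noisy labels can be illustrated with a simple mask-level majority vote. This is an assumed simplification, not necessarily the paper's procedure: `mask_ids` is a hypothetical per-pixel map of SAM mask ids, `sparse_labels` holds class ids from LiDAR ground truth projected into the image, and `IGNORE` marks pixels with no mask or no label. Each mask takes the most frequent class among its labeled pixels, which both densifies the sparse labels and overrides isolated inaccurate ones.

```python
from collections import Counter
from typing import Dict, List

IGNORE = -1  # hypothetical marker for "no SAM mask" or "no label" at a pixel


def densify_with_masks(mask_ids: List[List[int]], sparse_labels: List[List[int]],
                       min_votes: int = 1) -> List[List[int]]:
    """Assign every pixel of a SAM mask the majority class of the labeled pixels
    inside that mask. Masks with fewer than `min_votes` supporting pixels, and
    pixels outside any mask, keep their original sparse label."""
    # Count class votes per mask from pixels that have both a mask and a label.
    votes: Dict[int, Counter] = {}
    for mask_row, label_row in zip(mask_ids, sparse_labels):
        for mask, label in zip(mask_row, label_row):
            if mask != IGNORE and label != IGNORE:
                votes.setdefault(mask, Counter())[label] += 1

    # Keep only masks whose winning class has enough support.
    mask_label: Dict[int, int] = {}
    for mask, counter in votes.items():
        label, count = counter.most_common(1)[0]
        if count >= min_votes:
            mask_label[mask] = label

    # Write the mask-level pseudo-label back to every pixel of the mask.
    return [
        [mask_label.get(mask, label) if mask != IGNORE else label
         for mask, label in zip(mask_row, label_row)]
        for mask_row, label_row in zip(mask_ids, sparse_labels)
    ]
```

Raising `min_votes` trades coverage for reliability: masks supported by very few projected LiDAR points are left unlabeled rather than trusted.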
