CV · Mar 18, 2024

HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation

arXiv:2403.11817v1 · 17 citations · h-index: 23 · Int J Comput Vis
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving point cloud feature learning with unlabeled image data. The contribution is incremental: it builds on existing knowledge distillation and multi-view techniques.

The paper tackles unsupervised knowledge transfer from images to point clouds by proposing HVDistill, a hybrid-view distillation framework that uses the geometric relationship between cameras and LiDAR to establish cross-modal correspondences. The authors report consistent improvements over a baseline trained from scratch and better results than existing schemes on downstream tasks on nuScenes, SemanticKITTI, and KITTI.

We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner. By exploiting the geometric relationship between RGB cameras and LiDAR sensors, correspondences between the two modalities can be established in both the image-plane view and the bird-eye view, which facilitates representation learning. Specifically, the image-plane correspondences are obtained simply by projecting the point clouds, while the bird-eye-view correspondences are obtained by lifting pixels to 3D space with depths predicted under the supervision of the projected point clouds. The image teacher networks provide rich semantics from the image-plane view and acquire geometric information from the bird-eye view. Image features from the two views naturally complement each other and together improve the learned feature representation of the point cloud student networks. Moreover, with a self-supervised pre-trained 2D network, HVDistill requires neither 2D nor 3D annotations. We pre-train our model on the nuScenes dataset and transfer it to several downstream tasks on the nuScenes, SemanticKITTI, and KITTI datasets for evaluation. Extensive experimental results show that our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms the existing schemes. Code is available at git@github.com:zhangsha1024/HVDistill.git.
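The two kinds of correspondence described in the abstract can be illustrated with a pinhole camera model. The following is a minimal sketch, not the authors' implementation: it assumes the 3D point is already expressed in the camera frame (x right, y down, z forward), so the LiDAR-to-camera extrinsic transform is omitted, and it builds the bird-eye-view grid in that same frame for simplicity. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the grid parameters are hypothetical and chosen only for illustration.

```python
import math
from typing import Optional, Tuple

Point3D = Tuple[float, float, float]


def project_point(p_cam: Point3D, fx: float, fy: float, cx: float, cy: float,
                  width: int, height: int) -> Optional[Tuple[int, int]]:
    """Image-plane correspondence: project a camera-frame point to a pixel.

    Returns (col, row), or None if the point is behind the camera or
    falls outside the image.
    """
    x, y, z = p_cam
    if z <= 0:
        return None  # behind the camera, no valid projection
    col = math.floor(fx * x / z + cx)
    row = math.floor(fy * y / z + cy)
    if 0 <= col < width and 0 <= row < height:
        return col, row
    return None


def lift_pixel(u: float, v: float, depth: float,
               fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift a pixel to 3D using a depth value (predicted, in the paper's setting)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def bev_cell(p_cam: Point3D, x_range: Tuple[float, float],
             z_range: Tuple[float, float], cell_size: float) -> Optional[Tuple[int, int]]:
    """Bird-eye-view correspondence: drop the height axis and bin (x, z) into a grid."""
    x, _, z = p_cam
    if not (x_range[0] <= x < x_range[1] and z_range[0] <= z < z_range[1]):
        return None  # outside the grid extent
    return (math.floor((x - x_range[0]) / cell_size),
            math.floor((z - z_range[0]) / cell_size))


if __name__ == "__main__":
    intrinsics = dict(fx=100.0, fy=100.0, cx=50.0, cy=40.0)  # illustrative values
    point = (1.0, 0.5, 10.0)
    pixel = project_point(point, width=100, height=80, **intrinsics)
    lifted = lift_pixel(pixel[0], pixel[1], 10.0, **intrinsics)
    print(pixel, lifted, bev_cell(lifted, (-50.0, 50.0), (0.0, 100.0), 0.5))
```

In this sketch a LiDAR point and the pixel it projects to form an image-plane pair, while a lifted pixel and a LiDAR point that fall into the same grid cell form a bird-eye-view pair. How features are pooled per cell and which distillation loss is applied are not specified in the abstract and are left out here.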

Code Implementations: 1 repo
