CV · LG · Dec 13, 2022

LidarCLIP or: How I Learned to Talk to Point Clouds

arXiv:2212.06858v3 · 34 citations · h-index: 19 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the lack of text-lidar integration for applications like autonomous driving, offering a novel approach to bridge this gap, though it builds incrementally on existing CLIP technology.

The paper connects text to lidar data with LidarCLIP, a method that maps automotive point clouds into a pre-existing CLIP embedding space. Lidar-based retrieval is generally on par with image-based retrieval, combining image and lidar features improves on both single-modality methods, and in zero-shot classification LidarCLIP outperforms existing attempts to use CLIP for point clouds by a large margin.
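The idea of mapping point clouds into a frozen CLIP embedding space can be illustrated with a toy regression. The sketch below is not the authors' implementation: it assumes a mean-squared-error objective, replaces the point cloud encoder with a hypothetical linear map `W`, and uses random arrays as stand-ins for lidar features and image CLIP embeddings. It only shows the supervision pattern, in which the image embeddings are fixed targets and only the lidar side is trained.

```python
import numpy as np


def train_lidar_encoder(lidar_feats, image_clip_emb, lr=1.0, steps=200):
    """Fit a linear map W so that lidar_feats @ W approximates the
    frozen image embeddings under a mean-squared-error loss.

    The linear map is a hypothetical stand-in for a real point cloud
    encoder; the image embeddings are never updated.
    """
    n, d_in = lidar_feats.shape
    d_out = image_clip_emb.shape[1]
    W = np.zeros((d_in, d_out))
    losses = []
    for _ in range(steps):
        pred = lidar_feats @ W            # "lidar embedding"
        err = pred - image_clip_emb       # distance to the frozen target
        losses.append(float(np.mean(err ** 2)))
        # Gradient of mean(err**2) with respect to W.
        grad = 2.0 * lidar_feats.T @ err / (n * d_out)
        W -= lr * grad
    return W, losses


# Synthetic stand-ins (assumptions, not real data): 256 paired samples.
rng = np.random.default_rng(0)
lidar_feats = rng.normal(size=(256, 16))
image_clip_emb = lidar_feats @ rng.normal(size=(16, 8))

W, losses = train_lidar_encoder(lidar_feats, image_clip_emb)
print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.6f}")
```

Because the targets live in the same space as CLIP text embeddings, anything trained this way can be compared against text without any text-lidar pairs.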

Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL-E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, prohibited by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging detection scenarios under adverse sensor conditions. We also explore zero-shot classification and show that LidarCLIP outperforms existing attempts to use CLIP for point clouds by a large margin. Finally, we leverage our compatibility with CLIP to explore a range of applications, such as point cloud captioning and lidar-to-image generation, without any additional training. Code and pre-trained models are available at https://github.com/atonderski/lidarclip.
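Retrieval and zero-shot classification in a shared embedding space both reduce to cosine similarity against text embeddings. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the embeddings are tiny hand-written vectors rather than CLIP outputs, and summing the per-modality similarity scores is just one simple way to combine image and lidar features, not necessarily the fusion used in the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row (or a single vector) to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def retrieval_scores(text_emb, image_emb, lidar_emb):
    """Cosine similarity of one text query against every scene,
    per modality and jointly (sum of the two scores; an assumption)."""
    t = l2_normalize(text_emb)
    image_scores = l2_normalize(image_emb) @ t
    lidar_scores = l2_normalize(lidar_emb) @ t
    return image_scores, lidar_scores, image_scores + lidar_scores


def zero_shot_classify(lidar_emb, class_text_emb):
    """Assign each lidar embedding to the class whose text embedding
    is most similar."""
    sims = l2_normalize(lidar_emb) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=1)


# Toy 3-dimensional embeddings (illustrative only).
text_query = np.array([1.0, 0.0, 0.0])
image_emb = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
lidar_emb = np.array([[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])
class_text_emb = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

image_scores, lidar_scores, joint_scores = retrieval_scores(
    text_query, image_emb, lidar_emb
)
best_scene = int(np.argmax(joint_scores))
predicted = zero_shot_classify(lidar_emb, class_text_emb)
```

Here `best_scene` is the index of the scene that best matches the query across both sensors, and `predicted` holds one class index per lidar embedding.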

Code Implementations: 1 repo
