CV, AI, RO | Jun 28, 2025

Single-Frame Point-Pixel Registration via Supervised Cross-Modal Feature Matching

arXiv:2506.22784v1 | h-index: 7 | IEEE Trans Autom Sci Eng
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in autonomous driving and robotic perception by improving cross-modal registration reliability under sparse LiDAR conditions. It is an incremental advance, since it builds on existing detector-free paradigms.

The paper tackles point-pixel registration between LiDAR point clouds and camera images under sparse single-frame settings by introducing a detector-free framework that projects LiDAR intensity maps and uses an attention-based network with repeatability scoring, achieving state-of-the-art performance on benchmarks like nuScenes without multi-frame accumulation.

Point-pixel registration between LiDAR point clouds and camera images is a fundamental yet challenging task in autonomous driving and robotic perception. A key difficulty lies in the modality gap between unstructured point clouds and structured images, especially under sparse single-frame LiDAR settings. Existing methods typically extract features separately from point clouds and images, then rely on hand-crafted or learned matching strategies. This separate encoding fails to bridge the modality gap effectively, and more critically, these methods struggle with the sparsity and noise of single-frame LiDAR, often requiring point cloud accumulation or additional priors to improve reliability. Inspired by recent progress in detector-free matching paradigms (e.g., MatchAnything), we revisit the projection-based approach and introduce a detector-free framework for direct point-pixel matching between LiDAR and camera views. Specifically, we project the LiDAR intensity map into a 2D view from the LiDAR perspective and feed it into an attention-based detector-free matching network, enabling cross-modal correspondence estimation without relying on multi-frame accumulation. To further enhance matching reliability, we introduce a repeatability scoring mechanism that acts as a soft visibility prior. This guides the network to suppress unreliable matches in regions with low intensity variation, improving robustness under sparse input. Extensive experiments on the KITTI, nuScenes, and MIAS-LCEC-TF70 benchmarks demonstrate that our method achieves state-of-the-art performance, outperforming prior approaches on nuScenes (even those relying on accumulated point clouds) despite using only single-frame LiDAR.
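To make the projection step concrete, the sketch below rasterizes a single LiDAR sweep into a 2D intensity image as seen from the sensor. The abstract does not say which projection model the authors use, so the spherical (range-image) model here is an assumption, and the function name `project_intensity_to_range_image`, the image size, and the field-of-view defaults are all hypothetical choices for illustration.

```python
import numpy as np


def project_intensity_to_range_image(points, height=64, width=1024,
                                     fov_up_deg=10.0, fov_down_deg=-30.0):
    """Rasterize LiDAR intensity into a 2D image from the sensor's viewpoint.

    Assumption: a spherical projection; the paper may use a different model.
    points: (N, 4) array of x, y, z, intensity in the LiDAR frame.
    Returns an (height, width) float32 image; pixels with no return stay 0.
    """
    points = np.asarray(points, dtype=np.float64)
    xyz, intensity = points[:, :3], points[:, 3]
    depth = np.linalg.norm(xyz, axis=1)

    # Drop degenerate points at the sensor origin.
    keep = depth > 1e-6
    xyz, intensity, depth = xyz[keep], intensity[keep], depth[keep]

    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])    # horizontal angle in [-pi, pi]
    pitch = np.arcsin(xyz[:, 2] / depth)      # vertical angle

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Keep only points inside the vertical field of view.
    in_fov = (pitch <= fov_up) & (pitch >= fov_down)
    yaw, pitch = yaw[in_fov], pitch[in_fov]
    intensity, depth = intensity[in_fov], depth[in_fov]

    # Map angles to pixel coordinates (column from yaw, row from pitch).
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (fov_up - pitch) / fov * height
    cols = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    # Write far points first so the nearest return ends up in each pixel.
    order = np.argsort(-depth)
    image = np.zeros((height, width), dtype=np.float32)
    image[rows[order], cols[order]] = intensity[order]
    return image
```

An image produced this way could then be passed, together with the camera frame, to a detector-free matcher. With a single sweep most pixels remain empty, which is the sparsity the repeatability score is meant to compensate for.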

