CV LG RO
Aug 25, 2024

TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training

Li Li, Tanqiu Qiao, Hubert P. H. Shum, Toby P. Breckon

arXiv:2408.13902v15.22 citations
h-index: 45

Originality: Incremental advance

AI Analysis

This work addresses the challenge of robust 3D object detection for autonomous driving by improving invariance to transformations and point density variations, representing an incremental advance over existing self-supervised methods.

The paper tackles inadequate isometric invariance in 3D LiDAR object detection by introducing Transformation-Invariant Local (TraIL) features and the TraIL-Det architecture, reporting mAP of 67.8 on KITTI and 68.9 on Waymo (moderate difficulty, 20% of labels).

3D point clouds are essential for perceiving outdoor scenes, especially within the realm of autonomous driving. Recent advances in 3D LiDAR Object Detection focus primarily on the spatial positioning and distribution of points to ensure accurate detection. However, despite their robust performance in variable conditions, these methods are hindered by their sole reliance on coordinates and point intensity, resulting in inadequate isometric invariance and suboptimal detection outcomes. To tackle this challenge, our work introduces Transformation-Invariant Local (TraIL) features and the associated TraIL-Det architecture. Our TraIL features exhibit rigid transformation invariance and effectively adapt to variations in point density, with a design focus on capturing the localized geometry of neighboring structures. They utilize the inherent isotropic radiation of LiDAR to enhance local representation, improve computational efficiency, and boost detection performance. To effectively process the geometric relations among points within each proposal, we propose a Multi-head self-Attention Encoder (MAE) with asymmetric geometric features to encode high-dimensional TraIL features into manageable representations. Our method outperforms contemporary self-supervised 3D object detection approaches in terms of mAP on KITTI (67.8, 20% label, moderate) and Waymo (68.9, 20% label, moderate) datasets under various label ratios (20%, 50%, and 100%).
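To make the idea of rigid transformation invariance concrete, the following is a minimal sketch, not the paper's TraIL feature definition. It assumes a simple stand-in descriptor (the sorted distances from each point to its k nearest neighbors) and checks numerically that the descriptor is unchanged when the point cloud is rotated and translated. The names `local_distance_features` and `random_rigid_transform` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def local_distance_features(points, k=4):
    """Stand-in local descriptor (an assumption, not TraIL itself):
    for each point, the sorted distances to its k nearest neighbors."""
    diffs = points[:, None, :] - points[None, :, :]   # (N, N, 3) pairwise offsets
    dists = np.linalg.norm(diffs, axis=-1)            # (N, N) pairwise distances
    dists_sorted = np.sort(dists, axis=1)             # column 0 is the self-distance (0)
    return dists_sorted[:, 1:k + 1]                   # keep the k nearest neighbors

def random_rigid_transform(rng):
    """Hypothetical helper: a random proper rotation and a random translation."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # random orthogonal matrix
    if np.linalg.det(q) < 0:                          # flip one axis to get det = +1
        q[:, 0] = -q[:, 0]
    t = rng.normal(size=3)
    return q, t

rng = np.random.default_rng(0)
points = rng.normal(size=(32, 3))                     # synthetic point cloud
rotation, translation = random_rigid_transform(rng)
moved = points @ rotation.T + translation             # apply the rigid transform

features_before = local_distance_features(points)
features_after = local_distance_features(moved)

# Raw coordinates change under the transform, but the distance-based
# descriptor should agree up to floating-point error.
max_diff = float(np.abs(features_before - features_after).max())
print("max feature difference:", max_diff)
```

Distances between points are preserved by any rotation and translation, which is why a descriptor built only from them is invariant, whereas features built from raw coordinates are not. The abstract indicates that TraIL features go further than this sketch, in particular by adapting to variations in point density.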
