CV · Jun 4, 2025

FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices

arXiv:2506.04499v1 · h-index: 81
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of deploying efficient 3D object detection for autonomous systems on compact, embedded devices. It is an incremental improvement focused on speed and hardware compatibility.

The paper tackles the problem of running LiDAR 3D object detection on resource-constrained edge devices by proposing FALO, a hardware-friendly method that achieves competitive accuracy on benchmarks like nuScenes and Waymo while being 1.6 to 9.8 times faster than state-of-the-art methods on mobile GPUs and NPUs.

Existing LiDAR 3D object detection methods predominantly rely on sparse convolutions and/or transformers, which can be challenging to run on resource-constrained edge devices due to irregular memory access patterns and high computational costs. In this paper, we propose FALO, a hardware-friendly approach to LiDAR 3D detection, which offers both state-of-the-art (SOTA) detection accuracy and fast inference speed. More specifically, given the 3D point cloud and after voxelization, FALO first arranges sparse 3D voxels into a 1D sequence based on their coordinates and proximity. The sequence is then processed by our proposed ConvDotMix blocks, consisting of large-kernel convolutions, Hadamard products, and linear layers. ConvDotMix provides sufficient mixing capability in both spatial and embedding dimensions, and introduces higher-order nonlinear interaction among spatial features. Furthermore, when going through the ConvDotMix layers, we introduce implicit grouping, which balances the tensor dimensions for more efficient inference and takes into account the growing receptive field. All these operations are friendly to run on resource-constrained platforms, and the proposed FALO can be readily deployed on compact, embedded devices. Our extensive evaluation on LiDAR 3D detection benchmarks such as nuScenes and Waymo shows that FALO achieves competitive performance. Meanwhile, FALO is 1.6x to 9.8x faster than the latest SOTA on mobile Graphics Processing Unit (GPU) and mobile Neural Processing Unit (NPU).
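The two steps the abstract describes, ordering sparse voxels into a 1D sequence and mixing them with a convolution, a Hadamard product, and linear layers, can be illustrated with a rough sketch. This is not the authors' implementation: the sort key in `serialize_voxels`, the `window` parameter, the shared-across-channels kernel, and the exact wiring inside `conv_dot_mix` are all assumptions chosen only to show the kind of dense, regular operations involved. Implicit grouping is not modeled here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def serialize_voxels(coords: Sequence[Tuple[int, int, int]], window: int = 2) -> List[int]:
    """Return voxel indices ordered into a 1D sequence.

    Assumed ordering (hypothetical, not from the paper): sort by height z,
    then by coarse (y, x) window so nearby voxels stay adjacent, then by
    fine (y, x) position inside the window.
    """
    def sort_key(i: int) -> Tuple[int, int, int, int, int]:
        x, y, z = coords[i]
        return (z, y // window, x // window, y, x)

    return sorted(range(len(coords)), key=sort_key)


def large_kernel_conv(seq: List[Vector], kernel: Sequence[float]) -> List[Vector]:
    """Zero-padded 1D convolution along the sequence axis.

    Simplification: the same kernel weights are applied to every channel.
    """
    n, half = len(seq), len(kernel) // 2
    channels = len(seq[0]) if seq else 0
    out = []
    for i in range(n):
        acc = [0.0] * channels
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence count as zeros
                for c in range(channels):
                    acc[c] += w * seq[j][c]
        out.append(acc)
    return out


def linear(seq: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Per-token linear layer; weight has shape [C_out][C_in]."""
    return [[sum(w * v for w, v in zip(row, token)) for row in weight] for token in seq]


def conv_dot_mix(seq: List[Vector], kernel: Sequence[float],
                 w_gate: List[Vector], w_out: List[Vector]) -> List[Vector]:
    """Assumed block layout: conv branch (spatial mixing) multiplied
    elementwise with a linear branch (channel mixing), then projected."""
    spatial = large_kernel_conv(seq, kernel)
    gate = linear(seq, w_gate)
    mixed = [[s * g for s, g in zip(s_tok, g_tok)] for s_tok, g_tok in zip(spatial, gate)]
    return linear(mixed, w_out)
```

In use, the features of the non-empty voxels would be reordered with the indices from `serialize_voxels` and then passed through one or more `conv_dot_mix` calls. Every operation is a dense loop over a fixed-length sequence, which is the property the abstract contrasts with the irregular memory access of sparse convolutions.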

