CV · AI · Mar 11, 2024

Fine-Grained Pillar Feature Encoding Via Spatio-Temporal Virtual Grid for 3D Object Detection

Konyul Park, Yecheol Kim, Junho Koh, Byungwoo Park, Jun Won Choi

arXiv:2403.06433v1 · 2.04 citations · h-index: 8 · Has Code · ICRA

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in real-time LiDAR-based detection for autonomous vehicles, offering an incremental improvement to pillar encoding methods.

The paper addresses the underperformance of pillar-based 3D object detectors for autonomous vehicles. It proposes Fine-Grained Pillar Feature Encoding (FG-PFE), which uses spatio-temporal virtual grids to capture the distribution of points within each pillar, and it reports significant performance improvements on the nuScenes dataset over baselines such as PointPillar, with only minor computational overhead.

Developing high-performance, real-time architectures for LiDAR-based 3D object detectors is essential for the successful commercialization of autonomous vehicles. Pillar-based methods stand out as a practical choice for onboard deployment due to their computational efficiency. However, despite their efficiency, these methods can sometimes underperform compared to alternative point encoding techniques such as Voxel-encoding or PointNet++. We argue that current pillar-based methods have not sufficiently captured the fine-grained distributions of LiDAR points within each pillar structure. Consequently, there exists considerable room for improvement in pillar feature encoding. In this paper, we introduce a novel pillar encoding architecture referred to as Fine-Grained Pillar Feature Encoding (FG-PFE). FG-PFE utilizes Spatio-Temporal Virtual (STV) grids to capture the distribution of point clouds within each pillar across vertical, temporal, and horizontal dimensions. Through STV grids, points within each pillar are individually encoded using Vertical PFE (V-PFE), Temporal PFE (T-PFE), and Horizontal PFE (H-PFE). These encoded features are then aggregated through an Attentive Pillar Aggregation method. Our experiments conducted on the nuScenes dataset demonstrate that FG-PFE achieves significant performance improvements over baseline models such as PointPillar, CenterPoint-Pillar, and PillarNet, with only a minor increase in computational overhead.
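The virtual-grid idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the learned V-PFE, T-PFE, and H-PFE encoders are replaced here by plain occupancy histograms over virtual bins, and the Attentive Pillar Aggregation step is replaced by a softmax over hand-set scores. All names (`bin_index`, `occupancy`, `encode_pillar`, `score_weights`) and the choice of 4 bins per branch are assumptions made for illustration.

```python
import math

def bin_index(value, low, high, num_bins):
    """Map a value in [low, high) to a virtual-grid bin, clamped to the valid range."""
    frac = (value - low) / (high - low)
    return min(num_bins - 1, max(0, int(frac * num_bins)))

def occupancy(indices, num_bins):
    """Fraction of points that fall into each virtual bin."""
    counts = [0] * num_bins
    for i in indices:
        counts[i] += 1
    total = len(indices)
    return [c / total for c in counts] if total else [0.0] * num_bins

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def encode_pillar(points, origin, size, z_range, t_range, score_weights=None):
    """Encode one pillar from points given as (x, y, z, t) tuples.

    Assumption: each branch is a 4-bin occupancy histogram, standing in for
    the learned per-branch encoders described in the abstract.
    """
    x0, y0 = origin
    # Vertical branch: 4 virtual bins along the height axis.
    vertical = occupancy([bin_index(z, *z_range, 4) for _, _, z, _ in points], 4)
    # Temporal branch: 4 virtual bins along the sweep timestamp.
    temporal = occupancy([bin_index(t, *t_range, 4) for _, _, _, t in points], 4)
    # Horizontal branch: a 2x2 virtual grid inside the pillar footprint, flattened.
    horizontal = occupancy(
        [2 * bin_index(x, x0, x0 + size, 2) + bin_index(y, y0, y0 + size, 2)
         for x, y, _, _ in points],
        4,
    )
    branches = [vertical, temporal, horizontal]
    # Hypothetical stand-in for learned attention: score each branch with a
    # fixed weight vector, then normalize the scores with a softmax.
    weights = score_weights or [1.0] * 4
    scores = [sum(w * f for w, f in zip(weights, feat)) for feat in branches]
    attention = softmax(scores)
    fused = [sum(a * feat[k] for a, feat in zip(attention, branches)) for k in range(4)]
    return {"vertical": vertical, "temporal": temporal,
            "horizontal": horizontal, "attention": attention, "fused": fused}

# Example pillar: 1 m x 1 m footprint at the origin, heights in [-2, 2) m,
# timestamps in [0, 0.5) s. The values are made up for illustration.
example_points = [
    (0.1, 0.1, -1.5, 0.0),
    (0.9, 0.2, 0.5, 0.1),
    (0.6, 0.7, 1.5, 0.4),
    (0.3, 0.8, -0.5, 0.3),
]
result = encode_pillar(example_points, (0.0, 0.0), 1.0, (-2.0, 2.0), (0.0, 0.5))
```

A plain pillar encoder would pool these four points into a single vector and lose the fact that they are spread evenly in height but clustered in time. The per-axis histograms keep that distribution, which is the property the abstract attributes to the STV grids.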
