CV · Dec 10, 2019

Context-Aware Dynamic Feature Extraction for 3D Object Detection in Point Clouds

Yonglin Tian, Lichao Huang, Xuesong Li, Kunfeng Wang, Zilei Wang, Fei-Yue Wang

arXiv:1912.04775v3 · 1.81 citations

Originality: Incremental advance

AI Analysis

This work addresses a key challenge in 3D object detection for autonomous driving by improving both detection accuracy and efficiency, though it appears incremental because it builds on existing techniques such as voxelization and dynamic convolution.

The paper tackles varying point-cloud density in 3D object detection with a context-aware dynamic network (CADNet) that captures density variance through point and semantic contexts. On the KITTI dataset, the one-stage detector outperforms SECOND and PointPillars by a large margin while running at 30 FPS.

Varying density of point clouds increases the difficulty of 3D detection. In this paper, we present a context-aware dynamic network (CADNet) to capture the variance of density by considering both point context and semantic context. Point-level contexts are generated from original point clouds to enlarge the effective receptive field. They are extracted around the voxelized pillars based on our extended voxelization method and processed by the context encoder in parallel with the pillar features. With a large perception range, we are able to capture the variance of features for potential objects and generate attentive spatial guidance to help adjust the strengths of different regions. In the region proposal network, considering the limited representation ability of traditional convolution, where the same kernels are shared among different samples and positions, we propose a decomposable dynamic convolutional layer that adapts to the variance of input features by learning from local semantic context. It adaptively generates position-dependent coefficients for multiple fixed kernels and combines them to convolve with local feature windows. Based on our dynamic convolution, we design a dual-path convolution block to further improve the representation ability. We conduct experiments with our network on the KITTI dataset and achieve good performance on the 3D detection task in both precision and speed. Our one-stage detector outperforms SECOND and PointPillars by a large margin and runs at 30 FPS.
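
The decomposable dynamic convolution described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The coefficient generator (a hypothetical 1×1 projection `coeff_proj` followed by a softmax over the K kernels) and all function names are assumptions; the abstract only states that position-dependent coefficients are learned from local semantic context and used to combine multiple fixed kernels. The sketch relies on linearity: convolving a window with the mixed kernel sum_k a_k(p) W_k gives the same result as mixing the K fixed-kernel responses at position p with the same coefficients, so only K ordinary convolutions are needed.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same'-padded 2D convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, kh, kw) with odd kh, kw."""
    c_out, c_in, kh, kw = w.shape
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            window = xp[:, i:i + kh, j:j + kw]  # (C_in, kh, kw)
            out[:, i, j] = np.tensordot(w, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def position_coeffs(x, coeff_proj):
    """ASSUMPTION: coefficients come from a hypothetical 1x1 projection
    (coeff_proj: (K, C_in)) plus a softmax over K. Returns (K, H, W)."""
    return softmax(np.einsum('kc,chw->khw', coeff_proj, x), axis=0)

def dynamic_conv(x, kernels, coeff_proj):
    """Decomposed form: K fixed-kernel convolutions mixed per position.
    kernels: (K, C_out, C_in, kh, kw). Returns (C_out, H, W)."""
    coeffs = position_coeffs(x, coeff_proj)
    responses = np.stack([conv2d_same(x, w) for w in kernels])  # (K, C_out, H, W)
    return np.einsum('khw,kohw->ohw', coeffs, responses)

def dynamic_conv_explicit(x, kernels, coeff_proj):
    """Reference form: build the mixed kernel at every position, then
    convolve it with that position's local window."""
    _, c_out, _, kh, kw = kernels.shape
    _, h, wd = x.shape
    coeffs = position_coeffs(x, coeff_proj)
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            w_p = np.tensordot(coeffs[:, i, j], kernels, axes=1)  # (C_out, C_in, kh, kw)
            out[:, i, j] = np.tensordot(w_p, xp[:, i:i + kh, j:j + kw], axes=3)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))             # C_in=4 feature map
kernels = rng.standard_normal((3, 2, 4, 3, 3))  # K=3 fixed 3x3 kernels, C_out=2
proj = rng.standard_normal((3, 4))              # hypothetical coefficient projection
y = dynamic_conv(x, kernels, proj)
print(y.shape)                                              # (2, 6, 6)
print(np.allclose(y, dynamic_conv_explicit(x, kernels, proj)))  # True
```

The decomposed form is the practical choice: it reuses standard convolutions and adds only a per-position weighted sum, whereas the explicit form must assemble a different kernel for every location. With a single kernel (K = 1) the softmax coefficients are all 1 and the layer reduces to an ordinary convolution.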
