Deformable Filter Convolution for Point Cloud Reasoning
This work addresses the loss of geometric detail in 3D perception models used in applications such as autonomous driving and robotics. It is presented as a novel method rather than an incremental improvement.
The paper tackles the problem of preserving local geometric details in 3D point cloud processing. It proposes a learnable convolution layer that deforms its filters to match the point cloud shape, achieving state-of-the-art results on LiDAR semantic segmentation and significant gains on LiDAR object detection.
Point clouds are the native output of many real-world 3D sensors. To borrow the success of 2D convolutional network architectures, a majority of popular 3D perception models voxelize the points, which can result in a loss of local geometric details that cannot be recovered. In this paper, we propose a novel learnable convolution layer for processing 3D point cloud data directly. Instead of discretizing points into fixed voxels, we deform our learnable 3D filters to match the point cloud shape. We propose to combine voxelized backbone networks with our deformable filter layer at (1) the network input stream and (2) the output prediction layers to enhance point-level reasoning. We obtain state-of-the-art results on LiDAR semantic segmentation and a significant gain in performance on LiDAR object detection.
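To make the idea of deforming filters toward the points more concrete, the following is a minimal NumPy sketch of a generic deformable point convolution. It is not the layer proposed in this paper, whose formulation is not given here. It assumes a kernel-point style design: each filter tap has a rest position, a per-point offset shifts the tap, and neighbors contribute through a linear distance falloff. The names `deformable_point_conv`, `kernel_pts`, `offsets`, and `sigma` are hypothetical, and in a trained model the offsets would be predicted by a learned layer rather than passed in.

```python
import numpy as np


def deformable_point_conv(points, feats, kernel_pts, weights, offsets, sigma=0.5):
    """Illustrative deformable convolution applied directly to raw points.

    points:     (N, 3) point coordinates
    feats:      (N, C_in) per-point input features
    kernel_pts: (K, 3) rest positions of the filter taps, relative to the center
    weights:    (K, C_in, C_out) one weight matrix per filter tap
    offsets:    (N, K, 3) per-point tap deformations (assumed to be given here)
    sigma:      influence radius of each tap (assumed linear falloff)
    """
    n = points.shape[0]
    out = np.zeros((n, weights.shape[2]))
    for i in range(n):
        # Neighbor coordinates expressed relative to the center point i.
        rel = points - points[i]                       # (N, 3)
        # Deform the filter: shift every tap by its offset for this point.
        taps = kernel_pts + offsets[i]                 # (K, 3)
        # Distance from every neighbor to every deformed tap.
        dist = np.linalg.norm(rel[:, None, :] - taps[None, :, :], axis=-1)  # (N, K)
        # Linear falloff: 1 on the tap, 0 at distance sigma or beyond.
        influence = np.maximum(0.0, 1.0 - dist / sigma)  # (N, K)
        # Sum over neighbors n, taps k, and input channels c.
        out[i] = np.einsum("nk,nc,kcd->d", influence, feats, weights)
    return out
```

With all offsets set to zero the taps stay at their rest positions. A nonzero offset moves a tap onto a nearby point, so that point's features enter the output without being quantized into a voxel grid. The loop is quadratic in the number of points and is meant only for illustration; a practical version would restrict each center to a local neighborhood.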