Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation
This work addresses a key problem in 3D computer vision for applications such as autonomous driving and robotics, but the contribution appears incremental: it builds on existing Transformer approaches by adding local feature integration.
The paper tackles the challenge of efficiently extracting high-dimensional features from point clouds with an Inductive Bias-aided Transformer (IBT) that integrates both local and global attention, and reports superior results on classification and segmentation tasks.
Discovering inter-point connections for efficient high-dimensional feature extraction from point coordinates is a key challenge in point cloud processing. Most existing methods focus on designing efficient local feature extractors while ignoring global connections, or vice versa. In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations that considers both local and global attention. Specifically, to account for local spatial coherence, local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling. We then incorporate the learned locality into the Transformer module: the local features act on the value component of the Transformer to modulate the relationships between the channels of each point, which enhances the self-attention mechanism with locality-based channel interaction. Experiments on classification and segmentation tasks demonstrate its superiority. The code is available at: https://github.com/jiamang/IBT
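To make the two ingredients in the abstract concrete, here is a minimal NumPy sketch of one possible reading. It is not the authors' implementation (the linked repository is the reference for that), and several pieces are assumptions: the relative position encoding uses a RandLA-Net-style 10-D geometric vector [p_i, p_j, p_i - p_j, |p_i - p_j|] followed by one linear layer and ReLU; attentive feature pooling uses a per-channel softmax over the k nearest neighbors; and the value modulation is stood in for by a sigmoid channel gate computed from the pooled local feature. All function names and the random weight matrices (`w_pos`, `w_score`, `w_q`, `w_k`, `w_v`, `w_g`) are hypothetical placeholders for learned layers.

```python
import numpy as np

def knn_indices(xyz: np.ndarray, k: int) -> np.ndarray:
    """Return (N, k) indices of the k nearest neighbors of each point (self included)."""
    d2 = np.sum((xyz[:, None, :] - xyz[None, :, :]) ** 2, axis=-1)  # (N, N) squared distances
    return np.argsort(d2, axis=1)[:, :k]

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_position_encoding(xyz, idx, w_pos):
    """Encode each (center, neighbor) pair geometrically, then apply linear + ReLU.

    Assumption: RandLA-Net-style 10-D vector [p_i, p_j, p_i - p_j, |p_i - p_j|].
    """
    neigh = xyz[idx]                                           # (N, k, 3)
    center = np.repeat(xyz[:, None, :], idx.shape[1], axis=1)  # (N, k, 3)
    rel = center - neigh
    dist = np.linalg.norm(rel, axis=-1, keepdims=True)         # (N, k, 1)
    geo = np.concatenate([center, neigh, rel, dist], axis=-1)  # (N, k, 10)
    return np.maximum(geo @ w_pos, 0.0)                        # (N, k, C)

def attentive_pooling(feats, idx, pos_enc, w_score):
    """Pool neighbor features using per-channel attention scores normalized over neighbors."""
    f = np.concatenate([feats[idx], pos_enc], axis=-1)  # (N, k, 2C)
    scores = softmax(f @ w_score, axis=1)               # (N, k, 2C), sums to 1 over k
    return np.sum(scores * f, axis=1)                   # (N, 2C)

def locality_modulated_attention(feats, local, w_q, w_k, w_v, w_g):
    """Global self-attention whose values are rescaled per channel by local features.

    Assumption: a sigmoid channel gate stands in for IBT's value modulation.
    """
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v        # (N, C) each
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)  # (N, N) global attention
    gate = 1.0 / (1.0 + np.exp(-(local @ w_g)))            # (N, C) gate mixes local channels
    return attn @ (v * gate), attn                         # (N, C), (N, N)

rng = np.random.default_rng(0)
N, C, K = 64, 16, 8
xyz = rng.standard_normal((N, 3))
feats = rng.standard_normal((N, C))
# Hypothetical random weights standing in for learned parameters.
w_pos = rng.standard_normal((10, C)) * 0.1
w_score = rng.standard_normal((2 * C, 2 * C)) * 0.1
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
w_g = rng.standard_normal((2 * C, C)) * 0.1

idx = knn_indices(xyz, K)
pos = relative_position_encoding(xyz, idx, w_pos)
local = attentive_pooling(feats, idx, pos, w_score)
out, attn = locality_modulated_attention(feats, local, w_q, w_k, w_v, w_g)
print(out.shape, attn.shape)  # (64, 16) (64, 64)
```

Because `w_g` maps every channel of the pooled local feature to every gate entry, each value channel is rescaled using information from its whole neighborhood before global attention aggregates values across all points. This is one simple way a local inductive bias can enter an otherwise global attention layer; the paper's exact formulation may differ.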