CV | Jul 18, 2024

GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding

arXiv:2407.13519v2 | 45 citations | h-index: 35 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a fundamental problem in 3D computer vision, with applications such as autonomous driving and robotics. The advance is incremental, since it builds on existing transformer and convolution methods.

The paper tackles the challenge of capturing intricate shape information from irregular point clouds without external data. It proposes GPSFormer, a transformer-based model that integrates global perception and local structure fitting, and reports state-of-the-art results in shape classification, part segmentation, and few-shot learning.

Despite the significant advancements in pre-training methods for point cloud understanding, directly capturing intricate shape information from irregular point clouds without reliance on external data remains a formidable challenge. To address this problem, we propose GPSFormer, an innovative Global Perception and Local Structure Fitting-based Transformer, which learns detailed shape information from point clouds with remarkable precision. The core of GPSFormer is the Global Perception Module (GPM) and the Local Structure Fitting Convolution (LSFConv). Specifically, GPM utilizes Adaptive Deformable Graph Convolution (ADGConv) to identify short-range dependencies among similar features in the feature space and employs Multi-Head Attention (MHA) to learn long-range dependencies across all positions within the feature space, ultimately enabling flexible learning of contextual representations. Inspired by Taylor series, we design LSFConv, which learns both low-order fundamental and high-order refinement information from explicitly encoded local geometric structures. Integrating the GPM and LSFConv as fundamental components, we construct GPSFormer, a cutting-edge Transformer that effectively captures global and local structures of point clouds. Extensive experiments validate GPSFormer's effectiveness in three point cloud tasks: shape classification, part segmentation, and few-shot learning. The code of GPSFormer is available at https://github.com/changshuowang/GPSFormer.
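To make the short-range and long-range split in the abstract concrete, here is a minimal dependency-free sketch. It is not the authors' GPM: the adaptive deformable offsets of ADGConv are replaced by a plain k-nearest-neighbour max-pool in feature space, and Multi-Head Attention is reduced to a single head with identity query, key, and value projections. All function names (`knn_in_feature_space`, `short_range_aggregate`, `self_attention`, `global_perception_sketch`) are hypothetical and do not come from the paper or its repository.

```python
import math


def knn_in_feature_space(feats, k):
    """For each row, return indices of its k nearest rows (squared L2, self excluded)."""
    neighbours = []
    for i, fi in enumerate(feats):
        dists = [
            (sum((a - b) ** 2 for a, b in zip(fi, fj)), j)
            for j, fj in enumerate(feats)
            if j != i
        ]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def short_range_aggregate(feats, k):
    """Stand-in for a graph convolution: channel-wise max over each point and its
    k feature-space neighbours. (Assumption: no learned weights, no deformable offsets.)"""
    nbrs = knn_in_feature_space(feats, k)
    out = []
    for i, fi in enumerate(feats):
        group = [fi] + [feats[j] for j in nbrs[i]]
        out.append([max(channel) for channel in zip(*group)])
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats):
    """Single-head scaled dot-product attention over all positions.
    (Assumption: identity Q/K/V projections instead of learned ones.)"""
    d = len(feats[0])
    out = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in feats]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, feats)) for c in range(d)])
    return out


def global_perception_sketch(feats, k=2):
    """Short-range aggregation, then long-range attention, joined by a residual sum."""
    local = short_range_aggregate(feats, k)
    attended = self_attention(local)
    return [[a + b for a, b in zip(l, g)] for l, g in zip(local, attended)]
```

Calling `global_perception_sketch(feats, k=2)` on a list of per-point feature vectors returns one vector of the same width per point. The residual sum is a design choice of this sketch, made so that the locally aggregated features are not overwritten by the attention output.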

Code Implementations: 1 repo
