Few-Shot 3D Point Cloud Semantic Segmentation via Stratified Class-Specific Attention Based Transformer Network
This work addresses the challenge of segmenting new categories in 3D point clouds from limited annotated data, which matters for applications such as scene reconstruction and understanding. The contribution is incremental, since it builds on prior few-shot methods.
The paper tackles few-shot 3D point cloud semantic segmentation by developing a multi-layer transformer network that aggregates query features based on class-specific support features at different scales, achieving new state-of-the-art performance with 15% less inference time than existing few-shot 3D point cloud segmentation models on the S3DIS and ScanNet datasets.
3D point cloud semantic segmentation aims to group all points into different semantic categories, which benefits important applications such as point cloud scene reconstruction and understanding. Existing supervised point cloud semantic segmentation methods usually require large-scale annotated point clouds for training and cannot handle new categories. While a few-shot learning method was proposed recently to address these two problems, it suffers from high computational complexity caused by graph construction and from an inability to learn fine-grained relationships among points due to its use of pooling operations. In this paper, we further address these problems by developing a new multi-layer transformer network for few-shot point cloud semantic segmentation. In the proposed network, the query point cloud features are aggregated based on the class-specific support features at different scales. Without using pooling operations, our method makes full use of all point-level features from the support samples. By better leveraging the support features for few-shot learning, the proposed method achieves new state-of-the-art performance, with 15% less inference time, over existing few-shot 3D point cloud segmentation models on the S3DIS dataset and the ScanNet dataset.
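As a rough illustration of aggregating query features from class-specific support features without pooling, the NumPy sketch below lets every query point attend over all support points of one class at a time. It is not the paper's network. The function names, the single-head scaled dot-product attention, the cosine-similarity classifier, and the single-scale setup are all assumptions made for illustration, and the multi-layer, multi-scale design described in the abstract is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def class_specific_attention(query_feats, support_feats, support_labels, num_classes):
    """Aggregate query features from the support points of each class.

    query_feats:    (Nq, D) features of the query point cloud
    support_feats:  (Ns, D) features of all support points (no pooling)
    support_labels: (Ns,)   integer class label of each support point
    Returns an array of shape (num_classes, Nq, D).
    """
    num_query, dim = query_feats.shape
    aggregated = np.zeros((num_classes, num_query, dim))
    for c in range(num_classes):
        # Hypothetical design choice: keys and values are the raw
        # support features of class c, with no learned projections.
        keys = support_feats[support_labels == c]
        if keys.shape[0] == 0:
            continue  # no support points for this class; leave zeros
        scores = query_feats @ keys.T / np.sqrt(dim)   # (Nq, Nc)
        weights = softmax(scores, axis=1)              # each row sums to 1
        aggregated[c] = weights @ keys                 # (Nq, D)
    return aggregated


def segment(query_feats, support_feats, support_labels, num_classes, eps=1e-8):
    """Label each query point with the class whose aggregate it matches best.

    Assumption: cosine similarity between a query feature and its
    class-specific aggregate is used as the class score.
    """
    agg = class_specific_attention(
        query_feats, support_feats, support_labels, num_classes
    )
    q_norm = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + eps)
    a_norm = agg / (np.linalg.norm(agg, axis=2, keepdims=True) + eps)
    scores = np.einsum("nd,cnd->nc", q_norm, a_norm)   # (Nq, num_classes)
    return scores.argmax(axis=1)
```

Because the attention weights are computed against every support point of a class, no support feature is averaged away before the query sees it, which is the property the abstract contrasts with pooling-based prototypes. The per-class cost grows with the product of query and support point counts, so a practical version would likely need the scale-wise downsampling that this sketch leaves out.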