CV · Sep 16, 2025

Few to Big: Prototype Expansion Network via Diffusion Learner for Point Cloud Few-shot Semantic Segmentation

Qianguang Zhao, Dongli Wang, Yan Zhou, Jianxun Li, Richard Irampa

arXiv:2509.12878v1 · 3.6 · h-index: 14

Originality: Highly original

AI Analysis

This work addresses the segmentation of novel categories in 3D point clouds from minimal labeled data, a capability that matters for applications such as robotics and autonomous driving. It represents an incremental advance over existing prototype-based methods.

The paper tackles few-shot 3D point cloud semantic segmentation by addressing two weaknesses of prototype-based methods: intra-class diversity and inter-set inconsistency. It introduces a Prototype Expansion Network (PENet) that uses a diffusion learner to enhance prototypes, and it reports significant performance improvements over state-of-the-art methods on the S3DIS and ScanNet datasets.

Few-shot 3D point cloud semantic segmentation aims to segment novel categories using a minimal number of annotated support samples. While existing prototype-based methods have shown promise, they are constrained by two critical challenges: (1) Intra-class Diversity, where a prototype's limited representational capacity fails to cover a class's full range of variations, and (2) Inter-set Inconsistency, where prototypes derived from the support set are misaligned with the query feature space. Motivated by the powerful generative capability of diffusion models, we repurpose their pre-trained conditional encoder to provide a novel source of generalizable features for expanding the prototype's representational range. Under this setup, we introduce the Prototype Expansion Network (PENet), a framework that constructs big-capacity prototypes from two complementary feature sources. PENet employs a dual-stream learner architecture: it retains a conventional fully supervised Intrinsic Learner (IL) to distill representative features, while introducing a novel Diffusion Learner (DL) to provide rich generalizable features. The resulting dual prototypes are then processed by a Prototype Assimilation Module (PAM), which adopts a novel push-pull cross-guidance attention block to iteratively align the prototypes with the query space. Furthermore, a Prototype Calibration Mechanism (PCM) regularizes the final big-capacity prototype to prevent semantic drift. Extensive experiments on the S3DIS and ScanNet datasets demonstrate that PENet significantly outperforms state-of-the-art methods across various few-shot settings.
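
For context, the prototype-based baseline that the abstract builds on is commonly realized as masked average pooling over support features, followed by nearest-prototype matching on query features. The NumPy sketch below illustrates only that generic baseline, under the assumption that per-point features have already been extracted. The function names are hypothetical, and this is not PENet's implementation (it contains no diffusion learner, PAM, or PCM).

```python
import numpy as np


def masked_average_prototype(features, mask):
    """Average the feature vectors of the points selected by `mask`.

    features: (N, D) float array of per-point features (assumed given).
    mask:     (N,) boolean array, True for points of the target class.
    """
    weights = mask.astype(features.dtype)
    total = (features * weights[:, None]).sum(axis=0)
    # Guard against an empty mask to avoid division by zero.
    return total / max(weights.sum(), 1.0)


def cosine_similarity(queries, prototypes):
    """Pairwise cosine similarity between (M, D) queries and (C, D) prototypes."""
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    return q @ p.T


def segment_query(support_feats, support_labels, query_feats, num_classes):
    """Label each query point with the class of its most similar prototype."""
    prototypes = np.stack([
        masked_average_prototype(support_feats, support_labels == c)
        for c in range(num_classes)
    ])
    return cosine_similarity(query_feats, prototypes).argmax(axis=1)
```

Because each class is summarized by a single averaged vector, a class with widely varying appearance is poorly covered, and the prototype is computed without any reference to the query features. These are the two limitations the abstract names as intra-class diversity and inter-set inconsistency.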
