CV · May 23, 2025

Proto-FG3D: Prototype-based Interpretable Fine-Grained 3D Shape Classification

arXiv:2505.17666v1 · h-index: 12
Originality: Highly original
AI Analysis

The paper addresses detailed shape understanding for 3D vision applications. It offers an incremental improvement: a new method that targets known bottlenecks in multi-view feature aggregation and interpretability.

The paper tackles fine-grained 3D shape classification by proposing Proto-FG3D, a prototype-based framework that shifts from parametric to non-parametric learning, achieving state-of-the-art accuracy and interpretability on benchmarks like FG3D and ModelNet40.

Deep learning-based multi-view coarse-grained 3D shape classification has achieved remarkable success over the past decade, leveraging the powerful feature learning capabilities of CNN-based and ViT-based backbones. However, as a challenging research area critical for detailed shape understanding, fine-grained 3D classification remains understudied due to the limited discriminative information captured during multi-view feature aggregation, particularly for subtle inter-class variations, class imbalance, and the inherent interpretability limitations of parametric models. To address these problems, we propose the first prototype-based framework, named Proto-FG3D, for fine-grained 3D shape classification, achieving a paradigm shift from parametric softmax to non-parametric prototype learning. First, Proto-FG3D establishes joint multi-view and multi-category representation learning via Prototype Association. Second, prototypes are refined via Online Clustering, improving both the robustness of multi-view feature allocation and inter-subclass balance. Finally, prototype-guided supervised learning enhances fine-grained discrimination via prototype-view correlation analysis and enables ad-hoc interpretability through transparent case-based reasoning. Experiments on FG3D and ModelNet40 show that Proto-FG3D surpasses state-of-the-art methods in accuracy while providing transparent predictions and ad-hoc interpretability with visualizations, challenging conventional fine-grained 3D recognition approaches.
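To make the shift from a parametric softmax head to non-parametric prototype matching more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the tensor shapes, the use of cosine similarity, the mean-over-views scoring, and the hard-assignment momentum update are all assumptions chosen for illustration. In particular, the momentum update is a simple stand-in for the Online Clustering step and not the procedure described in the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def classify_with_prototypes(view_feats, prototypes):
    """Score one shape against class prototypes (hypothetical helper).

    view_feats: (V, D) array, one feature vector per rendered view.
    prototypes: (C, K, D) array, K prototypes for each of C classes.
    Returns the predicted class, the per-class scores (C,), and the index of
    the best-matching prototype per view and class (V, C), which can serve as
    case-based evidence for the prediction.
    """
    v = l2_normalize(view_feats)
    p = l2_normalize(prototypes)
    sim = np.einsum("vd,ckd->vck", v, p)   # cosine similarity, shape (V, C, K)
    best_per_view = sim.max(axis=2)        # best prototype per view and class
    scores = best_per_view.mean(axis=0)    # assumed aggregation: mean over views
    return int(scores.argmax()), scores, sim.argmax(axis=2)


def update_prototypes(prototypes, view_feats, class_id, momentum=0.9):
    """Move prototypes of one class toward the views assigned to them.

    Assumption: hard nearest-prototype assignment followed by an exponential
    moving average. Returns a new array; the input is left unchanged.
    """
    updated = prototypes.copy()
    f = l2_normalize(view_feats)
    sim = f @ l2_normalize(updated[class_id]).T   # (V, K)
    assign = sim.argmax(axis=1)
    for k in np.unique(assign):
        mean_feat = f[assign == k].mean(axis=0)
        updated[class_id, k] = momentum * updated[class_id, k] + (1 - momentum) * mean_feat
    return updated
```

In this sketch the classifier has no learned weight matrix: a prediction is fully explained by which prototypes each view matched, which is the property that case-based interpretability relies on. How Proto-FG3D actually aggregates views and balances prototype assignments would need to be taken from the paper itself.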

