CV, Jul 31, 2019

ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences

arXiv:1908.00120v1, 50 citations
Originality: Incremental advance
AI Analysis

This work improves 3D shape understanding for applications like robotics or design by providing more detailed captions, though it appears incremental as it builds on multi-view and sequence-to-sequence approaches.

The paper tackles 3D shape captioning, addressing the lack of detailed part-level descriptions in captions from existing methods. It proposes ShapeCaptioner, which learns to detect parts in multiple views and generates captions from the detected parts, outperforming previous work.

3D shape captioning is a challenging application in 3D shape understanding. Captions from recent multi-view-based methods reveal that they cannot capture part-level characteristics of 3D shapes. As a result, their captions lack the detailed part-level descriptions that humans tend to focus on. To resolve this issue, we propose ShapeCaptioner, a generative caption network, to perform 3D shape captioning from semantic parts detected in multiple views. Our novelty lies in learning the knowledge of part detection in multiple views from 3D shape segmentations and transferring this knowledge to facilitate learning the mapping from 3D shapes to sentences. Specifically, ShapeCaptioner aggregates the parts detected in multiple colored views using our novel part-class-specific aggregation to represent a 3D shape, and then employs a sequence-to-sequence model to generate the caption. Our results outperform previous work, showing that ShapeCaptioner learns 3D shape features with more detailed part characteristics, which facilitates better 3D shape captioning.
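
The abstract names a part-class-specific aggregation but does not give its formula. The following is a minimal sketch of the general idea only, assuming element-wise max pooling over all parts of the same class detected across views. The function name, the pooling choice, the zero vector for undetected classes, and the toy features are all assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict


def aggregate_parts_by_class(detections, num_classes, feat_dim):
    """Pool part features per part class across all views (hypothetical helper).

    detections: list of (part_class, feature) pairs collected from every view,
                where feature is a list of feat_dim floats.
    Returns a list of num_classes vectors. Each vector is the element-wise
    maximum over the features detected for that class; a class with no
    detections gets a zero vector (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for part_class, feature in detections:
        if not 0 <= part_class < num_classes:
            raise ValueError(f"unknown part class: {part_class}")
        if len(feature) != feat_dim:
            raise ValueError("feature has the wrong dimension")
        grouped[part_class].append(feature)

    shape_repr = []
    for c in range(num_classes):
        feats = grouped.get(c)
        if feats:
            # Element-wise max over all parts of class c, regardless of view.
            pooled = [max(column) for column in zip(*feats)]
        else:
            pooled = [0.0] * feat_dim
        shape_repr.append(pooled)
    return shape_repr


# Toy example: class 0 = leg, 1 = seat, 2 = back (labels are illustrative).
detections = [
    (0, [0.1, 0.9]),  # a leg seen in view 1
    (0, [0.4, 0.2]),  # a leg seen in view 2
    (1, [0.5, 0.5]),  # the seat seen in view 1
]
shape_repr = aggregate_parts_by_class(detections, num_classes=3, feat_dim=2)
```

In this sketch the result is a fixed-length sequence with one vector per part class, which is the kind of input a sequence-to-sequence caption model could consume.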

