Learning Canonical View Representation for 3D Shape Recognition with Arbitrary Views
This work addresses a challenging and realistic setting for view-based 3D shape recognition. The contribution is incremental: it builds on existing view-based methods and extends them to handle arbitrary views more effectively.
The paper tackles the recognition of 3D shapes from arbitrary viewpoints. It proposes a canonical view representation that uses optimal transport to align arbitrary view features with a set of learnable reference view features, producing a fixed number of view features per shape. The method achieves competitive results on standard datasets under fixed viewpoint settings and significantly outperforms applicable methods under the arbitrary view setting.
In this paper, we focus on recognizing 3D shapes from arbitrary views, i.e., arbitrary numbers and positions of viewpoints. It is a challenging and realistic setting for view-based 3D shape recognition. We propose a canonical view representation to tackle this challenge. We first transform the original features of arbitrary views to a fixed number of view features, dubbed canonical view representation, by aligning the arbitrary view features to a set of learnable reference view features using optimal transport. In this way, each 3D shape with arbitrary views is represented by a fixed number of canonical view features, which are further aggregated to generate a rich and robust 3D shape representation for shape recognition. We also propose a canonical view feature separation constraint to enforce that the view features in canonical view representation can be embedded into scattered points in a Euclidean space. Experiments on the ModelNet40, ScanObjectNN, and RGBD datasets show that our method achieves competitive results under the fixed viewpoint settings, and significantly outperforms the applicable methods under the arbitrary view setting.
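The alignment step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the cost function, the marginals, or the solver. The sketch assumes a squared Euclidean cost, uniform marginals, entropic regularization solved with Sinkhorn iterations, and a barycentric projection of the transport plan. The function names `sinkhorn_plan` and `canonical_views` and the parameters `eps` and `n_iters` are hypothetical, and the reference features are fixed here, whereas in the paper they are learned.

```python
import math


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn iterations."""
    n, m = len(cost), len(cost[0])
    # Rescale the (non-negative) costs to [0, 1] so exp() does not underflow.
    scale = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-c / (scale * eps)) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m  # assumption: uniform mass per view / per reference
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def canonical_views(view_feats, ref_feats, eps=0.1, n_iters=200):
    """Map N arbitrary view features to M = len(ref_feats) canonical features."""
    n, m, d = len(view_feats), len(ref_feats), len(ref_feats[0])
    # Assumed cost: squared Euclidean distance between view and reference.
    cost = [[sum((x - r) ** 2 for x, r in zip(f, g)) for g in ref_feats]
            for f in view_feats]
    plan = sinkhorn_plan(cost, eps, n_iters)
    canon = []
    for j in range(m):
        mass = sum(plan[i][j] for i in range(n))
        # Barycentric projection: transport-weighted average of input views.
        canon.append([sum(plan[i][j] * view_feats[i][k] for i in range(n)) / mass
                      for k in range(d)])
    return canon


# Five input views are reduced to three canonical view features.
views = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
refs = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
canon = canonical_views(views, refs)
print(len(canon), len(canon[0]))
```

Whatever the number of input views, the output always has as many features as there are references, which is what allows a fixed-size aggregation step afterwards. The sketch uses plain Python lists to stay dependency-free; a practical version would operate on batched tensors and backpropagate through the transport plan into the reference features.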