Canonical Capsules: Self-Supervised Capsules in Canonical Pose
This work addresses the problem of learning robust, object-centric representations of 3D point clouds in a self-supervised manner, that is, without classification labels or manually-aligned training data.
This paper introduces a self-supervised capsule architecture for 3D point clouds that learns object decompositions and canonicalization by training with randomly rotated object pairs. The method achieves state-of-the-art performance in 3D point cloud reconstruction, canonicalization, and unsupervised classification without requiring classification labels or manually-aligned datasets.
We propose a self-supervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network, we require neither classification labels nor manually-aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.
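The idea of aggregating attention masks into keypoints can be illustrated with a minimal sketch. This is not the paper's network: `toy_logits` is a hypothetical, hand-written stand-in for the learned attention (it scores each point only by its distance from the origin, which makes it rotation-invariant by construction), the capsule count of three is arbitrary, and a rotation about the z-axis stands in for a random 3D rotation. Under these assumptions, each keypoint is the attention-weighted mean of the points, so the keypoints rotate together with the object and do not depend on point order.

```python
import math
import random

def toy_logits(points, radii=(0.2, 0.6, 1.0)):
    """Hypothetical stand-in for the learned attention network (an assumption).
    Each score depends only on a point's distance from the origin, so it is
    rotation-invariant, and it is computed independently per point."""
    return [[-10.0 * (math.sqrt(x * x + y * y + z * z) - r) ** 2 for r in radii]
            for x, y, z in points]

def attention_masks(logits):
    """Softmax over the points, separately for each capsule (column)."""
    n, k = len(logits), len(logits[0])
    masks = [[0.0] * k for _ in range(n)]
    for j in range(k):
        peak = max(row[j] for row in logits)  # subtract max for stability
        exps = [math.exp(row[j] - peak) for row in logits]
        total = sum(exps)
        for i in range(n):
            masks[i][j] = exps[i] / total
    return masks

def keypoints(points, masks):
    """One keypoint per capsule: the attention-weighted mean of the points."""
    k = len(masks[0])
    return [tuple(sum(masks[i][j] * points[i][d] for i in range(len(points)))
                  for d in range(3))
            for j in range(k)]

def rotate_z(points, angle):
    """Rotate points about the z-axis (a simple stand-in for a random rotation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

# A random toy cloud and a rotated copy of it, mimicking a training pair.
rng = random.Random(0)
cloud = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(64)]
rotated = rotate_z(cloud, 0.7)

kp = keypoints(cloud, attention_masks(toy_logits(cloud)))
kp_rot = keypoints(rotated, attention_masks(toy_logits(rotated)))

# Equivariance check: keypoints of the rotated cloud vs. rotated keypoints.
max_err = max(abs(a - b)
              for p, q in zip(kp_rot, rotate_z(kp, 0.7))
              for a, b in zip(p, q))
print("max equivariance error:", max_err)
```

In this toy setting the property holds by construction. In the method described above, the attention is learned, and the rotated pairs provide the training signal that encourages the decomposition to behave this way.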