Shape-Pose Disentanglement using SE(3)-equivariant Vector Neurons
This work addresses the challenge of robust 3D object understanding for computer vision applications; it represents an incremental improvement achieved through novel architectural extensions.
The paper tackles the problem of disentangling shape and pose in point cloud representations by developing an unsupervised auto-encoder that produces pose-invariant shape encodings and semantically aligns objects to a common canonical pose. The method achieves superior stability and consistency in quantitative and qualitative experiments.
We introduce an unsupervised technique for encoding point clouds into a canonical shape representation by disentangling shape and pose. Our encoder is stable and consistent: the shape encoding is purely pose-invariant, while the extracted rotation and translation semantically align different input shapes of the same class to a common canonical pose. Specifically, we design an auto-encoder based on Vector Neuron Networks, a rotation-equivariant neural network, whose layers we extend to provide translation-equivariance in addition to rotation-equivariance. The resulting encoder produces a pose-invariant shape encoding by construction, enabling our approach to focus on learning a consistent canonical pose for a class of objects. Quantitative and qualitative experiments validate the superior stability and consistency of our approach.
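To make the equivariance and invariance properties concrete, the following is a minimal sketch in plain Python, not the implementation described above. It assumes a Vector-Neuron-style linear layer that mixes channels of 3D vector features, and it assumes, purely for illustration, that translation-equivariance is obtained by constraining each row of the weight matrix to sum to one; the abstract does not state how the layers are extended. The Gram-matrix descriptor is likewise a hypothetical stand-in for a pose-invariant encoding, and all function names are illustrative.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def affine_weights(rows, cols, rng):
    """Random weights whose rows each sum to one (assumed constraint)."""
    w = [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    return [[x + (1.0 - sum(row)) / cols for x in row] for row in w]


def vn_linear(w, v):
    """Mix the channels of vector features v (C x 3) with weights w (C' x C)."""
    return matmul(w, v)


def apply_pose(v, r, t):
    """Rotate every row vector of v by r, then translate it by t."""
    rotated = matmul(v, transpose(r))
    return [[x + ti for x, ti in zip(row, t)] for row in rotated]


def invariant_code(v):
    """Illustrative pose-invariant descriptor: Gram matrix of centred vectors."""
    n = len(v)
    mean = [sum(col) / n for col in zip(*v)]
    centred = [[x - m for x, m in zip(row, mean)] for row in v]
    return matmul(centred, transpose(centred))


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
features = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
weights = affine_weights(2, 4, rng)
rotation = matmul(rotation_z(0.7), rotation_x(-1.2))
translation = [0.5, -2.0, 1.5]

# Equivariance: posing the input and then applying the layer should match
# applying the layer and then posing the output.
out_of_posed = vn_linear(weights, apply_pose(features, rotation, translation))
posed_out = apply_pose(vn_linear(weights, features), rotation, translation)
equivariance_error = max_abs_diff(out_of_posed, posed_out)

# Invariance: the descriptor should not change when the input is posed.
invariance_error = max_abs_diff(
    invariant_code(features),
    invariant_code(apply_pose(features, rotation, translation)),
)
```

Both error values should be zero up to floating-point rounding. The row-sum constraint matters because a translation adds the same offset to every channel, so a weighted combination reproduces that offset exactly only when the weights sum to one.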