OmniShape: Zero-Shot Multi-Hypothesis Shape and Pose Estimation in the Real World
This work addresses the challenge of 3D reconstruction in robotics and AR/VR, though it appears incremental because it builds on existing diffusion and neural field methods.
The authors tackled the problem of estimating an object's pose and full shape from a single observation without prior 3D models or category knowledge, achieving compelling performance on real-world datasets.
We would like to estimate the pose and full shape of an object from a single observation, without assuming a known 3D model or category. In this work, we propose OmniShape, the first method of its kind to enable probabilistic pose and shape estimation. OmniShape is based on the key insight that shape completion can be decoupled into two multi-modal distributions: one capturing how measurements project into a normalized object reference frame defined by the dataset, and the other modelling a prior over object geometries represented as triplanar neural fields. By training separate conditional diffusion models for these two distributions, we enable sampling multiple hypotheses from the joint pose and shape distribution. OmniShape demonstrates compelling performance on challenging real-world datasets. Project website: https://tri-ml.github.io/omnishape
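The abstract does not say how the triplanar neural field is queried. As a rough illustration, the sketch below shows one common way such a representation is read out: a 3D point in the normalized object frame is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are summed. The names `bilinear_sample`, `query_triplane`, and `make_ramp_plane`, the `[-1, 1]` coordinate range, the summation, and the plain-list storage are all assumptions made for this example, not details taken from OmniShape.

```python
from typing import Dict, List, Sequence

# plane[i][j] is a feature vector (list of channels); square grid assumed.
Plane = List[List[List[float]]]


def bilinear_sample(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly interpolate features at (u, v), both assumed in [-1, 1]."""
    res = len(plane)  # grid resolution, must be >= 2
    # Clamp to the normalized range, then map [-1, 1] to grid coords [0, res - 1].
    gu = (min(max(u, -1.0), 1.0) + 1.0) * 0.5 * (res - 1)
    gv = (min(max(v, -1.0), 1.0) + 1.0) * 0.5 * (res - 1)
    i0 = min(int(gu), res - 2)
    j0 = min(int(gv), res - 2)
    tu, tv = gu - i0, gv - j0
    f00, f01 = plane[i0][j0], plane[i0][j0 + 1]
    f10, f11 = plane[i0 + 1][j0], plane[i0 + 1][j0 + 1]
    return [
        (1 - tu) * (1 - tv) * a + (1 - tu) * tv * b
        + tu * (1 - tv) * c + tu * tv * d
        for a, b, c, d in zip(f00, f01, f10, f11)
    ]


def query_triplane(planes: Dict[str, Plane], point: Sequence[float]) -> List[float]:
    """Sum the features sampled from the xy, xz, and yz planes at a 3D point."""
    x, y, z = point
    fxy = bilinear_sample(planes["xy"], x, y)
    fxz = bilinear_sample(planes["xz"], x, z)
    fyz = bilinear_sample(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(fxy, fxz, fyz)]


def make_ramp_plane(res: int) -> Plane:
    """Toy plane whose single channel equals the first plane coordinate."""
    return [[[-1.0 + 2.0 * i / (res - 1)] for _ in range(res)] for i in range(res)]


if __name__ == "__main__":
    planes = {name: make_ramp_plane(5) for name in ("xy", "xz", "yz")}
    print(query_triplane(planes, (0.5, -0.25, 0.3)))
```

In a full model, the summed feature vector would typically be passed to a small decoder network that predicts occupancy or signed distance at the query point; that decoder is omitted here.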