Object-Centric Multi-View Aggregation
This work addresses the challenge of 3D inference from a limited number of views, with applications in computer vision and graphics, though it appears incremental relative to existing aggregation methods.
The paper tackles the problem of aggregating sparse views of an object into a 3D representation without explicit camera pose estimation, using an object-centric canonical coordinate system with a symmetry-aware mapping from pixels to that system. The result is better propagation of information to unseen regions and robust handling of pose ambiguities, enabling tasks like volumetric reconstruction and novel view synthesis.
We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid. Key to our approach is an object-centric canonical 3D coordinate system into which views can be lifted, without explicit camera pose estimation, and then combined -- in a manner that can accommodate a variable number of views and is view order independent. We show that computing a symmetry-aware mapping from pixels to the canonical coordinate system allows us to better propagate information to unseen regions, as well as to robustly overcome pose ambiguities during inference. Our aggregate representation enables us to perform 3D inference tasks like volumetric reconstruction and novel view synthesis, and we use these tasks to demonstrate the benefits of our aggregation approach as compared to implicit or camera-centric alternatives.
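To make the aggregation step concrete, the following is a minimal sketch of one way to combine views in a shared canonical grid. It is an illustration under assumptions, not the method as implemented in the paper. It assumes that each pixel already comes with a predicted canonical coordinate in [-0.5, 0.5]^3 and a feature vector, that features are averaged per voxel, and that the object is mirror-symmetric about the plane x = 0. The names `voxel_index` and `aggregate_views`, the mean pooling, and the choice of symmetry plane are all hypothetical.

```python
def voxel_index(point, resolution):
    # Map a canonical point in [-0.5, 0.5]^3 to integer voxel indices,
    # clamping so that boundary points stay inside the grid.
    return tuple(
        min(resolution - 1, max(0, int((c + 0.5) * resolution)))
        for c in point
    )


def aggregate_views(views, resolution=4, mirror=True):
    # views: a list of views; each view is a list of (xyz, feature) pairs,
    # where xyz is the (assumed given) canonical coordinate of a pixel.
    # Returns a sparse grid: {voxel index: mean feature vector}.
    sums = {}
    counts = {}
    for view in views:
        for xyz, feature in view:
            targets = {voxel_index(xyz, resolution)}
            if mirror:
                # Assumed reflective symmetry about the plane x = 0:
                # also write the feature to the mirrored location.
                x, y, z = xyz
                targets.add(voxel_index((-x, y, z), resolution))
            for idx in targets:
                if idx not in sums:
                    sums[idx] = [0.0] * len(feature)
                    counts[idx] = 0
                for k, value in enumerate(feature):
                    sums[idx][k] += value
                counts[idx] += 1
    # Averaging per voxel does not depend on how many views there are
    # or on the order in which they were processed.
    return {
        idx: [s / counts[idx] for s in vec]
        for idx, vec in sums.items()
    }


view_a = [((0.3, 0.1, 0.1), [1.0, 0.0])]
view_b = [((0.3, 0.1, 0.1), [3.0, 2.0]), ((-0.4, -0.4, -0.4), [5.0, 5.0])]
grid = aggregate_views([view_a, view_b])
```

Because every view writes into the same object-centric grid, no relative camera pose between views is needed at this stage, and the mirrored writes fill voxels that no input view observed directly.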