Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories
This work addresses the challenge of reconstructing deformable objects such as pets from limited video data. The contribution is incremental: it extends prior methods for rigid objects to dynamic cases.
The paper tackles photorealistic 3D reconstruction of dynamic objects from sparse views. It introduces CoP3D, a dataset of crowd-sourced videos showing around 4,200 distinct pets, and Tracker-NeRF, a method for 4D reconstruction that, on CoP3D, achieves significantly better non-rigid new-view synthesis performance than existing baselines.
Obtaining photorealistic reconstructions of objects from sparse views is inherently ambiguous and can only be achieved by learning suitable reconstruction priors. Earlier works on sparse rigid object reconstruction successfully learned such priors from large datasets such as CO3D. In this paper, we extend this approach to dynamic objects. We use cats and dogs as a representative example and introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets. CoP3D is one of the first large-scale datasets for benchmarking non-rigid 3D reconstruction "in the wild". We also propose Tracker-NeRF, a method for learning 4D reconstruction from our dataset. At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views, interpolating viewpoint and time. Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.