Learning the 3D Fauna of the Web
This work addresses the challenge of scaling 3D animal modeling to diverse species, though its improvement in generalization over prior category-specific methods is incremental.
The paper tackles the problem of learning 3D animal models from limited 2D Internet images. It develops 3D-Fauna, a pan-category deformable model trained jointly on more than 100 species, which reconstructs an articulated 3D mesh from a single image of a quadruped in a feed-forward pass within seconds.
Learning 3D models of all animals on the Earth requires massively scaling up existing solutions. With this ultimate goal in mind, we develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly. One crucial bottleneck of modeling animals is the limited availability of training data, which we overcome by simply learning from 2D Internet images. We show that prior category-specific attempts fail to generalize to rare species with limited training images. We address this challenge by introducing the Semantic Bank of Skinned Models (SBSM), which automatically discovers a small set of base animal shapes by combining geometric inductive priors with semantic knowledge implicitly captured by an off-the-shelf self-supervised feature extractor. To train such a model, we also contribute a new large-scale dataset of diverse animal species. At inference time, given a single image of any quadruped animal, our model reconstructs an articulated 3D mesh in a feed-forward fashion within seconds.
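The abstract does not spell out how the Semantic Bank of Skinned Models is queried, so the following is only a toy sketch of one plausible reading: a small bank of key/value pairs, where an image feature from the self-supervised extractor is compared against the keys and the matching values are blended into a base-shape code. The function names (`cosine`, `softmax`, `query_bank`), the cosine-similarity and softmax weighting, the `temperature` parameter, and the toy vectors are all assumptions made for illustration, not details taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def query_bank(feature, keys, values, temperature=0.1):
    """Blend the bank's value vectors, weighted by key/feature similarity.

    Hypothetical stand-in for a bank lookup: `feature` plays the role of
    an image embedding, `keys` the learned bank keys, and `values` the
    latent codes that would condition a base shape.
    """
    weights = softmax([cosine(feature, k) for k in keys], temperature)
    dim = len(values[0])
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, blended

# Toy bank with three entries (made-up numbers, not learned parameters).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A feature closest to the first key draws most of its code from the first value.
weights, shape_code = query_bank([0.9, 0.1], keys, values)
```

Under this reading, a rare species with few training images would not need its own shape model: its feature would land near the keys of visually similar species and reuse their base shape, which is one way to interpret the abstract's claim about generalizing beyond category-specific models.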