Self-supervised Learning of 3D Objects from Natural Images
This work addresses the challenge of 3D object reconstruction from limited natural image data, offering an incremental improvement by enabling training on diverse object categories without full supervision.
The paper tackles single-view 3D reconstruction of objects from natural images in a self-supervised way and shows that training is possible on datasets such as CIFAR-10 and PASCAL objects, which suggests potential for broader application beyond synthetic data.
We present a method to learn single-view reconstruction of the 3D shape, pose, and texture of objects from categorized natural images in a self-supervised manner. Since this is a severely ill-posed problem, carefully designing the training method and introducing constraints are essential. To avoid the difficulty of training all elements at the same time, we propose first training category-specific base shapes with a fixed pose distribution and simple textures, and subsequently training poses and textures using the obtained shapes. Another difficulty is that shapes and backgrounds sometimes become excessively complicated because they mistakenly absorb appearance that should be reconstructed as texture on object surfaces. To suppress this, we propose using strong regularization and constraints on object surfaces and background images. With these two techniques, we demonstrate that natural image collections such as CIFAR-10 and PASCAL objects can be used for training, which indicates the possibility of realizing 3D object reconstruction on diverse object categories beyond synthetic datasets.
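
The two-stage training order described above can be illustrated with a minimal sketch. It only records which parameter groups would be optimized in each stage and is not the actual implementation. All group names (`category_base_shape`, `simple_texture`, `pose_predictor`, `texture_predictor`) are hypothetical, and holding the base shapes fixed during the second stage is an assumption about what "using the obtained shapes" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    name: str
    trainable: frozenset  # parameter groups updated by the optimizer
    fixed: frozenset      # groups held constant or drawn from a fixed prior


# Stage 1: learn category-specific base shapes with a fixed pose
# distribution and simple textures (group names are hypothetical).
STAGE_1 = StageConfig(
    name="base_shapes",
    trainable=frozenset({"category_base_shape", "simple_texture"}),
    fixed=frozenset({"pose_distribution"}),
)

# Stage 2: learn poses and textures using the shapes from stage 1.
# Assumption: the base shapes are kept fixed here.
STAGE_2 = StageConfig(
    name="pose_and_texture",
    trainable=frozenset({"pose_predictor", "texture_predictor"}),
    fixed=frozenset({"category_base_shape"}),
)


def schedule(stage1_steps, stage2_steps):
    """Yield (step, StageConfig) pairs: all of stage 1, then all of stage 2."""
    for step in range(stage1_steps + stage2_steps):
        yield step, (STAGE_1 if step < stage1_steps else STAGE_2)


if __name__ == "__main__":
    for step, cfg in schedule(2, 3):
        print(step, cfg.name, sorted(cfg.trainable))
```

In a real training loop, the `trainable` set would decide which parameters are handed to the optimizer at each step. The regularization terms on object surfaces and background images would be added to the loss separately and are not modeled here.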