Escaping Plato's Cave: 3D Shape From Adversarial Rendering
This work addresses 3D shape reconstruction for computer vision researchers by enabling the use of unstructured 2D image collections instead of curated 3D data sets. The contribution is incremental in that it builds on existing adversarial training and differentiable rendering techniques.
The paper tackles the problem of reconstructing 3D shapes from unstructured 2D image collections without known camera poses. Its method, PlatonicGAN, generates 3D shapes whose renderings are indistinguishable from real images. It achieves consistent improvements over baseline methods and, when combined with 3D supervision, in some cases surpasses 3D-supervised methods.
We introduce PlatonicGAN to discover the 3D structure of an object class from an unstructured collection of 2D images, i.e., where no relation between photos is known, except that they show instances of the same category. The key idea is to train a deep neural network to generate 3D shapes which, when rendered to images under various camera poses, are indistinguishable (to a discriminator) from ground truth images. Discriminating 2D images instead of 3D shapes allows tapping into unstructured 2D photo collections instead of relying on curated (e.g., aligned or annotated) 3D data sets. To establish constraints between 2D image observations and their 3D interpretation, we suggest a family of rendering layers that are effectively differentiable. This family includes visual hull, absorption-only (akin to x-ray), and emission-absorption. We can successfully reconstruct 3D shapes from unstructured 2D images, and we extensively evaluate PlatonicGAN on a range of synthetic and real data sets, achieving consistent improvements over baseline methods. We further show that PlatonicGAN can be combined with 3D supervision to improve on and in some cases even surpass the quality of 3D-supervised methods.
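To make the three rendering layers more concrete, the following is a minimal sketch of how each one might reduce the voxels along a single camera ray to one pixel. The abstract does not give the formulas, so these are common textbook forms and not necessarily the paper's exact definitions. The function names, the per-ray list representation, and the assumption that densities lie in [0, 1] are all illustrative. A real layer would apply the same arithmetic to a whole voxel grid, resampled to a sampled camera pose, inside an automatic differentiation framework.

```python
import math

def visual_hull(densities):
    """Smooth occupancy: close to 1 if the ray passes through any matter.

    Assumed form: 1 - exp(-sum of densities), a soft stand-in for a
    binary "hit or miss" silhouette test.
    """
    return 1.0 - math.exp(-sum(densities))

def absorption_only(densities):
    """X-ray-like opacity: each voxel absorbs a fraction of the light.

    Assumed form: 1 - product of (1 - density), with densities in [0, 1].
    """
    transmittance = 1.0
    for d in densities:
        transmittance *= 1.0 - d
    return 1.0 - transmittance

def emission_absorption(densities, colors):
    """Front-to-back compositing: voxels emit colour and occlude those behind.

    `colors` holds one (r, g, b) tuple per voxel, ordered from the camera
    outward. This is standard alpha compositing, used here as an assumption.
    """
    transmittance = 1.0  # fraction of light still reaching the camera
    pixel = [0.0, 0.0, 0.0]
    for d, c in zip(densities, colors):
        weight = transmittance * d  # contribution of this voxel
        pixel = [p + weight * ch for p, ch in zip(pixel, c)]
        transmittance *= 1.0 - d
    return pixel

# Hypothetical ray: empty voxel, then half-opaque red, then half-opaque green.
ray = [0.0, 0.5, 0.5]
ray_colors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

silhouette = visual_hull(ray)                   # about 0.632
opacity = absorption_only(ray)                  # 0.75
rgb = emission_absorption(ray, ray_colors)      # [0.5, 0.25, 0.0]
```

All three functions use only sums, products, and exponentials, so gradients with respect to the voxel values are well defined. This is what lets a 2D discriminator's signal flow back to the 3D generator.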