CV, GR | Apr 21, 2022

Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency

Tom Monnier, Matthew Fisher, Alexei A. Efros, Mathieu Aubry

Berkeley

arXiv:2204.10310v3 | 16.740 citations | h-index: 111 | Has Code

Originality: Incremental advance

AI Analysis

The paper tackles single-view 3D reconstruction without supervision such as viewpoint annotations or silhouettes. It leverages consistency between images of different object instances and achieves competitive results on benchmarks such as ShapeNet and Pascal3D+ Car.

This enables learning from unlabelled image collections of a given object category, addressing a key bottleneck in 3D vision, though the method builds on existing autoencoding and rendering techniques.

Approaches for single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry. We avoid all such supervision and assumptions by explicitly leveraging the consistency between images of different object instances. As a result, our method can learn from large collections of unlabelled images depicting the same object category. Our main contributions are two ways of leveraging cross-instance consistency: (i) progressive conditioning, a training strategy that gradually specializes the model from category to instances in a curriculum learning fashion; and (ii) neighbor reconstruction, a loss enforcing consistency between instances having similar shape or texture. Also critical to the success of our method are: our structured autoencoding architecture, which decomposes an image into explicit shape, texture, pose, and background; an adapted formulation of differentiable rendering; and a new optimization scheme alternating between 3D and pose learning. We compare our approach, UNICORN, both on the diverse synthetic ShapeNet dataset (the classical benchmark for methods requiring multiple views as supervision) and on standard real-image benchmarks (Pascal3D+ Car, CUB), for which most methods require known templates and silhouette annotations. We also showcase applicability to more challenging real-world collections (CompCars, LSUN), where silhouettes are not available and images are not cropped around the object.
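The two cross-instance ideas can be illustrated with a minimal, stdlib-only Python sketch. This is not the UNICORN implementation, and every name in it (`progressive_mask`, `nearest_neighbor`, `neighbor_reconstruction_loss`, `toy_render`) is hypothetical. It assumes that progressive conditioning can be modeled as unmasking latent-code dimensions on a fixed schedule, so that early training forces all instances to share one category-level code. It also assumes that neighbor reconstruction can be modeled as re-rendering each image with the shape code of its nearest neighbor within the batch while keeping its own texture code. The renderer is a toy stand-in with pose and background omitted.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Render = Callable[[Sequence[float], Sequence[float]], Vector]


def progressive_mask(code: Sequence[float], step: int, steps_per_dim: int) -> Vector:
    # Assumed schedule (hypothetical): unlock one latent dimension every
    # `steps_per_dim` training steps. At step 0 every code is all zeros, so all
    # instances share a single category-level shape/texture.
    k = min(len(code), step // steps_per_dim)
    return [c if d < k else 0.0 for d, c in enumerate(code)]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(query: int, codes: Sequence[Sequence[float]]) -> int:
    # Index of the code closest to codes[query] in squared L2, excluding itself.
    if len(codes) < 2:
        raise ValueError("need at least two codes to find a neighbor")
    best, best_dist = -1, math.inf
    for j, code in enumerate(codes):
        if j == query:
            continue
        dist = squared_l2(codes[query], code)
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def neighbor_reconstruction_loss(images, shape_codes, texture_codes, render: Render) -> float:
    # Re-render each image with its nearest neighbor's shape code but its own
    # texture code. A low error means instances with similar shape codes are
    # interchangeable, i.e. the shape space is consistent across instances.
    total = 0.0
    for i, image in enumerate(images):
        j = nearest_neighbor(i, shape_codes)
        total += squared_l2(image, render(shape_codes[j], texture_codes[i]))
    return total / len(images)


def toy_render(shape: Sequence[float], texture: Sequence[float]) -> Vector:
    # Hypothetical stand-in for a differentiable renderer (no pose, no background).
    return [s + t for s, t in zip(shape, texture)]


# Usage: two instances whose shape codes differ by 1 in one dimension.
shapes = [[0.0, 0.0], [1.0, 0.0]]
textures = [[0.0, 1.0], [0.5, 0.5]]
images = [toy_render(s, t) for s, t in zip(shapes, textures)]
print(neighbor_reconstruction_loss(images, shapes, textures, toy_render))  # 1.0
print(progressive_mask([1.0, 2.0, 3.0], step=25, steps_per_dim=10))  # [1.0, 2.0, 0.0]
```

The loss drops to zero when neighbors in shape-code space really do share a shape, which is the kind of signal a cross-instance consistency loss is meant to provide. Searching neighbors only within one batch, and swapping only shape codes, are simplifications made here. The abstract also mentions texture neighbors, which would be handled symmetrically.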

