CV · GR · Apr 21, 2022

Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency

Berkeley
arXiv:2204.10310v3 · 40 citations · h-index: 111
Originality: Incremental advance
AI Analysis

The paper tackles single-view 3D reconstruction without supervision such as viewpoint annotations or silhouettes by leveraging consistency between images of different object instances. It reports competitive results on benchmarks such as ShapeNet and Pascal3D+ Car.

This enables learning from unlabelled image collections of a single object category, addressing a key bottleneck in 3D vision, though the method builds on existing autoencoding and rendering techniques.

Approaches for single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry. We avoid all such supervision and assumptions by explicitly leveraging the consistency between images of different object instances. As a result, our method can learn from large collections of unlabelled images depicting the same object category. Our main contributions are two ways of leveraging cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion; and (ii) neighbor reconstruction, a loss enforcing consistency between instances having similar shape or texture. Also critical to the success of our method are: our structured autoencoding architecture decomposing an image into explicit shape, texture, pose, and background; an adapted formulation of differentiable rendering; and a new optimization scheme alternating between 3D and pose learning. We compare our approach, UNICORN, both on the diverse synthetic ShapeNet dataset, the classical benchmark for methods requiring multiple views as supervision, and on standard real-image benchmarks (Pascal3D+ Car, CUB) for which most methods require known templates and silhouette annotations. We also showcase applicability to more challenging real-world collections (CompCars, LSUN), where silhouettes are not available and images are not cropped around the object.
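The two cross-instance ideas named in the abstract can be illustrated with a small, renderer-free sketch. It is not the authors' implementation. It assumes that progressive conditioning can be modelled as unlocking latent dimensions over training stages, and that neighbor reconstruction starts by picking the instance with the closest shape code and swapping it in. All names here (`active_dims`, `condition`, `nearest_neighbor`, `swap_shape`, `steps_per_stage`) are hypothetical.

```python
import math
from typing import List, Sequence


def active_dims(step: int, total_dims: int, steps_per_stage: int) -> int:
    """Latent dimensions unlocked at a training step (hypothetical linear schedule)."""
    # With 0 active dimensions every instance shares one category-level code.
    return min(total_dims, step // steps_per_stage)


def condition(code: Sequence[float], n_active: int) -> List[float]:
    """Zero out every latent dimension beyond the first n_active ones."""
    return [v if i < n_active else 0.0 for i, v in enumerate(code)]


def nearest_neighbor(query: Sequence[float],
                     bank: Sequence[Sequence[float]],
                     exclude: int) -> int:
    """Index of the bank entry closest to query (Euclidean), skipping `exclude`."""
    best, best_dist = -1, math.inf
    for idx, cand in enumerate(bank):
        if idx == exclude:
            continue
        dist = math.dist(query, cand)
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def swap_shape(instance: dict, neighbor: dict) -> dict:
    """Hybrid instance: the neighbor's shape code with the query's texture and pose."""
    return {
        "shape": neighbor["shape"],
        "texture": instance["texture"],
        "pose": instance["pose"],
    }
```

In a full pipeline the hybrid returned by `swap_shape` would be rendered and compared with the query image, so that instances with similar shapes are pushed to explain each other's images. The renderer and the loss are left out because the abstract does not specify them.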

Code Implementations: 2 repos
