CV · AI · Dec 5, 2022

3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes

arXiv:2212.02184v1 · 1 citation · h-index: 11
Originality: Highly original
AI Analysis

This work addresses the need for robust 3D reconstruction from single images in fields like computer graphics and robotics. It offers an incremental improvement: reconstruction that is view-agnostic and remains usable under large occlusions.

The paper tackles single-view 3D shape reconstruction, which is challenging because of occlusions. It proposes a framework in which a mapping network takes features from a Vision Transformer (ViT) and from CLIP and maps them to the latent space of a 3D generative model. The authors report view-agnostic reconstruction and evaluate the method against state-of-the-art approaches on ShapeNetV2.

Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to represent and generate 3D shapes, as well as a vast number of use cases. However, single-view reconstruction remains a challenging topic that can unlock various interesting use cases such as interactive design. In this work, we propose a novel framework that leverages the intermediate latent spaces of Vision Transformer (ViT) and a joint image-text representational model, CLIP, for fast and efficient Single View Reconstruction (SVR). More specifically, we propose a novel mapping network architecture that learns a mapping between deep features extracted from ViT and CLIP, and the latent space of a base 3D generative model. Unlike previous work, our method enables view-agnostic reconstruction of 3D shapes, even in the presence of large occlusions. We use the ShapeNetV2 dataset and perform extensive experiments with comparisons to SOTA methods to demonstrate our method's effectiveness.
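
The data flow described in the abstract (image features from ViT and CLIP, fused and mapped into the latent space of a 3D generative model) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `MappingNetwork` name, fusion by concatenation, the two-layer MLP, the toy dimensions, and the random untrained weights. The feature vectors are stand-ins for what real ViT and CLIP encoders would produce, and the output would be passed to a pretrained 3D decoder that is not shown.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def init_linear(rng: random.Random, in_dim: int, out_dim: int) -> Tuple[Matrix, Vector]:
    """Randomly initialise an (out_dim x in_dim) weight matrix and a zero bias."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class MappingNetwork:
    """Hypothetical mapper: (ViT feature, CLIP feature) -> 3D latent code.

    Assumption: the two feature vectors are fused by concatenation and
    passed through a two-layer MLP. The paper's actual architecture may differ.
    """

    def __init__(self, vit_dim: int, clip_dim: int, hidden_dim: int,
                 latent_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vit_dim = vit_dim
        self.clip_dim = clip_dim
        self.w1, self.b1 = init_linear(rng, vit_dim + clip_dim, hidden_dim)
        self.w2, self.b2 = init_linear(rng, hidden_dim, latent_dim)

    def __call__(self, vit_feat: Vector, clip_feat: Vector) -> Vector:
        if len(vit_feat) != self.vit_dim or len(clip_feat) != self.clip_dim:
            raise ValueError("feature vector has unexpected dimensionality")
        fused = list(vit_feat) + list(clip_feat)   # concatenate both embeddings
        hidden = relu(linear(fused, self.w1, self.b1))
        return linear(hidden, self.w2, self.b2)    # latent code for a 3D decoder


if __name__ == "__main__":
    # Toy dimensions; real encoders produce much larger feature vectors.
    mapper = MappingNetwork(vit_dim=8, clip_dim=4, hidden_dim=16, latent_dim=6)
    latent = mapper([0.1] * 8, [0.2] * 4)
    print(len(latent))
```

In a trained system of this kind, the mapper's weights would be fitted so that the predicted latent code decodes to the target shape; the sketch covers only the forward pass.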

