CV · Apr 13, 2023

ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency

Zixuan Huang, Varun Jampani, Anh Thai, Yuanzhen Li, Stefan Stojanov, James M. Rehg

arXiv:2304.06247v1 · 13.127 citations · h-index: 82

Originality: Incremental advance

AI Analysis

This work addresses 3D shape learning for computer vision applications without costly annotations. It is an incremental advance because it builds on existing CLIP and geometric techniques.

ShapeClipper tackles 3D shape reconstruction from single-view RGB images by using CLIP-based shape consistency and geometric constraints, achieving superior performance over state-of-the-art methods on datasets like Pix3D, Pascal3D+, and OpenImages.

We present ShapeClipper, a novel method that reconstructs 3D object shapes from real-world single-view RGB images. Instead of relying on laborious 3D, multi-view or camera pose annotation, ShapeClipper learns shape reconstruction from a set of single-view segmented images. The key idea is to facilitate shape learning via CLIP-based shape consistency, where we encourage objects with similar CLIP encodings to share similar shapes. We also leverage off-the-shelf normals as an additional geometric constraint so the model can learn better bottom-up reasoning of detailed surface geometry. These two novel consistency constraints, when used to regularize our model, improve its ability to learn both global shape structure and local geometric details. We evaluate our method over three challenging real-world datasets, Pix3D, Pascal3D+, and OpenImages, where we achieve superior performance over state-of-the-art methods.
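The two consistency ideas in the abstract can be illustrated with a toy sketch. This is not the authors' formulation: the abstract does not give the actual loss terms, so everything below is an assumption made for illustration. Shapes are stood in for by flat feature vectors, the `threshold` parameter and the function names `clip_shape_consistency` and `normal_consistency` are hypothetical, and in practice the embeddings would come from a CLIP image encoder and the reference normals from an off-the-shelf normal estimator.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_shape_consistency(embeddings, shapes, threshold=0.9):
    """Penalize shape differences between objects with similar embeddings.

    Assumption: a pair counts as "similar" when the cosine similarity of
    its embeddings is at least `threshold` (a hypothetical parameter).
    The penalty is the mean squared difference of the shape vectors,
    averaged over all similar pairs.
    """
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                sq = sum((p - q) ** 2 for p, q in zip(shapes[i], shapes[j]))
                total += sq / len(shapes[i])
                pairs += 1
    return total / pairs if pairs else 0.0


def normal_consistency(pred_normals, ref_normals):
    """Mean (1 - cosine similarity) between predicted and reference normals.

    Assumption: the reference normals stand in for off-the-shelf
    estimates, given here as plain 3-vectors.
    """
    losses = [1.0 - cosine_similarity(p, r)
              for p, r in zip(pred_normals, ref_normals)]
    return sum(losses) / len(losses)


# Toy data: objects 0 and 1 have nearly identical embeddings, object 2 differs.
embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
shapes = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
shape_loss = clip_shape_consistency(embeddings, shapes)

# One predicted normal agrees with its reference, the other is orthogonal.
normal_loss = normal_consistency([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
                                 [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
```

In this toy setup only the first two objects form a similar pair, so the shape term depends on their difference alone; the dissimilar third object is left unconstrained. A training loop would add terms like these to the main reconstruction objective as regularizers, with weights that the abstract does not specify.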
