CV · Mar 20, 2024

T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image

Shijie Zhang, Boyan Jiang, Keke He, Junwei Zhu, Ying Tai, Chengjie Wang, Yinda Zhang, Yanwei Fu

Peking U

arXiv:2403.13663v12.013 citations · h-index: 19 · ICASSP

Originality: Incremental advance

AI Analysis

This work addresses the domain gap and detail loss in 3D mesh generation from images, offering incremental improvements for computer vision applications.

The paper tackles two problems in single-view 3D reconstruction: overly smooth generated meshes and poor generalization to real-world images. It proposes T-Pixel2Mesh, which improves performance on ShapeNet and enhances real-world reconstruction.

Pixel2Mesh (P2M) is a classical approach for reconstructing 3D shapes from a single color image through coarse-to-fine mesh deformation. Although P2M is capable of generating plausible global shapes, its Graph Convolution Network (GCN) often produces overly smooth results, causing the loss of fine-grained geometry details. Moreover, P2M generates non-credible features for occluded regions and struggles with the domain gap from synthetic data to real-world images, which is a common challenge for single-view 3D reconstruction methods. To address these challenges, we propose a novel Transformer-boosted architecture, named T-Pixel2Mesh, inspired by the coarse-to-fine approach of P2M. Specifically, we use a global Transformer to control the holistic shape and a local Transformer to progressively refine the local geometry details with graph-based point upsampling. To enhance real-world reconstruction, we present the simple yet effective Linear Scale Search (LSS), which serves as prompt tuning during the input preprocessing. Our experiments on ShapeNet demonstrate state-of-the-art performance, while results on real-world data show the generalization capability.
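
The coarse-to-fine refinement mentioned in the abstract depends on increasing the number of mesh vertices between stages. The abstract does not specify how T-Pixel2Mesh's graph-based point upsampling works, so the sketch below is only an illustration: it assumes edge-midpoint upsampling in the style of the original Pixel2Mesh unpooling, where one new vertex is placed at the midpoint of every edge and each triangle is split into four. The function name `midpoint_upsample` and the list-of-tuples mesh format are hypothetical choices made for this example, not taken from the paper.

```python
def midpoint_upsample(vertices, faces):
    """Illustrative edge-midpoint upsampling (an assumption, not the paper's code).

    vertices: list of (x, y, z) tuples
    faces:    list of (i, j, k) vertex-index triples
    Returns a denser (vertices, faces) pair.
    """
    new_vertices = [tuple(v) for v in vertices]
    midpoint_index = {}  # undirected edge (i, j) with i < j -> new vertex index

    def midpoint(i, j):
        # Shared edges must map to the same new vertex, so key on the sorted pair.
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:
            a, b = vertices[i], vertices[j]
            new_vertices.append(tuple((x + y) / 2.0 for x, y in zip(a, b)))
            midpoint_index[key] = len(new_vertices) - 1
        return midpoint_index[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central one.
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return new_vertices, new_faces
```

Applied to a tetrahedron (4 vertices, 6 edges, 4 faces), one pass yields 10 vertices and 16 faces. In a deformation pipeline, a learned module would then predict offsets for the denser vertex set; that part is outside the scope of this sketch.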
