DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose
This work addresses versatile 3D surface generation and reconstruction for applications such as computer graphics and human modeling. Its contribution is incremental: it adapts diffusion models to 3D surface tasks.
The paper tackles the generation and reconstruction of 3D surfaces, such as human bodies and objects, in various poses and shapes. Its transformer-based diffusion model, DiffSurf, generates shapes with greater diversity and higher quality than previous generative models, and reaches accuracy comparable to prior techniques on 3D human mesh recovery at near real-time rates.
This paper presents DiffSurf, a transformer-based denoising diffusion model for generating and reconstructing 3D surfaces. Specifically, we design a diffusion transformer architecture that predicts noise from noisy 3D surface vertices and normals. With this architecture, DiffSurf is able to generate 3D surfaces in various poses and shapes, such as human bodies, hands, animals and man-made objects. Further, DiffSurf is versatile in that it can address various 3D downstream tasks including morphing, body shape variation and 3D human mesh fitting to 2D keypoints. Experimental results on 3D human model benchmarks demonstrate that DiffSurf can generate shapes with greater diversity and higher quality than previous generative models. Furthermore, when applied to the task of single-image 3D human mesh recovery, DiffSurf achieves accuracy comparable to prior techniques at a near real-time rate.
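To make the noise-prediction setup concrete, the following is a minimal sketch of the forward (noising) step and training objective of a standard denoising diffusion model, applied to per-vertex tokens. It is not the authors' implementation. The linear beta schedule, the 1000 steps, the vertex count, the 6-dimensional token layout (position concatenated with normal), and all function names are assumptions made for illustration; the abstract states only that the model predicts noise from noisy vertices and normals. The transformer itself is omitted and would take the place of `eps_pred`.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level alpha_bar_t for a linear beta schedule (assumed, DDPM-style)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) and return it together with the noise used."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def noise_prediction_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))

rng = np.random.default_rng(0)
num_vertices = 431  # hypothetical mesh resolution, not taken from the paper

# Stand-in surface: random vertex positions and unit-length normals.
vertices = rng.standard_normal((num_vertices, 3))
normals = rng.standard_normal((num_vertices, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

# One token per vertex: [x, y, z, nx, ny, nz] (assumed layout).
x0 = np.concatenate([vertices, normals], axis=1)

alpha_bar = linear_alpha_bar()
t = 500
x_t, eps = add_noise(x0, t, alpha_bar, rng)

# A trained network would map (x_t, t) to eps_pred; a zero predictor
# stands in here only to show how the loss is evaluated.
eps_pred = np.zeros_like(eps)
loss = noise_prediction_loss(eps_pred, eps)
```

In this formulation the network never sees the clean surface directly: it receives `x_t` and the step index, and the loss rewards recovering the noise that was mixed in. Generation then runs the process in reverse, starting from pure Gaussian noise.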