PolyGen: An Autoregressive Generative Model of 3D Meshes
The work addresses a core problem in computer graphics, robotics, and games development by generating meshes directly, though the contribution is incremental in that it builds on existing Transformer architectures.
The paper tackles the challenge of generating 3D polygon meshes directly. Meshes represent geometry efficiently but are difficult for learning-based methods to handle. The authors propose an autoregressive Transformer model that predicts vertices and faces sequentially; it produces high-quality meshes and establishes log-likelihood benchmarks for the mesh-modelling task.
Polygon meshes are an efficient representation of 3D geometry, and are of central importance in computer graphics, robotics and games development. Existing learning-based approaches have avoided the challenges of working with 3D meshes, instead using alternative object representations that are more compatible with neural architectures and training approaches. We present an approach which models the mesh directly, predicting mesh vertices and faces sequentially using a Transformer-based architecture. Our model can condition on a range of inputs, including object classes, voxels, and images, and because the model is probabilistic it can produce samples that capture uncertainty in ambiguous scenarios. We show that the model is capable of producing high-quality, usable meshes, and establish log-likelihood benchmarks for the mesh-modelling task. We also evaluate the conditional models on surface reconstruction metrics against alternative methods, and demonstrate competitive performance despite not training directly on this task.
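The sequential prediction described in the abstract implies, by the chain rule, that the log-likelihood of a mesh splits into a vertex term and a face term conditioned on the vertices, each a sum of per-token log-probabilities. The sketch below illustrates only that bookkeeping. It is not the authors' implementation: the token encoding, the function names `mesh_log_likelihood` and `uniform`, and the uniform stand-in for the learned Transformer are all assumptions made for illustration.

```python
import math

def uniform(vocab_size):
    """Hypothetical stand-in for a learned network: ignores its context
    and returns a uniform distribution over `vocab_size` tokens."""
    def dist(*context):
        return [1.0 / vocab_size] * vocab_size
    return dist

def mesh_log_likelihood(vertex_tokens, face_tokens, vertex_dist, face_dist):
    """Chain-rule log-likelihood: log p(V) + log p(F | V).

    vertex_dist(prefix) returns next-token probabilities for the vertex sequence.
    face_dist(vertices, prefix) returns next-token probabilities for the face
    sequence, conditioned on the full vertex sequence.
    """
    log_p = 0.0
    # Vertex term: each token is conditioned on the vertex tokens before it.
    for i, tok in enumerate(vertex_tokens):
        probs = vertex_dist(vertex_tokens[:i])
        log_p += math.log(probs[tok])
    # Face term: each token is conditioned on all vertices and earlier face tokens.
    for j, tok in enumerate(face_tokens):
        probs = face_dist(vertex_tokens, face_tokens[:j])
        log_p += math.log(probs[tok])
    return log_p

# Toy token sequences (assumed encoding, for illustration only).
vertex_tokens = [3, 0, 2, 1]
face_tokens = [0, 1, 2]
ll = mesh_log_likelihood(vertex_tokens, face_tokens, uniform(4), uniform(3))
print(ll)
```

Replacing the two `uniform` stand-ins with trained conditional models would give the kind of quantity a log-likelihood benchmark reports, and sampling from the same per-step distributions one token at a time would generate a mesh.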