ConTEXTure: Consistent Multiview Images to Texture
ConTEXTure addresses viewpoint inconsistencies in 3D texture generation for applications such as computer graphics and virtual reality, and represents an incremental improvement over prior methods.
The paper tackles the problem of generating consistent texture maps for 3D meshes from multiview images by introducing ConTEXTure, which uses Zero123++ to produce view-consistent images for six viewpoints simultaneously, resulting in rendered images free from viewpoint irregularities.
We introduce ConTEXTure, a generative network designed to create a texture map/atlas for a given 3D mesh using images from multiple viewpoints. The process begins with generating a front-view image from a text prompt, such as 'Napoleon, front view', describing the 3D mesh. Additional images from different viewpoints are derived from this front-view image and camera poses relative to it. ConTEXTure builds upon the TEXTure network, which uses text prompts for six viewpoints (e.g., 'Napoleon, front view', 'Napoleon, left view', etc.). However, TEXTure often generates images for non-front viewpoints that do not accurately represent those viewpoints. To address this issue, we employ Zero123++, which generates multiple view-consistent images for the six specified viewpoints simultaneously, conditioned on the initial front-view image and the depth maps of the mesh for the six viewpoints. By utilizing these view-consistent images, ConTEXTure learns the texture atlas from all viewpoint images concurrently, unlike previous methods that do so sequentially. This approach ensures that the rendered images from various viewpoints, including back, side, bottom, and top, are free from viewpoint irregularities.
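The difference between sequential and concurrent texture learning can be illustrated with a toy example. The sketch below is not the ConTEXTure implementation: it assumes a flattened atlas of scalar texels, and it assumes each view is given as a hypothetical pair of per-texel colors and visibility weights (zero where the texel is not seen). The function names `fuse_sequential` and `fuse_joint` are illustrative only. The joint version takes the weighted mean per texel, which is the least-squares fit to all views at once.

```python
import numpy as np


def fuse_sequential(num_texels, views):
    """Paint views one at a time; a later view overwrites every texel it sees."""
    atlas = np.zeros(num_texels)
    for colors, weights in views:
        seen = weights > 0
        atlas[seen] = colors[seen]
    return atlas


def fuse_joint(num_texels, views):
    """Fit all views at once: per-texel weighted mean of the observed colors."""
    weighted_sum = np.zeros(num_texels)
    weight_total = np.zeros(num_texels)
    for colors, weights in views:
        weighted_sum += weights * colors
        weight_total += weights
    # Texels that no view observes keep the default value 0.
    return np.divide(
        weighted_sum,
        weight_total,
        out=np.zeros(num_texels),
        where=weight_total > 0,
    )


# Hypothetical data: four texels, two views that overlap on texel 1.
front = (np.array([1.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.5, 0.0, 0.0]))
side = (np.array([0.0, 0.4, 0.4, 0.0]), np.array([0.0, 0.5, 1.0, 0.0]))

sequential_atlas = fuse_sequential(4, [front, side])
joint_atlas = fuse_joint(4, [front, side])
```

In this toy setting the sequential result for the shared texel depends on the order in which views are painted, while the joint result blends both observations. That order dependence is one plausible way to picture the viewpoint irregularities that concurrent learning is meant to avoid.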