MatLat: Material Latent Space for PBR Texture Generation
This work addresses PBR texture generation for 3D meshes, a setting where large-scale datasets are scarce, by fine-tuning a pretrained VAE to learn a material latent space. The resulting gain in texture quality is relevant to computer graphics and virtual content creation.
The paper tackles the problem of generating high-quality PBR textures for 3D meshes. It proposes a generative framework that fine-tunes a pretrained VAE to incorporate new material channels with minimal latent distribution shift and that enforces locality in the latent-to-image mapping. Comparisons with previous baselines show improved PBR texture fidelity, and ablation studies show that each component is critical for achieving state-of-the-art performance.
We propose a generative framework for producing high-quality PBR textures on a given 3D mesh. As large-scale PBR texture datasets are scarce, our approach focuses on effectively leveraging the embedding space and diffusion priors of pretrained latent image generative models while learning a material latent space, MatLat, through targeted fine-tuning. Prior methods freeze the embedding network, which causes distribution shifts when encoding additional PBR channels and hinders subsequent diffusion training. In contrast, we fine-tune the pretrained VAE so that new material channels can be incorporated with minimal deviation of the latent distribution. We further show that correspondence-aware attention alone is insufficient for cross-view consistency unless the latent-to-image mapping preserves locality. To enforce this locality, we introduce a regularization term in the VAE fine-tuning that crops latent patches, decodes them, and aligns the decoded patches with the corresponding image regions to maintain strong pixel-latent spatial correspondence. Ablation studies and comparisons with previous baselines demonstrate that our framework improves PBR texture fidelity and that each component is critical for achieving state-of-the-art performance.
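The crop-decode-align idea behind the locality regularization can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the toy decoders, the upsampling factor `SCALE`, the L1 distance, and all function names are assumptions chosen for illustration only. The sketch decodes a cropped latent patch, compares it with the matching region of the full decode, and shows that the gap is zero for a strictly local decoder and positive for one that leaks global context.

```python
import numpy as np

SCALE = 8  # assumed latent-to-pixel upsampling factor (hypothetical)


def local_decoder(z):
    """Toy decoder: nearest-neighbour upsampling of a (C, h, w) latent.

    Each latent cell only affects its own SCALE x SCALE pixel block,
    so the latent-to-image mapping is perfectly local.
    """
    return np.repeat(np.repeat(z, SCALE, axis=1), SCALE, axis=2)


def nonlocal_decoder(z):
    """Toy decoder that leaks global context into every pixel.

    It adds the per-channel mean of the whole latent, so the output at
    one location depends on latent cells far away from it.
    """
    return local_decoder(z) + z.mean(axis=(1, 2), keepdims=True)


def locality_loss(decoder, z, top, left, size):
    """Mean L1 gap between decode(crop(z)) and crop(decode(z)).

    The L1 distance is an assumption; the source only states that the
    decoded patch is aligned with the corresponding image region.
    """
    full = decoder(z)
    patch = decoder(z[:, top:top + size, left:left + size])
    region = full[:,
                  top * SCALE:(top + size) * SCALE,
                  left * SCALE:(left + size) * SCALE]
    return float(np.abs(patch - region).mean())


rng = np.random.default_rng(0)
z = rng.normal(size=(4, 16, 16))  # hypothetical 4-channel latent

loss_local = locality_loss(local_decoder, z, top=4, left=6, size=8)
loss_nonlocal = locality_loss(nonlocal_decoder, z, top=4, left=6, size=8)
print(loss_local, loss_nonlocal)
```

In an actual fine-tuning loop, a term of this kind would presumably be computed with the VAE decoder on randomly sampled crops and added to the reconstruction objective; the weighting and sampling scheme are not specified in the text above.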