M^3ashy: Multi-Modal Material Synthesis via Hyperdiffusion
This work addresses the underexplored problem of generating measured BRDFs for realistic material synthesis, offering a novel approach with multi-modal control for graphics and rendering applications.
M^3ashy introduces a hyperdiffusion-based framework for synthesizing real-world measured BRDFs, achieving high-quality reconstruction and flexible multi-modal conditioning (material type, text, images). It outperforms prior methods in metrics like FID and LPIPS, and contributes new datasets and evaluation metrics.
High-quality material synthesis is essential for replicating complex surface properties to create realistic scenes. Despite advances in the generation of material appearance based on analytic models, the synthesis of real-world measured BRDFs remains largely unexplored. To address this challenge, we propose M^3ashy, a novel multi-modal material synthesis framework based on hyperdiffusion. M^3ashy enables high-quality reconstruction of complex real-world materials by leveraging neural fields as a compact, continuous representation of BRDFs. Furthermore, our multi-modal conditional hyperdiffusion model allows for flexible material synthesis conditioned on material type, natural language descriptions, or reference images, providing greater user control over material generation. To support future research, we contribute two new material datasets and introduce two BRDF distributional metrics for more rigorous evaluation. We demonstrate the effectiveness of M^3ashy through extensive experiments, including a novel statistics-based constrained synthesis, which enables the generation of materials of desired categories.
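To make the idea of a neural field as a compact BRDF representation concrete, the following is a minimal sketch and not the architecture used in this work. It assumes a tiny MLP that maps an incoming and an outgoing direction (six numbers) to RGB reflectance, and shows how its weights can be flattened into a single vector, which is the kind of sample a hyperdiffusion model would be trained to generate. The layer sizes, the direction parameterization, the softplus output, and all function names (`init_params`, `brdf_field`, `flatten`, `unflatten`) are illustrative assumptions.

```python
import math
import random

# Hypothetical layer sizes: 6 inputs (two 3D directions), 3 RGB outputs.
LAYER_SIZES = [6, 16, 16, 3]


def init_params(seed=0):
    """Randomly initialize (weights, bias) pairs for each layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        scale = 1.0 / math.sqrt(n_in)
        W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        params.append((W, b))
    return params


def softplus(v):
    """Numerically stable softplus; keeps reflectance non-negative."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def brdf_field(params, wi, wo):
    """Evaluate the neural field at incoming direction wi and outgoing wo."""
    x = list(wi) + list(wo)
    for k, (W, b) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
        if k < len(params) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers
    return [softplus(v) for v in x]


def flatten(params):
    """Concatenate all weights and biases into one vector (one 'material')."""
    flat = []
    for W, b in params:
        for row in W:
            flat.extend(row)
        flat.extend(b)
    return flat


def unflatten(flat):
    """Inverse of flatten: rebuild per-layer (weights, bias) pairs."""
    params, i = [], 0
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        W = [flat[i + r * n_in: i + (r + 1) * n_in] for r in range(n_out)]
        i += n_in * n_out
        b = flat[i: i + n_out]
        i += n_out
        params.append((W, b))
    return params


params = init_params(seed=0)
weights = flatten(params)  # the vector a generative model would operate on
rgb = brdf_field(unflatten(weights), wi=(0.0, 0.0, 1.0), wo=(0.0, 0.0, 1.0))
```

Under these assumptions, each material is a single vector of 435 numbers, which can be queried at any pair of directions rather than stored as a dense table of measurements. A generative model over such vectors can then output new weight vectors that `unflatten` turns back into evaluable BRDFs.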