DealMaTe: Multi-Dimensional Material Transfer via Diffusion Transformer
This work simplifies material transfer for computer vision and graphics applications by eliminating text dependency and auxiliary networks, though it is incremental over existing diffusion-based methods.
DealMaTe proposes a diffusion-based material transfer method that uses depth, normal, and lighting images without text guidance or reference networks, achieving high-fidelity results with low architectural complexity and reduced inference latency.
Recent diffusion-based material transfer methods rely on image fine-tuning or complex architectures with auxiliary networks, and they suffer from text dependency, additional computational cost, and feature misalignment. To address these limitations, we propose \textbf{DealMaTe}, which uses \underline{\textbf{de}}pth, norm\underline{\textbf{a}}l, and \underline{\textbf{l}}ighting images for \underline{\textbf{ma}}terial \underline{\textbf{t}}ransf\underline{\textbf{e}}r. DealMaTe is a simplified diffusion framework that eliminates text guidance and reference networks. We design a lightweight 3D information injection method, Multi-Dim 3D Shader LoRA, which leaves the base model weights unmodified while enabling compatible control conditions and producing harmonious, stable results. Additionally, we optimize the attention mechanism with Shader Causal Mutual Attention and key-value (KV) caching, which reduces the inference latency introduced by multiple conditions and improves computational efficiency while preserving high-quality material transfer with low architectural complexity. Extensive experiments covering a wide variety of objects and lighting conditions consistently demonstrate that DealMaTe achieves remarkably high-fidelity material transfer under arbitrary input materials. The code is available at https://github.com/haha-lisa/DealMaTe.
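To illustrate what injecting conditions "without modifying the base model weights" can look like, the following is a minimal sketch of a generic LoRA layer in NumPy. It is not the Multi-Dim 3D Shader LoRA itself, whose details are not given here. The class name `LoRALinear`, the parameters `rank` and `alpha`, and the layer sizes are assumptions chosen for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a low-rank update (generic LoRA, illustrative only)."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # base weight: kept frozen, never updated
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))  # down-projection
        self.B = np.zeros((out_dim, rank))  # up-projection, zero-initialised
        self.scale = alpha / rank

    def __call__(self, x, use_adapter=True):
        y = x @ self.weight.T  # output of the frozen base layer
        if use_adapter:
            # Low-rank correction: scale * B @ A applied to x.
            y = y + self.scale * (x @ self.A.T) @ self.B.T
        return y


# Hypothetical sizes: 16 input features, 8 output features, batch of 3.
rng = np.random.default_rng(1)
base = rng.normal(size=(8, 16))
layer = LoRALinear(base.copy(), rank=4)
x = rng.normal(size=(3, 16))

out_before = layer(x)  # B is zero, so the adapter is a no-op at initialisation
layer.B += 0.1         # stand-in for a training update to the adapter only
out_after = layer(x)
out_base = layer(x, use_adapter=False)
```

Because only `A` and `B` change, disabling the adapter recovers the base model's output exactly. Under this assumption, one such adapter could be attached per condition type (depth, normal, lighting), though whether DealMaTe does so is not stated in the abstract.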