CV · May 15

MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer

arXiv:2605.15660
AI Analysis

For researchers and practitioners in image editing and material transfer, MaTe simplifies the inference pipeline by removing the need for fine-tuning or auxiliary networks, achieving competitive results with reduced computational overhead.

MaTe introduces a zero-shot, training-free diffusion framework for material transfer that integrates input images at the token level via multi-modal attention, eliminating text dependency and extra networks. It outperforms state-of-the-art methods in visual quality and efficiency while preserving detail alignment.

Recent diffusion-based methods for material transfer rely on image fine-tuning or complex architectures with assistive networks, but face challenges including text dependency, extra computational costs, and feature misalignment. To address these limitations, we propose MaTe, a streamlined diffusion framework that eliminates textual guidance and reference networks. MaTe integrates input images at the token level, enabling unified processing via multi-modal attention in a shared latent space. This design removes the need for additional adapters, ControlNet, inversion sampling, or model fine-tuning. Extensive experiments demonstrate that MaTe achieves high-quality material generation under a zero-shot, training-free paradigm. It outperforms state-of-the-art methods in both visual quality and efficiency while preserving precise detail alignment, significantly simplifying inference prerequisites.
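The abstract's core mechanism is that the input images enter the diffusion transformer as ordinary tokens, and a single attention operation runs over all of them together. The sketch below illustrates that general idea under loudly stated assumptions. It is not MaTe's implementation. It assumes each image has already been encoded to a 4-channel latent, uses 2x2 patchification, and shares one set of projection weights across all streams for brevity (MM-DiT-style backbones typically use per-modality weights). It also omits positional embeddings, timestep conditioning, and multiple heads. The names `patchify` and `joint_attention` and all tensor sizes are hypothetical.

```python
import numpy as np


def patchify(latent: np.ndarray, patch: int = 2) -> np.ndarray:
    """Turn a (C, H, W) latent into (H/p * W/p, C*p*p) tokens (hypothetical helper)."""
    c, h, w = latent.shape
    assert h % patch == 0 and w % patch == 0, "latent size must be divisible by patch"
    x = latent.reshape(c, h // patch, patch, w // patch, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (h/p, w/p, c, p, p): one row per spatial patch
    return x.reshape((h // patch) * (w // patch), c * patch * patch)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(streams, w_q, w_k, w_v):
    """Single-head attention over the concatenation of all token streams.

    Because the streams are concatenated into one sequence, every token (noisy
    target, source image, material reference) can attend to every other token,
    with no adapter, ControlNet, or separate reference network involved.
    """
    tokens = np.concatenate(streams, axis=0)  # (N_total, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N_total, N_total)
    out = attn @ v
    # Split the joint output back into the original per-image streams.
    splits = np.cumsum([s.shape[0] for s in streams])[:-1]
    return np.split(out, splits, axis=0)


# Usage with random stand-in latents (hypothetical sizes).
rng = np.random.default_rng(0)
C, H, W, P = 4, 16, 16, 2
d = C * P * P
noisy = patchify(rng.standard_normal((C, H, W)), P)     # latent being denoised
source = patchify(rng.standard_normal((C, H, W)), P)    # image whose material is replaced
material = patchify(rng.standard_normal((C, H, W)), P)  # material reference image
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
noisy_out, source_out, material_out = joint_attention([noisy, source, material], w_q, w_k, w_v)
print(noisy_out.shape)  # (64, 16)
```

In this toy setting, only `noisy_out` would feed the next denoising step. The source and material tokens influence it purely through the shared attention matrix, which is the sense in which no text prompt or auxiliary network is needed to inject the conditioning images.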

