TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfer
This work addresses a limitation in image editing for users who need to apply diverse textures, though it is incremental in that it builds on existing diffusion-based methods.
The paper tackles the problem of transferring complex textures like cloud or fire in text-guided image editing, which existing methods struggle with, and achieves harmonious texture transfer with excellent structure and background preservation.
Recently, text-guided image editing has achieved significant success. However, when changing the texture of an object, existing methods can only apply simple textures such as wood or gold. Complex textures such as cloud or fire remain a challenge. This limitation stems from the fact that the target prompt must contain both the input image content and <texture>, which restricts the texture representation. In this paper, we propose TextureDiffusion, a tuning-free image editing method for transferring various textures. First, the target prompt is set directly to "<texture>", which disentangles the texture from the input image content and enhances the texture representation. Second, query features in self-attention and features in residual blocks are used to preserve the structure of the input image. Finally, to maintain the background, we introduce an edit localization technique that blends the self-attention results and the intermediate latents. Comprehensive experiments demonstrate that TextureDiffusion can harmoniously transfer various textures with excellent structure and background preservation. Code is publicly available at https://github.com/THU-CVML/TextureDiffusion
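
The blending idea behind the edit localization step can be illustrated with a minimal sketch. It is not taken from the TextureDiffusion code: the function name `blend_with_mask`, the toy 4x4 grids, and the assumption that a binary object mask is already available are all hypothetical, and a real pipeline would operate on multi-channel latent tensors at each denoising step rather than on nested lists.

```python
# Illustrative sketch only: masked blending of an edited feature map with the
# source feature map. All names and values here are assumptions for the demo.

def blend_with_mask(source, edited, mask):
    """Return edited values where mask is 1 and source values where mask is 0."""
    return [
        [m * e + (1 - m) * s for s, e, m in zip(src_row, edit_row, mask_row)]
        for src_row, edit_row, mask_row in zip(source, edited, mask)
    ]

# Toy 4x4 "latents": the source is all zeros, the edited result is all ones.
source_latent = [[0.0] * 4 for _ in range(4)]
edited_latent = [[1.0] * 4 for _ in range(4)]

# Hypothetical binary mask marking the object region (central 2x2 block).
object_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

blended = blend_with_mask(source_latent, edited_latent, object_mask)
```

Inside the mask the blended map follows the edited branch, so the new texture appears on the object; outside the mask it follows the source branch, so the background is carried over unchanged.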