CV | AI | Jun 30, 2025

Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention

arXiv:2506.24085v2 | h-index: 7
Originality: Incremental advance
AI Analysis

The paper addresses cognitive biases such as design fixation, which hamper designers and artists in creative tasks. The contribution appears incremental, since it adapts existing diffusion models.

The paper tackles the problem of automating cross-modal conceptual blending of real images and text to enhance human creativity, proposing IT-Blender, a T2I diffusion adapter that outperforms baselines by a large margin in blending visual and textual concepts.

Blending visual and textual concepts into a new visual concept is a unique and powerful trait of human beings that can fuel creativity. However, in practice, cross-modal conceptual blending for humans is prone to cognitive biases, like design fixation, which leads to local minima in the design space. In this paper, we propose a T2I diffusion adapter "IT-Blender" that can automate the blending process to enhance human creativity. Prior works related to cross-modal conceptual blending are limited in encoding a real image without loss of details or in disentangling the image and text inputs. To address these gaps, IT-Blender leverages pretrained diffusion models (SD and FLUX) to blend the latent representations of a clean reference image with those of the noisy generated image. Combined with our novel blended attention, IT-Blender encodes the real reference image without loss of details and blends the visual concept with the object specified by the text in a disentangled way. Our experimental results show that IT-Blender outperforms the baselines by a large margin in blending visual and textual concepts, shedding light on a new application of image generative models to augment human creativity.
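
The abstract does not spell out how blended attention is computed. As a rough illustration only, one plausible reading is that queries from the noisy generated-image tokens attend jointly over their own tokens and the tokens of the clean reference image. The NumPy sketch below assumes that reading. The function name, the single-head layout, and the projection matrices `w_q`, `w_k`, `w_v` are all hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blended_attention_sketch(gen_tokens, ref_tokens, w_q, w_k, w_v):
    """Hypothetical single-head sketch, NOT the paper's exact formulation.

    gen_tokens: (n_gen, d_model) latent tokens of the noisy generated image
    ref_tokens: (n_ref, d_model) latent tokens of the clean reference image
    """
    # Queries come only from the generated-image stream.
    q = gen_tokens @ w_q
    # Assumption: keys and values cover both streams, so each generated
    # token can draw on reference-image detail as well as its own context.
    kv_source = np.concatenate([gen_tokens, ref_tokens], axis=0)
    k = kv_source @ w_k
    v = kv_source @ w_v
    # Standard scaled dot-product attention over the joint key set.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Toy usage with random tensors standing in for real latents.
rng = np.random.default_rng(0)
d_model, d_head = 8, 4
gen = rng.normal(size=(5, d_model))   # 5 generated-image tokens
ref = rng.normal(size=(3, d_model))   # 3 reference-image tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = blended_attention_sketch(gen, ref, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

In this sketch, the last `n_ref` columns of `weights` show how strongly each generated token draws on the reference image. A real adapter would operate inside the pretrained model's attention layers (SD or FLUX) rather than on standalone matrices.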

