CV | Feb 9

Inspiration Seeds: Learning Non-Literal Visual Combinations for Generative Exploration

arXiv:2602.08615v1 | h-index: 12
AI Analysis

This work addresses the need for visual ideation tools in creative work, giving designers a new way to explore emergent connections between reference images. Its contribution is incremental, however: it mainly reframes image generation from execution to exploration.

The paper tackles a limitation of current generative models: they execute textual prompts well but offer little support for open-ended visual exploration. It proposes Inspiration Seeds, a feed-forward framework that takes two input images and, without any text prompt, generates diverse, coherent compositions that reveal latent relationships between the inputs.

While generative models have become powerful tools for image synthesis, they are typically optimized for executing carefully crafted textual prompts, offering limited support for the open-ended visual exploration that often precedes idea formation. In contrast, designers frequently draw inspiration from loosely connected visual references, seeking emergent connections that spark new ideas. We propose Inspiration Seeds, a generative framework that shifts image generation from final execution to exploratory ideation. Given two input images, our model produces diverse, visually coherent compositions that reveal latent relationships between inputs, without relying on user-specified text prompts. Our approach is feed-forward, trained on synthetic triplets of decomposed visual aspects derived entirely through visual means: we use CLIP Sparse Autoencoders to extract editing directions in CLIP latent space and isolate concept pairs. By removing the reliance on language and enabling fast, intuitive recombination, our method supports visual ideation at the early and ambiguous stages of creative work.
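
As a rough illustration of the idea in the abstract (using a sparse autoencoder over an embedding space to find concept directions, then editing along them), here is a minimal toy sketch. It is not the authors' implementation: the random weights stand in for a trained CLIP Sparse Autoencoder, the tied encoder/decoder, the 8-dimensional embedding, and the choice of the two most active latents as a "concept pair" are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def sae_encode(embedding, encoder, bias):
    """ReLU encoder of a sparse autoencoder: one activation per latent."""
    return [max(0.0, dot(row, embedding) + b) for row, b in zip(encoder, bias)]


def top_concepts(activations, k):
    """Indices of the k most strongly active latents, strongest first."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    return order[:k]


def edit(embedding, direction, strength):
    """Shift the embedding along a concept direction, then renormalize."""
    return normalize([x + strength * d for x, d in zip(embedding, direction)])


random.seed(0)
DIM, N_LATENTS = 8, 16  # toy sizes (assumption); real CLIP spaces are far larger

# Hypothetical SAE weights. A real SAE would be trained on CLIP embeddings.
decoder = [normalize([random.gauss(0, 1) for _ in range(DIM)])
           for _ in range(N_LATENTS)]
encoder = decoder            # tied weights: a simplifying assumption
bias = [0.0] * N_LATENTS

# Stand-in for a CLIP image embedding (assumption: random unit vector).
image_embedding = normalize([random.gauss(0, 1) for _ in range(DIM)])

acts = sae_encode(image_embedding, encoder, bias)
concept_a, concept_b = top_concepts(acts, 2)  # toy "concept pair"

# Each decoder row acts as an editing direction; subtracting the active
# amount of concept_a yields an embedding with that aspect suppressed.
without_a = edit(image_embedding, decoder[concept_a], -acts[concept_a])
```

In this toy setup, `without_a` plays the role of one decomposed visual aspect. How the paper actually builds its synthetic training triplets from such edits is not specified in the abstract, so that step is left out here.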

