CV · Oct 3, 2023

TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling

arXiv:2310.01819v4 · 18 citations · h-index: 13
Originality: Incremental advance
AI Analysis

The paper addresses limited creativity in text-to-image synthesis for applications that require novel object combinations. The advance is incremental because it builds on existing diffusion models.

The paper tackles the challenge of generating creative combinatorial objects from unrelated text pairs in text-to-image synthesis by introducing a balance swap-sampling method. The authors report that it outperforms recent state-of-the-art methods and rivals human artists on combinations such as frog-broccoli.

Generating creative combinatorial objects from two seemingly unrelated object texts is a challenging task in text-to-image synthesis, often hindered by a focus on emulating existing data distributions. In this paper, we develop a straightforward yet highly effective method, called balance swap-sampling. First, we propose a swapping mechanism that generates a novel combinatorial object image set by randomly exchanging intrinsic elements of two text embeddings through a cutting-edge diffusion model. Second, we introduce a balance swapping region to efficiently sample a small subset from the newly generated image set by balancing CLIP distances between the new images and their original generations, increasing the likelihood of accepting the high-quality combinations. Last, we employ a segmentation method to compare CLIP distances among the segmented components, ultimately selecting the most promising object from the sampled subset. Extensive experiments demonstrate that our approach outperforms recent SOTA T2I methods. Surprisingly, our results even rival those of human artists, such as frog-broccoli.
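The first two steps of the abstract (random element swapping, then a balance test on distances) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `swap_prob` and `tol` parameters, the use of plain Python lists as stand-ins for text and image embeddings, and the simple "distances differ by at most `tol`" acceptance rule are all assumptions made for illustration. A real pipeline would obtain embeddings from a text encoder, generate images with a diffusion model, and measure distances with CLIP.

```python
import math
import random


def swap_embeddings(emb_a, emb_b, rng, swap_prob=0.5):
    """Build a mixed embedding by taking each element from emb_b with
    probability swap_prob, otherwise from emb_a (hypothetical helper)."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    # mask[i] is True when element i is taken from emb_b
    mask = [rng.random() < swap_prob for _ in emb_a]
    mixed = [b if m else a for a, b, m in zip(emb_a, emb_b, mask)]
    return mixed, mask


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(u, v); a stand-in for a CLIP distance."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def in_balance_region(dist_a, dist_b, tol=0.1):
    """Accept a candidate when it is roughly equidistant from both
    originals. The tolerance rule is an assumed simplification."""
    return abs(dist_a - dist_b) <= tol


# Toy usage: swap elements of two dummy embeddings, then test balance
rng = random.Random(0)
mixed, mask = swap_embeddings([0.0] * 8, [1.0] * 8, rng)
accepted = in_balance_region(0.30, 0.35)
```

In this sketch a candidate that sits much closer to one source object than the other is rejected, which mirrors the stated goal of keeping only combinations that blend both concepts. The segmentation-based final selection described in the abstract is not covered here.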

