CV · LG · Aug 23, 2023

Manipulating Embeddings of Stable Diffusion Prompts

arXiv:2308.12059v2 · 14 citations · h-index: 53
Originality: Incremental advance
AI Analysis

This work addresses the challenge of targeted image manipulation for users of generative models like Stable Diffusion, offering more precise control than prompt engineering, though it is incremental in building on existing gradient-based techniques.

The paper tackles the problem of fine-grained control in text-to-image generation by proposing a method to directly manipulate prompt embeddings instead of text, enabling users to optimize image metrics, navigate in image space, and incorporate visual information from specific seeds. The user study indicates that the method is considered less tedious and often produces preferred images compared to traditional prompt engineering.

Prompt engineering is still the primary way for users of generative text-to-image models to manipulate generated images in a targeted way. Based on treating the model as a continuous function and by passing gradients between the image space and the prompt embedding space, we propose and analyze a new method to directly manipulate the embedding of a prompt instead of the prompt text. We then derive three practical interaction tools to support users with image generation: (1) Optimization of a metric defined in the image space that measures, for example, the image style. (2) Supporting a user in creative tasks by allowing them to navigate in the image space along a selection of directions of "near" prompt embeddings. (3) Changing the embedding of the prompt to include information that a user has seen in a particular seed but has difficulty describing in the prompt. Compared to prompt engineering, user-driven prompt embedding manipulation enables a more fine-grained, targeted control that integrates a user's intentions. Our user study shows that our methods are considered less tedious and that the resulting images are often preferred.
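
The first tool in the abstract, optimizing an image-space metric by moving the prompt embedding along a gradient, can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration: `toy_generator` is a hypothetical stand-in for the diffusion model, `brightness` is a hypothetical metric, and the gradient is estimated with finite differences rather than passed back through the model as the paper describes.

```python
import math

# Hypothetical fixed weights: a tiny differentiable "model" with 3 embedding
# dimensions and 4 output pixels. This is NOT Stable Diffusion.
TOY_WEIGHTS = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
    [-0.4, 0.1, 0.9],
    [0.2, 0.2, 0.2],
]

def toy_generator(embedding):
    """Map an embedding to a 4-pixel 'image' through a smooth function."""
    return [math.tanh(sum(w * e for w, e in zip(row, embedding)))
            for row in TOY_WEIGHTS]

def brightness(image):
    """Hypothetical image-space metric: mean pixel value."""
    return sum(image) / len(image)

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def optimize_embedding(embedding, metric, steps=50, lr=0.1):
    """Gradient ascent on metric(generator(embedding)) w.r.t. the embedding."""
    score = lambda e: metric(toy_generator(e))
    current = list(embedding)
    for _ in range(steps):
        grad = numerical_gradient(score, current)
        current = [c + lr * g for c, g in zip(current, grad)]
    return current

start = [0.0, 0.0, 0.0]  # stands in for the embedding of the original prompt
optimized = optimize_embedding(start, brightness)
start_score = brightness(toy_generator(start))
optimized_score = brightness(toy_generator(optimized))
```

In this sketch the prompt text never changes; only the embedding vector moves, and the metric score of the generated output rises as a result. A real setup would replace the finite-difference estimate with gradients computed through the generative model.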

Code Implementations: 2 repos
