HC · CV · Feb 11, 2025

SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches

arXiv:2502.07556v1 · 23 citations · h-index: 8 · CHI
Originality: Highly original
AI Analysis

This work addresses the challenge of text-to-image generation for non-expert users, providing an interactive system to improve the flexibility of spatially conditioned image generation.

The authors tackle the problem of generating semantically cohesive images from text descriptions, especially when multiple objects are involved. Experimental results show that SketchFlex produces more cohesive images than end-to-end models, while significantly reducing cognitive load and better matching user intentions compared with a region-based generation baseline.

Text-to-image models can generate visually appealing images from text descriptions. Efforts have been devoted to improving model controls with prompt tuning and spatial conditioning. However, our formative study highlights the challenges for non-expert users in crafting appropriate prompts and specifying fine-grained spatial conditions (e.g., depth or canny references) to generate semantically cohesive images, especially when multiple objects are involved. In response, we introduce SketchFlex, an interactive system designed to improve the flexibility of spatially conditioned image generation using rough region sketches. The system automatically infers user prompts with rational descriptions within a semantic space enriched by crowd-sourced object attributes and relationships. Additionally, SketchFlex refines users' rough sketches into canny-based shape anchors, ensuring generation quality and alignment with user intentions. Experimental results demonstrate that SketchFlex achieves more cohesive image generations than end-to-end models, while significantly reducing cognitive load and better matching user intentions compared to a region-based generation baseline.

Code Implementations: 1 repo