CV | Mar 18, 2025

ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints

arXiv:2503.14720v1 | 3 citations | h-index: 8
Originality: Incremental advance
AI Analysis

The paper addresses a nuanced challenge in text-to-image generation, relevant to applications such as puzzle-solving and object arrangement. The advance is incremental because it builds on existing diffusion models.

The paper tackles the problem of generating images by rearranging a fixed set of rigid shapes to match text descriptions, akin to solving tangram puzzles, and introduces ShapeShift, which optimizes shape placement using diffusion models and a collision resolution mechanism to achieve physically valid configurations. Results show compelling outcomes with quantitative and qualitative advantages over alternatives.

While diffusion-based models excel at generating photorealistic images from text, a more nuanced challenge emerges when constrained to using only a fixed set of rigid shapes, akin to solving tangram puzzles or arranging real-world objects to match semantic descriptions. We formalize this problem as shape-based image generation, a new text-guided image-to-image translation task that requires rearranging the input set of rigid shapes into non-overlapping configurations and visually communicating the target concept. Unlike pixel-manipulation approaches, our method, ShapeShift, explicitly parameterizes each shape within a differentiable vector graphics pipeline, iteratively optimizing placement and orientation through score distillation sampling from pretrained diffusion models. To preserve arrangement clarity, we introduce a content-aware collision resolution mechanism that applies minimal semantically coherent adjustments when overlaps occur, ensuring smooth convergence toward physically valid configurations. By bridging diffusion-based semantic guidance with explicit geometric constraints, our approach yields interpretable compositions where spatial relationships clearly embody the textual prompt. Extensive experiments demonstrate compelling results across diverse scenarios, with quantitative and qualitative advantages over alternative techniques.
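The collision resolution idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each rigid shape is approximated by a bounding circle, and it omits both the content-aware (semantic) part and the diffusion-guided optimization. The names `Shape`, `overlap`, and `resolve_collisions` are hypothetical and chosen only for illustration. The sketch shows one plausible reading of "minimal adjustments when overlaps occur": each overlapping pair is pushed apart along the line between centers by exactly the penetration depth, split evenly between the two shapes.

```python
import math
from dataclasses import dataclass


@dataclass
class Shape:
    """Hypothetical rigid-shape parameters: position, orientation, size."""
    x: float
    y: float
    theta: float   # orientation in radians; unused by the circle test below
    radius: float  # bounding-circle radius, a stand-in for the real outline


def overlap(a: Shape, b: Shape) -> float:
    """Penetration depth of two bounding circles (<= 0 means no overlap)."""
    return a.radius + b.radius - math.hypot(b.x - a.x, b.y - a.y)


def resolve_collisions(shapes, max_iters=100, eps=1e-9) -> bool:
    """Push overlapping pairs apart by the smallest translation that
    separates them. Returns True once a full pass finds no overlaps."""
    for _ in range(max_iters):
        moved = False
        for i in range(len(shapes)):
            for j in range(i + 1, len(shapes)):
                a, b = shapes[i], shapes[j]
                depth = overlap(a, b)
                if depth <= eps:
                    continue
                dx, dy = b.x - a.x, b.y - a.y
                dist = math.hypot(dx, dy)
                if dist < eps:
                    # Coincident centers: pick an arbitrary direction.
                    dx, dy, dist = 1.0, 0.0, 1.0
                ux, uy = dx / dist, dy / dist
                shift = depth / 2.0  # each shape moves half the depth
                a.x -= ux * shift
                a.y -= uy * shift
                b.x += ux * shift
                b.y += uy * shift
                moved = True
        if not moved:
            return True
    return False


if __name__ == "__main__":
    pieces = [Shape(0.0, 0.0, 0.0, 1.0), Shape(1.0, 0.0, 0.0, 1.0)]
    print(resolve_collisions(pieces), pieces)
```

In a fuller pipeline, a step like this would presumably be interleaved with gradient updates on position and orientation, and the circle test would be replaced by an exact test on the shape outlines. Both of those are assumptions about how such a system might be structured, not details taken from the paper.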

