WordCraft: Interactive Artistic Typography with Attention Awareness and Noise Blending
This work addresses the need for more interactive and flexible artistic typography tools for artists and designers, representing an incremental improvement over existing generative models.
The paper tackles the problem of limited interactivity in automated artistic typography by introducing WordCraft, a system that combines diffusion models with a training-free regional attention mechanism and noise blending. Together, these enable precise multi-region generation and continuous refinement, producing high-quality stylized typography in multiple languages.
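The summary does not spell out how the training-free regional attention works. One common training-free way to confine a text prompt to a spatial region is to mask cross-attention, so that each spatial position attends only to the prompt tokens assigned to its region plus any tokens shared by all regions. The pure-Python sketch below illustrates that idea under stated assumptions: the function name `regional_cross_attention`, the integer region ids, and the `None` convention for shared tokens are hypothetical and are not taken from WordCraft's implementation.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def regional_cross_attention(
    queries: Sequence[Vector],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    pixel_region: Sequence[int],
    token_region: Sequence[Optional[int]],
) -> List[Vector]:
    """Hypothetical region-masked cross-attention (illustrative sketch only).

    queries[p]         : query vector for spatial position p
    keys[t], values[t] : key/value vectors for text token t
    pixel_region[p]    : region id that position p belongs to
    token_region[t]    : region id of token t, or None if shared by all regions
    """
    d = len(keys[0])
    dv = len(values[0])
    scale = 1.0 / math.sqrt(d)  # standard scaled dot-product factor
    out: List[Vector] = []
    for p, q in enumerate(queries):
        # Mask: a position may attend only to its own region's tokens and shared tokens.
        allowed = [t for t, r in enumerate(token_region)
                   if r is None or r == pixel_region[p]]
        if not allowed:
            # No visible tokens: emit a zero vector instead of a softmax over nothing.
            out.append([0.0] * dv)
            continue
        scores = [scale * sum(qi * ki for qi, ki in zip(q, keys[t])) for t in allowed]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the visible tokens' value vectors.
        out.append([sum(w * values[t][j] for w, t in zip(weights, allowed))
                    for j in range(dv)])
    return out
```

For example, with two positions assigned to regions 0 and 1 and one token per region, each position's output is exactly its own region's value vector, because the other region's token is masked out. In an actual diffusion U-Net, the same mask would more typically be applied by adding negative infinity to the disallowed attention logits before the softmax, which is equivalent to the filtering shown here.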
Artistic typography aims to stylize input characters with visual effects that are both creative and legible. Traditional approaches rely heavily on manual design, while recent generative models, particularly diffusion-based methods, have enabled automated character stylization. However, existing solutions remain limited in interactivity: they lack support for localized edits, iterative refinement, multi-character composition, and open-ended prompt interpretation. We introduce WordCraft, an interactive artistic typography system that integrates diffusion models to address these limitations. WordCraft features a training-free regional attention mechanism for precise multi-region generation and a noise blending strategy that supports continuous refinement without compromising visual quality. To support flexible, intent-driven generation, we incorporate a large language model to parse and structure both concrete and abstract user prompts. These components allow our framework to synthesize high-quality, stylized typography for single- and multi-character inputs in multiple languages, supporting diverse user-centered workflows. Our system significantly enhances interactivity in artistic typography synthesis, opening up new creative possibilities for artists and designers.
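The abstract states that noise blending supports continuous refinement without compromising visual quality, but it does not give the formulation. A well-known related technique is masked latent blending in the style of Blended Latent Diffusion. At each denoising step, the latent inside the edit mask comes from the new generation. The latent outside the mask is the previous result re-noised to the current timestep with the DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The pure-Python sketch below shows that operation on flattened latents. It is a stand-in to make the idea concrete, not WordCraft's published method, and the names `noise_to_level` and `blend_latents` are hypothetical.

```python
import math
from typing import List, Sequence


def noise_to_level(x0: Sequence[float], alpha_bar: float,
                   eps: Sequence[float]) -> List[float]:
    """DDPM forward process q(x_t | x_0): sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def blend_latents(
    edited_t: Sequence[float],
    original_x0: Sequence[float],
    mask: Sequence[float],
    alpha_bar: float,
    eps: Sequence[float],
) -> List[float]:
    """Hypothetical masked latent blend for one denoising step (illustrative sketch).

    edited_t    : current latent of the new generation at timestep t
    original_x0 : clean latent of the previous result that should be preserved
    mask        : 1.0 inside the region being refined, 0.0 outside (soft values allowed)
    alpha_bar   : cumulative noise-schedule product at timestep t, in [0, 1]
    eps         : Gaussian noise used to re-noise the preserved content
    """
    # Bring the preserved content to the same noise level as the current step.
    original_t = noise_to_level(original_x0, alpha_bar, eps)
    # Inside the mask keep the new content; outside, reuse the re-noised previous result.
    return [m * e + (1.0 - m) * o for m, e, o in zip(mask, edited_t, original_t)]
```

In a sampler loop, `blend_latents` would be called after each denoising update, using the `alpha_bar` value for that step. A soft (feathered) mask is one way to avoid visible seams at region boundaries; this is a design choice of the sketch, not a detail reported by the paper.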