AnyArtisticGlyph: Multilingual Controllable Artistic Glyph Generation
This work addresses the challenge of controllable artistic glyph generation for multilingual text applications, representing an incremental improvement over prior methods.
The paper tackles the problem of generating artistic glyph images with fine-grained control, where existing methods often produce blurred textures. It introduces AnyArtisticGlyph, a diffusion-based model that the authors report achieves state-of-the-art performance in producing natural, detailed images.
Artistic Glyph Image Generation (AGIG) differs from current creativity-focused generation models in that it offers finely controllable, deterministic generation: it transfers the style of a reference image to a source image while preserving the source's content. Although current methods are advanced and promising, close inspection of their synthesized images often reveals blurred or incorrect textures, which remains a significant challenge. We therefore introduce AnyArtisticGlyph, a diffusion-based, multilingual, controllable artistic glyph generation model. It includes a font fusion and embedding module, which generates latent features for detailed structure creation, and a vision-text fusion and embedding module, which uses the CLIP model to encode references and blends them with transformation caption embeddings for seamless global image generation. Moreover, we incorporate a coarse-grained feature-level loss to enhance generation accuracy. Experiments show that it produces natural, detailed artistic glyph images with state-of-the-art performance. Our project will be open-sourced at https://github.com/jiean001/AnyArtisticGlyph to advance text generation technology.
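To make the two ideas in the abstract more concrete, the following is a minimal, dependency-free sketch. It is not the authors' implementation. It assumes that "blending" means replacing a placeholder token in the caption embedding with a linearly projected reference-image embedding, and that the "coarse-grained feature-level loss" is a mean squared error between average-pooled feature maps. All names (`fuse_caption_with_reference`, `coarse_feature_loss`, `placeholder_idx`, the pooling size `k`) are hypothetical, and the small lists stand in for real CLIP embeddings and network feature maps.

```python
def project(vec, weight):
    """Apply a linear map; weight is a list of rows with shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def fuse_caption_with_reference(caption_emb, ref_emb, weight, placeholder_idx):
    """Return a copy of the caption token embeddings in which the placeholder
    token is replaced by the projected reference-image embedding.
    (Assumed fusion scheme, for illustration only.)"""
    fused = [list(tok) for tok in caption_emb]  # copy so the input is untouched
    fused[placeholder_idx] = project(ref_emb, weight)
    return fused


def avg_pool(fmap, k):
    """Non-overlapping k x k average pooling over a 2D map.
    Height and width are assumed to be divisible by k."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            sum(fmap[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def coarse_feature_loss(pred, target, k=2):
    """Mean squared error between pooled (coarse) versions of two feature maps.
    (Assumed form of a coarse feature-level loss, for illustration only.)"""
    p, t = avg_pool(pred, k), avg_pool(target, k)
    diffs = [(a - b) ** 2 for row_p, row_t in zip(p, t) for a, b in zip(row_p, row_t)]
    return sum(diffs) / len(diffs)


# Toy usage: three caption tokens of width 2, a reference embedding of width 3.
caption = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
reference = [1.0, 2.0, 3.0]
proj_weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # maps width 3 to width 2
fused = fuse_caption_with_reference(caption, reference, proj_weight, placeholder_idx=1)
loss = coarse_feature_loss([[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]])
```

In a real system the projection would be learned, the embeddings would come from a CLIP encoder, and the fused sequence would condition the diffusion model through cross-attention. Pooling before comparing features makes the loss sensitive to coarse structure rather than pixel-level noise, which is one plausible reading of "coarse-grained".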