CV | Nov 22, 2024

AnyText2: Visual Text Generation and Editing With Customizable Attributes

arXiv:2411.15245v1 | 38 citations | h-index: 4 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific limitation of text-to-image generation, namely that users cannot customize text attributes, and is an incremental advance over prior methods.

The paper tackles the problem of controlling font and color attributes in text-to-image generation. It introduces AnyText2, which improves text accuracy over its predecessor, AnyText, by 3.3% for Chinese and 9.3% for English while increasing inference speed by 19.8%.

As the text-to-image (T2I) domain progresses, generating text that seamlessly integrates with visual content has garnered significant attention. However, even with accurate text generation, the inability to control font and color can greatly limit certain applications, and this issue remains insufficiently addressed. This paper introduces AnyText2, a novel method that enables precise control over multilingual text attributes in natural scene image generation and editing. Our approach consists of two main components. First, we propose a WriteNet+AttnX architecture that injects text rendering capabilities into a pre-trained T2I model. Compared to its predecessor, AnyText, our new approach not only enhances image realism but also achieves a 19.8% increase in inference speed. Second, we explore techniques for extracting fonts and colors from scene images and develop a Text Embedding Module that encodes these text attributes separately as conditions. As an extension of AnyText, this method allows for customization of attributes for each line of text, leading to improvements of 3.3% and 9.3% in text accuracy for Chinese and English, respectively. Through comprehensive experiments, we demonstrate the state-of-the-art performance of our method. The code and model will be made open-source at https://github.com/tyxsspa/AnyText2.
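
To make the idea of per-line attribute conditions concrete, here is a minimal sketch of how each line of text might carry its own optional font and color, kept as separate condition fields. This is not the AnyText2 API: `TextLine`, `build_conditions`, the field names, the font label, and the RGB normalization are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextLine:
    """One line of text to render, with its own optional attributes (hypothetical structure)."""
    text: str
    font: Optional[str] = None                     # illustrative font label; None = unspecified
    color: Optional[Tuple[int, int, int]] = None   # RGB in 0..255; None = unspecified


def build_conditions(lines):
    """Split per-line attributes into separate condition lists, one entry per line."""
    return {
        "text": [line.text for line in lines],
        "font": [line.font for line in lines],
        # Normalize RGB to [0, 1]; keep None where no color was requested.
        "color": [
            None if line.color is None else tuple(c / 255 for c in line.color)
            for line in lines
        ],
    }


lines = [
    TextLine("GRAND OPENING", font="serif-bold", color=(255, 0, 0)),
    TextLine("盛大开业"),  # no font or color specified for this line
]
conditions = build_conditions(lines)
```

Keeping font and color in separate fields mirrors the abstract's statement that the attributes are encoded separately, so one line can be constrained while another is left unspecified. How the actual model encodes these conditions is not described here.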

Code Implementations: 1 repo
