CV

Mar 27, 2025

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

Shitian Zhao, Qilong Wu, Xinyue Li, Bo Zhang, Ming Li, Qi Qin, Dongyang Liu, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Peng Gao, Bin Fu

arXiv:2503.21749v1

21.712 citations

h-index: 26

Has Code

Originality: Incremental advance

AI Analysis

This work targets accurate, aesthetically pleasing text rendering in generated images, with applications in creative design and content creation. It is a strong incremental advance in the field.

The paper tackles low text rendering fidelity in text-to-image generation by introducing LeX-Art, a suite comprising a high-quality dataset (LeX-10K), a prompt enrichment model (LeX-Enhancer), and two text-to-image models (LeX-FLUX and LeX-Lumina). It reports state-of-the-art text rendering performance, including a 79.81% PNED gain for LeX-Lumina on CreateBench.

We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 79.81% PNED gain on CreateBench, and LeX-FLUX outperforming baselines in color (+3.18%), positional (+4.45%), and font accuracy (+3.81%). Our codes, models, datasets, and demo are publicly available.
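The abstract names Pairwise Normalized Edit Distance (PNED) but does not define it here. As a rough illustration only, the sketch below shows one plausible reading: words recognized in the generated image are matched one-to-one with the target words so that the summed normalized edit distance is minimal, and any unmatched word costs 1. The function names, the padding with empty strings, and the brute-force matching are all assumptions for this example and should not be taken as the paper's formulation; a practical version would likely use an assignment solver rather than enumerating permutations.

```python
from itertools import permutations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def ned(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so the value lies in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def pairwise_ned(predicted: list[str], target: list[str]) -> float:
    """Hypothetical PNED-style score (lower is better).

    Assumption: pad the shorter word list with empty strings, so an
    unmatched word pairs with "" and contributes a distance of 1.
    Brute force over permutations; only suitable for a handful of words.
    """
    n = max(len(predicted), len(target))
    if n == 0:
        return 0.0
    p = list(predicted) + [""] * (n - len(predicted))
    t = list(target) + [""] * (n - len(target))
    return min(
        sum(ned(p[i], t[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )


if __name__ == "__main__":
    print(pairwise_ned(["world", "hello"], ["hello", "world"]))  # word order is ignored
    print(pairwise_ned(["helo"], ["hello"]))                     # one missing character
    print(pairwise_ned(["sale"], ["sale", "today"]))             # one target word missing
```

Under these assumptions the score ignores word order, which is the property a pairwise matching is meant to provide: text rendered in a different layout position is not penalized, while misspelled or missing words are.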
