CV · Mar 27, 2025

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

arXiv:2503.21749v1 · 12 citations · h-index: 26 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the specific challenge of generating accurate and aesthetically pleasing text in images for applications in creative design and content creation, representing a strong incremental advance in the field.

The paper tackles low text rendering fidelity in text-to-image generation by introducing LeX-Art, a suite comprising a high-quality dataset (LeX-10K), a prompt enrichment model (LeX-Enhancer), and two text-to-image models (LeX-FLUX and LeX-Lumina). It reports state-of-the-art text rendering performance, including a 79.81% PNED gain for LeX-Lumina on CreateBench.

We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 79.81% PNED gain on CreateBench, and LeX-FLUX outperforming baselines in color (+3.18%), positional (+4.45%), and font accuracy (+3.81%). Our codes, models, datasets, and demo are publicly available.
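The abstract names Pairwise Normalized Edit Distance (PNED) but does not define it on this page. As a rough illustration of the general idea only, the sketch below compares each target word from a prompt against the words an OCR system might detect in the generated image, using a length-normalized Levenshtein distance. The function names (`levenshtein`, `normalized_edit_distance`, `pairwise_score`), the best-match pairing rule, and the handling of empty inputs are all assumptions made for this example; they are not the paper's formulation.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer string's length, so it lies in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def pairwise_score(target_words: Sequence[str],
                   detected_words: Sequence[str]) -> float:
    """Hypothetical pairwise score (an assumption, not the paper's PNED).

    Each target word is paired with its closest detected word, and the
    normalized distances are averaged. Lower is better; 0.0 means every
    target word was found exactly.
    """
    if not target_words:
        return 0.0
    if not detected_words:
        return 1.0  # nothing was rendered, so every target word is missed
    total = sum(
        min(normalized_edit_distance(t, d) for d in detected_words)
        for t in target_words
    )
    return total / len(target_words)
```

Under these assumptions the score ignores word order, so `pairwise_score(["OPEN", "CAFE"], ["CAFE", "OPEN"])` is 0.0, while a single wrong character in a four-letter word, as in `pairwise_score(["OPEN"], ["OPEM"])`, gives 0.25.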

