CV · Mar 14, 2024

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

arXiv:2403.09622v2 · 74 citations · ECCV
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in text-to-image generation for applications such as design and scene text rendering. It represents a strong, targeted gain rather than a broad paradigm shift.

The paper tackles the problem of inaccurate visual text rendering in text-to-image models by developing a customized text encoder, Glyph-ByT5, which improves text rendering accuracy from less than 20% to nearly 90% on a design image benchmark and enables high spelling accuracy for text paragraphs.

Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies. To achieve accurate text rendering, we identify two crucial requirements for text encoders: character awareness and alignment with glyphs. Our solution involves crafting a series of customized text encoders, Glyph-ByT5, by fine-tuning the character-aware ByT5 encoder using a meticulously curated paired glyph-text dataset. We present an effective method for integrating Glyph-ByT5 with SDXL, resulting in the creation of the Glyph-SDXL model for design image generation. This significantly enhances text rendering accuracy, improving it from less than 20% to nearly 90% on our design image benchmark. Noteworthy is Glyph-SDXL's newfound ability to render text paragraphs, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layouts. Finally, by fine-tuning Glyph-SDXL with a small set of high-quality, photorealistic images featuring visual text, we showcase a substantial improvement in scene text rendering capabilities in open-domain real images. These compelling outcomes are intended to encourage further exploration in designing customized text encoders for diverse and challenging tasks.
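The abstract names character awareness as the first requirement for the text encoder, and ByT5 provides it by operating on raw UTF-8 bytes instead of subword pieces. The following sketch illustrates that idea in plain Python. It is not code from the paper: the reserved ids (0 = pad, 1 = eos, 2 = unk) and the "byte value + 3" mapping reflect our understanding of the Hugging Face ByT5 tokenizer convention and should be treated as an assumption. The function names `byte_tokenize` and `byte_detokenize` are hypothetical.

```python
# Minimal sketch of ByT5-style byte-level tokenization (illustrative only).
# Assumption: ids 0-2 are reserved for pad/eos/unk and each UTF-8 byte b
# maps to id b + 3, mirroring our understanding of the Hugging Face ByT5
# tokenizer; everything here is written from scratch, no library is used.

PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
BYTE_OFFSET = 3  # number of reserved special-token ids before byte ids


def byte_tokenize(text: str, add_eos: bool = True) -> list[int]:
    """Map every UTF-8 byte of `text` to exactly one token id."""
    ids = [b + BYTE_OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def byte_detokenize(ids: list[int]) -> str:
    """Invert byte_tokenize, dropping the reserved special-token ids."""
    raw = bytes(i - BYTE_OFFSET for i in ids if i >= BYTE_OFFSET)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    ids = byte_tokenize("Glyph")
    # One id per ASCII character plus a trailing EOS id, so the encoder
    # receives every letter it is later asked to render.
    print(ids)  # [74, 111, 124, 115, 107, 1]
    print(byte_detokenize(ids))  # Glyph
```

A subword tokenizer may fold a word like "Glyph" into one or two opaque ids, hiding its spelling from the encoder. With byte-level input, each ASCII letter gets its own id. Non-ASCII characters such as "é" span several byte ids, a trade-off of longer sequences for full character visibility.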
