CL, CV | Dec 20, 2022

Character-Aware Models Improve Visual Text Rendering

Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant

DeepMind

arXiv:2212.10562v2 | 24.3261 citations | h-index: 71

Originality: Incremental advance

AI Analysis

The paper addresses a specific bottleneck in text-to-image generation, namely accurate text rendering for applications that depend on it, and represents an incremental improvement.

The paper tackles the problem of unreliable visual text generation in image models by identifying the lack of character-level features as a key issue, and shows that character-aware models improve accuracy by over 30 points on rare words in visual spelling tasks.

Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify this effect, we conduct a series of experiments comparing character-aware vs. character-blind text encoders. In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell). Applying our learnings to the visual domain, we train a suite of image generation models, and show that character-aware variants outperform their character-blind counterparts across a range of novel text rendering tasks (our DrawText benchmark). Our models set a much higher state-of-the-art on visual spelling, with 30+ point accuracy gains over competitors on rare words, despite training on far fewer examples.
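
To make the character-aware vs. character-blind distinction concrete, here is a minimal sketch of the two kinds of encoder input. It is an illustration under stated assumptions, not the paper's setup: the toy vocabulary `TOY_SUBWORD_VOCAB` and the helpers `subword_ids` and `byte_ids` are hypothetical names introduced here. A subword tokenizer hands the encoder opaque IDs that carry no direct spelling information, while a byte-level input exposes one ID per character.

```python
# Toy illustration only: the vocabulary and helper names below are
# hypothetical, not the tokenizers or encoders used in the paper.

TOY_SUBWORD_VOCAB = {"elephant": 0, "ele": 1, "phant": 2, "<unk>": 3}


def subword_ids(word: str) -> list[int]:
    """Character-blind input: greedy longest-match split into opaque subword IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in TOY_SUBWORD_VOCAB:
                ids.append(TOY_SUBWORD_VOCAB[piece])
                i = j
                break
        else:
            # No known piece starts at position i: emit <unk> and skip one character.
            ids.append(TOY_SUBWORD_VOCAB["<unk>"])
            i += 1
    return ids


def byte_ids(word: str) -> list[int]:
    """Character-aware input: one ID per UTF-8 byte, so the spelling is visible."""
    return list(word.encode("utf-8"))


if __name__ == "__main__":
    print(subword_ids("elephant"))  # a single opaque ID for the whole word
    print(byte_ids("elephant"))     # one ID per letter
```

With the subword input, the model has to memorize that a given ID corresponds to a particular sequence of glyphs. With the byte-level input, that sequence is given directly, at the cost of longer input sequences.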

