CV · Nov 1, 2023

On Manipulating Scene Text in the Wild with Diffusion Models

arXiv:2311.00734v2 · 13 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the challenge of preserving details like color and font when editing text in real-world images, which is important for applications in document processing and augmented reality, though it is incremental as it builds on existing diffusion models.

The paper tackles the problem of scene text editing in images, where existing diffusion models often degrade details, by introducing DBEST, a method that achieves high character-level OCR accuracy of 94.15% on COCO-text and 98.12% on ICDAR2013 datasets.

Diffusion models have gained attention for image editing, yielding impressive results in text-to-image tasks. On the downside, images generated by stable diffusion models suffer from deteriorated details. This pitfall impacts image editing tasks that require information preservation, e.g., scene text editing. Ideally, the model must be able to replace the text in the source image with the target text while preserving details such as color, font size, and background. To leverage the potential of diffusion models, in this work we introduce the Diffusion-BasEd Scene Text manipulation Network, DBEST. Specifically, we design two adaptation strategies, namely one-shot style adaptation and text-recognition guidance. In experiments, we thoroughly assess and compare our proposed method against state-of-the-art methods on various scene text datasets, then provide extensive ablation studies for each granularity to analyze our performance gain. We also demonstrate the effectiveness of our proposed method in synthesizing scene text, as indicated by competitive Optical Character Recognition (OCR) accuracy. Our method achieves 94.15% and 98.12% on the COCO-text and ICDAR2013 datasets, respectively, for character-level evaluation.
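
The abstract reports character-level OCR accuracy but does not say how it is computed. As a rough illustration only, the sketch below uses one common definition: one minus the total edit distance between recognized and target strings, divided by the total number of target characters. The function names `edit_distance` and `char_accuracy` are hypothetical, and the paper's actual evaluation protocol may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(predictions, targets) -> float:
    """Corpus-level accuracy: 1 - total edit distance / total target chars.

    Assumed definition for illustration; not taken from the paper.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    total_chars = sum(len(t) for t in targets)
    if total_chars == 0:
        raise ValueError("targets contain no characters")
    total_errors = sum(edit_distance(p, t) for p, t in zip(predictions, targets))
    return max(0.0, 1.0 - total_errors / total_chars)


if __name__ == "__main__":
    # OCR output read from edited images vs. the text that was requested.
    recognized = ["STOP", "EXlT"]
    requested = ["STOP", "EXIT"]
    print(f"{char_accuracy(recognized, requested):.2%}")
```

In this setting the predictions would come from running an OCR model on the edited images, and the targets would be the strings the editor was asked to render.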

Code Implementations: 1 repo
