CV | Oct 3, 2025

OTR: Synthesizing Overlay Text Dataset for Text Removal

arXiv:2510.02787v1 | h-index: 4 | Has Code | MM
Originality: Synthesis-oriented
AI Analysis

This work addresses ground-truth artifacts and evaluation problems in text removal datasets. Its contribution is incremental: it improves data quality rather than proposing a new removal method.

The paper tackles limitations in existing text removal datasets by introducing OTR, a synthesized dataset in which text is rendered on complex backgrounds using object-aware placement and content generated by a vision-language model. The dataset is intended to provide clean ground truth and to cover domains beyond scene text.

Text removal is a crucial task in computer vision with applications such as privacy preservation, image editing, and media reuse. While existing research has primarily focused on scene text removal in natural images, limitations in current datasets hinder out-of-domain generalization or accurate evaluation. In particular, widely used benchmarks such as SCUT-EnsText suffer from ground truth artifacts due to manual editing, overly simplistic text backgrounds, and evaluation metrics that do not capture the quality of generated results. To address these issues, we introduce an approach to synthesizing a text removal benchmark applicable to domains other than scene texts. Our dataset features text rendered on complex backgrounds using object-aware placement and vision-language model-generated content, ensuring clean ground truth and challenging text removal scenarios. The dataset is available at https://huggingface.co/datasets/cyberagent/OTR .
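The abstract does not spell out how object-aware placement works, so the following is only a minimal sketch of one plausible reading: sample candidate text boxes and keep the one that overlaps detected objects the most, so the rendered text lands on a complex background while the untouched source image serves as clean ground truth. The names `Box`, `overlap_area`, `object_coverage`, and `place_text`, the random-candidate search, and the choice to maximize overlap are all assumptions made for illustration, not the authors' pipeline.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: top-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int

    def area(self) -> int:
        return self.w * self.h


def overlap_area(a: Box, b: Box) -> int:
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0


def object_coverage(text_box: Box, objects: list[Box]) -> float:
    """Approximate fraction of the text box lying on object boxes.

    Overlapping objects may be double counted, so the value is clipped to 1.0.
    """
    covered = sum(overlap_area(text_box, obj) for obj in objects)
    return min(1.0, covered / text_box.area())


def place_text(image_w: int, image_h: int, text_w: int, text_h: int,
               objects: list[Box], n_candidates: int = 200,
               seed: int = 0) -> Box:
    """Hypothetical placement rule: pick the random candidate box with the
    highest object coverage (assumed here to mean a harder background)."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(n_candidates):
        # Keep the whole text box inside the image.
        x = rng.randint(0, image_w - text_w)
        y = rng.randint(0, image_h - text_h)
        candidate = Box(x, y, text_w, text_h)
        score = object_coverage(candidate, objects)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Example: one detected object in a 640x480 image, an 80x20 text box.
objects = [Box(100, 100, 200, 200)]
chosen = place_text(640, 480, 80, 20, objects)
```

In a full synthesis loop, the text would then be rendered at `chosen` on a copy of the image, giving an (input, ground truth) pair with no manual editing involved. A real pipeline would likely use segmentation masks rather than boxes and a smarter search than random sampling.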

