CV · Oct 23, 2024

KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark

arXiv:2410.18277v1 · 4 citations · h-index: 30 · ACCV
Originality: Synthesis-oriented
AI Analysis

This work addresses the scarcity of resources for non-Latin scripts such as Khmer, which could improve text detection and recognition in applications such as document digitization and accessibility. The contribution is incremental, since it centers on dataset creation.

The authors tackle the lack of training data for scene text detection and recognition in low-resource languages by introducing the first Khmer scene-text dataset, with 1,544 annotated images, and by providing baseline models for future research.

Developing effective scene text detection and recognition models hinges on extensive training data, which can be both laborious and costly to obtain, especially for low-resource languages. Conventional methods tailored for Latin characters often falter with non-Latin scripts due to challenges like character stacking, diacritics, and variable character widths without clear word boundaries. In this paper, we introduce the first Khmer scene-text dataset, featuring 1,544 expert-annotated images, including 997 indoor and 547 outdoor scenes. This diverse dataset includes flat text, raised text, poorly illuminated text, and distant or partially obscured text. Annotations provide line-level text and polygonal bounding box coordinates for each scene. The benchmark includes baseline models for scene-text detection and recognition tasks, providing a starting point for future research. The KhmerST dataset is publicly accessible at https://gitlab.com/vannkinhnom123/khmerst.
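
To make the annotation description concrete, here is a minimal Python sketch of how one line-level annotation (a text string plus a polygon) might be represented and reduced to an axis-aligned box. The `LineAnnotation` class, its field names, and the sample coordinates are assumptions made for illustration only; the actual file format of the KhmerST release is not described here and may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class LineAnnotation:
    """Hypothetical container for one annotated text line (not the dataset's real schema)."""
    text: str             # transcription of the text line
    polygon: List[Point]  # polygon vertices as (x, y) pixel coordinates, in order


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up example values, not taken from the dataset.
example = LineAnnotation(
    text="សួស្តី",
    polygon=[(10, 20), (110, 20), (110, 60), (10, 60)],
)

if __name__ == "__main__":
    print(polygon_area(example.polygon))
    print(bounding_box(example.polygon))
```

Keeping the polygon rather than only the enclosing box preserves the shape of slanted or curved text lines; the box is derived on demand for detectors that expect rectangles.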

