CV · May 4, 2023

Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation

arXiv:2305.02577v1 · 9 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of determining text reading order for OCR applications in real-world scenarios with varied layouts and image degradations. It represents an incremental improvement.

The paper tackles the problem of determining text reading order in OCR under uncontrolled conditions, proposing a lightweight GCN-based method that achieves effective performance across multi-language datasets and is deployable on mobile devices.

Text reading order is a crucial aspect in the output of an OCR engine, with a large impact on downstream tasks. Its difficulty lies in the large variation of domain-specific layout structures, and is further exacerbated by real-world image degradations such as perspective distortions. We propose a lightweight, scalable and generalizable approach to identify text reading order with a multi-modal, multi-task graph convolutional network (GCN) running on a sparse layout-based graph. Predictions from the model provide hints of bidimensional relations among text lines and layout region structures, upon which a post-processing cluster-and-sort algorithm generates an ordered sequence of all the text lines. The model is language-agnostic and runs effectively across multi-language datasets that contain various types of images taken in uncontrolled conditions, and it is small enough to be deployed on virtually any platform including mobile devices.
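
The cluster-and-sort post-processing mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the model's predictions have already been reduced to a list of index pairs marking text lines that belong to the same layout region, and it uses a fixed sort heuristic (regions left to right, lines top to bottom) in place of the learned bidimensional hints. The function name `cluster_and_sort`, the `(x, y, w, h)` box format, and the `same_region_edges` input are all hypothetical.

```python
from collections import defaultdict


def cluster_and_sort(boxes, same_region_edges):
    """Order text lines by grouping them into regions, then sorting.

    boxes: list of (x, y, w, h) tuples, one per text line (assumed format).
    same_region_edges: iterable of (i, j) index pairs, standing in for
        model predictions that lines i and j share a layout region.
    Returns a list of line indices in the inferred reading order.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Cluster step: merge lines connected by a same-region edge.
    for i, j in same_region_edges:
        parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(boxes)):
        clusters[find(i)].append(i)

    # Sort step 1: inside each region, read top to bottom, then left to right.
    regions = [
        sorted(members, key=lambda i: (boxes[i][1], boxes[i][0]))
        for members in clusters.values()
    ]

    # Sort step 2: order regions by their left edge, then their top edge.
    # This is a simplifying assumption suited to column layouts only.
    def region_key(members):
        return (
            min(boxes[i][0] for i in members),
            min(boxes[i][1] for i in members),
        )

    regions.sort(key=region_key)
    return [i for members in regions for i in members]


# Two-column page: lines 0 and 2 form the left column, 1 and 3 the right.
boxes = [(0, 0, 100, 10), (120, 0, 100, 10), (0, 15, 100, 10), (120, 15, 100, 10)]
edges = [(0, 2), (1, 3)]
order = cluster_and_sort(boxes, edges)
print(order)
```

On this two-column example a plain top-to-bottom sort would interleave the columns, whereas grouping by region first keeps each column contiguous. Real layouts would need the per-region reading direction to come from the model rather than from a fixed rule.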

