CV | Feb 15, 2024

TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming

Dhruv Kudale, Badri Vishal Kasuba, Venkatapathy Subramanian, Parag Chaudhuri, Ganesh Ramakrishnan

arXiv:2402.09811v15.23 citations | h-index: 9 | Has Code | WACV

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of text detection in multilingual documents, particularly those in Indian scripts, with a weakly supervised method that reduces reliance on manual annotation. The advance is incremental, since it builds on existing techniques.

The paper tackles the problem of multilingual text detection, especially for low-resource Indian scripts with scarce labeled data, by proposing TEXTRON, a Data Programming-based approach that ensembles computer vision and deep learning methods to improve detection performance without requiring labeled data for those languages.

Several recent deep learning (DL)-based techniques perform considerably well on image-based multilingual text detection. However, their performance relies heavily on the availability and quality of training data. There are numerous types of page-level document images consisting of information in several modalities, languages, fonts, and layouts. This makes text detection a challenging problem in the field of computer vision (CV), especially for low-resource or handwritten languages. Furthermore, there is a scarcity of word-level labeled data for text detection, especially for multilingual settings and Indian scripts that incorporate both printed and handwritten text. Conventionally, Indian script text detection requires training a DL model on plenty of labeled data, but to the best of our knowledge, no relevant datasets are available. Manual annotation of such data requires a lot of time, effort, and expertise. To solve this problem, we propose TEXTRON, a Data Programming-based approach, where users can plug various text detection methods into a weak supervision-based learning framework. One can view this approach to multilingual text detection as an ensemble of different CV-based techniques and DL approaches. TEXTRON can leverage the predictions of DL models pre-trained on a significant amount of language data in conjunction with CV-based methods to improve text detection in other languages. We demonstrate that TEXTRON can improve the detection performance for documents written in Indian languages, despite the absence of corresponding labeled data. Further, through extensive experimentation, we show the improvement brought about by our approach over the current state-of-the-art (SOTA) models, especially for handwritten Devanagari text. The code and dataset have been made available at https://github.com/IITB-LEAP-OCR/TEXTRON
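
To make the "plug in detection methods as weak supervisors" idea concrete, here is a minimal Python sketch of pixel-level labeling functions combined by a weighted vote. Everything in it is an illustrative assumption rather than TEXTRON's implementation: the function names, the intensity thresholds, the weights, and the simple weighted-vote aggregation are all hypothetical. The abstract does not specify how TEXTRON aggregates its labeling functions, and in a data programming setup the vote weights would typically be learned rather than fixed by hand.

```python
from typing import Callable, List, Optional

# A page is a 2D grid of grayscale intensities (0 = black ink, 255 = white paper).
Page = List[List[int]]
# Each labeling function votes per pixel: 1 = text, 0 = non-text, None = abstain.
Votes = List[List[Optional[int]]]
LabelingFunction = Callable[[Page], Votes]


def lf_dark_pixels(page: Page) -> Votes:
    """Hypothetical CV-style heuristic: very dark pixels are text, others abstain."""
    return [[1 if px < 80 else None for px in row] for row in page]


def lf_bright_background(page: Page) -> Votes:
    """Hypothetical heuristic: very bright pixels are background, others abstain."""
    return [[0 if px > 200 else None for px in row] for row in page]


def lf_pretrained_stub(page: Page) -> Votes:
    """Stand-in for a pre-trained DL detector; here it is only a mid-level threshold."""
    return [[1 if px < 128 else 0 for px in row] for row in page]


def aggregate(page: Page,
              lfs: List[LabelingFunction],
              weights: List[float]) -> List[List[int]]:
    """Combine labeling-function votes with a fixed weighted vote (an assumption).

    Text votes add the function's weight, non-text votes subtract it, and
    abstentions contribute nothing. A pixel is labeled text only if its
    total score is strictly positive.
    """
    height, width = len(page), len(page[0])
    score = [[0.0] * width for _ in range(height)]
    for lf, weight in zip(lfs, weights):
        votes = lf(page)
        for r in range(height):
            for c in range(width):
                vote = votes[r][c]
                if vote is None:
                    continue  # abstaining functions do not influence this pixel
                score[r][c] += weight if vote == 1 else -weight
    return [[1 if s > 0 else 0 for s in row] for row in score]


if __name__ == "__main__":
    toy_page = [[10, 100, 150, 250]]
    lfs = [lf_dark_pixels, lf_bright_background, lf_pretrained_stub]
    print(aggregate(toy_page, lfs, weights=[1.0, 1.0, 2.0]))
```

The abstain option is what lets narrow heuristics coexist with a general-purpose detector: each function only speaks where it is reliable, and the aggregate mask can then serve as weak labels in place of manual annotation.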
