Benjamin Kiessling

CV
h-index: 7
4 papers
66 citations
Novelty: 28%
AI Score: 42

4 Papers

11.9 CV Jun 2 Code
End-to-End Text Line Detection and Ordering

Benjamin Kiessling

Practical text-recognition pipelines for historical documents typically decompose layout analysis into line detection followed by a separate reading-order step, with the latter most often handled by a hand-coded geometric heuristic that struggles with marginalia, multiple columns, tables, and source-specific editorial conventions. This article introduces Orli (Ordered Regression of Lines), an end-to-end model that casts both sub-tasks as a single image-to-sequence problem: from a page image, Orli autoregressively generates text-line baselines directly in reading order. Baselines are represented in a chord-frame parameterization that anchors a line's position, orientation, and extent while encoding local geometry through perpendicular offsets; an iterative refinement head and a local visual refiner produce the final curve. Trained on a heterogeneous corpus of 196,691 pages spanning ten writing systems, Orli marginally exceeds the previously reported state of the art for cBAD line detection without dataset-specific training, reaches near-perfect coverage and ordering on multiple reading-order benchmarks zero-shot, and adapts to more specialized out-of-domain layouts with limited fine-tuning. The method's source code and model weights are available under an open license at https://github.com/mittagessen/orli.
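The chord-frame idea in the abstract can be illustrated with a small sketch. This is not Orli's actual parameterization, which the abstract does not specify in detail. It is one plausible reading, assuming the chord is the straight segment between a baseline's first and last points, and that each interior point is stored as a fractional position along the chord plus a signed perpendicular offset. The function names `encode_chord_frame` and `decode_chord_frame` and the dictionary layout are hypothetical.

```python
import math


def encode_chord_frame(points):
    """Encode a baseline polyline relative to its chord (illustrative assumption only).

    Returns the chord's center, angle and length, plus one (t, d) pair per
    interior point: t is the fractional position along the chord, d is the
    signed perpendicular distance from it.
    """
    if len(points) < 2:
        raise ValueError("a baseline needs at least two points")
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("degenerate chord: endpoints coincide")
    ux, uy = dx / length, dy / length  # unit vector along the chord
    nx, ny = -uy, ux                   # unit normal to the chord
    offsets = []
    for x, y in points[1:-1]:
        rx, ry = x - x0, y - y0
        t = (rx * ux + ry * uy) / length  # position along the chord, 0..1 if inside
        d = rx * nx + ry * ny             # signed perpendicular offset
        offsets.append((t, d))
    return {
        "center": ((x0 + x1) / 2, (y0 + y1) / 2),  # position
        "angle": math.atan2(dy, dx),               # orientation
        "length": length,                          # extent
        "offsets": offsets,                        # local geometry
    }


def decode_chord_frame(code):
    """Invert encode_chord_frame, returning the polyline points."""
    cx, cy = code["center"]
    length = code["length"]
    ux, uy = math.cos(code["angle"]), math.sin(code["angle"])
    nx, ny = -uy, ux
    x0, y0 = cx - ux * length / 2, cy - uy * length / 2
    points = [(x0, y0)]
    for t, d in code["offsets"]:
        points.append((x0 + t * length * ux + d * nx,
                       y0 + t * length * uy + d * ny))
    points.append((x0 + length * ux, y0 + length * uy))
    return points
```

Under these assumptions, a perfectly straight line encodes to all-zero perpendicular offsets, so the offsets carry only the curvature, while position, orientation, and extent live in the chord itself.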

CL Feb 8, 2024 Code
Advances and Limitations in Open Source Arabic-Script OCR: A Case Study

Benjamin Kiessling, Gennady Kurin, Matthew Thomas Miller et al.

This work presents an accuracy study of the open source OCR engine, Kraken, on the leading Arabic scholarly journal, al-Abhath. In contrast with other commercially available OCR engines, Kraken is shown to be capable of producing highly accurate Arabic-script OCR. The study also assesses the relative accuracy of typeface-specific and generalized models on the al-Abhath data and provides a microanalysis of the "error instances" and the contextual features that may have contributed to OCR misrecognition. Building on this analysis, the paper argues that Arabic-script OCR can be significantly improved through (1) a more systematic approach to training data production, and (2) the development of key technological components, especially multi-language models and improved line segmentation and layout analysis.

CV Mar 28, 2017 Code
Important New Developments in Arabographic Optical Character Recognition (OCR)

Maxim Romanov, Matthew Thomas Miller, Sarah Bowen Savant et al.

The OpenITI team has achieved Optical Character Recognition (OCR) accuracy rates for classical Arabic-script texts in the high nineties. These numbers are based on our tests of seven different Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines. These accuracy rates not only represent a distinct improvement over the actual accuracy rates of the various proprietary OCR options for classical Arabic-script texts, but, equally important, they are produced using open-source OCR software, thus enabling us to make this Arabic-script OCR technology freely available to the broader Islamic, Persian, and Arabic Studies communities.

CV Jul 9, 2019
BADAM: A Public Dataset for Baseline Detection in Arabic-script Manuscripts

Benjamin Kiessling, Daniel Stökl Ben Ezra, Matthew Thomas Miller

The application of handwritten text recognition to historical works is highly dependent on accurate text line retrieval. A number of systems utilizing a robust baseline detection paradigm have emerged recently, but the advancement of layout analysis methods for challenging scripts is held back by the lack of well-established datasets including works in non-Latin scripts. We present a dataset of 400 annotated document images from different domains and time periods. A short elaboration on the particular challenges posed by handwriting in Arabic script for layout analysis and subsequent processing steps is given. Lastly, we propose a method based on a fully convolutional encoder-decoder network to extract arbitrarily shaped text line images from manuscripts.