LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books
This is an incremental contribution: a tool that helps researchers and archivists working with early printed books improve their OCR workflows.
The authors tackle layout analysis for early printed books with LAREX, a semi-automatic open-source tool built on a rule-based connected components approach that is fast and allows intuitive manual correction. Their evaluations show that it provides efficient and flexible page segmentation.
A semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule-based connected components approach that is very fast, easy for the user to understand, and allows intuitive manual correction where necessary. The PageXML format is used to support integration into existing OCR workflows. Evaluations showed that LAREX provides an efficient and flexible way to segment pages of early printed books.
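To make "rule-based connected components approach" concrete, here is a minimal pure-Python sketch. It is not LAREX's implementation. It labels 8-connected foreground pixels in a binarized page, records each component's area and bounding box, and then applies simple size rules. The function names `connected_components` and `classify`, the thresholds `min_area` and `image_area`, and the "text"/"image" labels are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def connected_components(grid: List[List[int]]) -> List[Tuple[int, Box]]:
    """Label 8-connected foreground (1) pixels; return (area, bounding box) per component."""
    h, w = len(grid), len(grid[0]) if grid else 0
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited foreground pixel
            queue = deque([(y, x)])
            seen[y][x] = True
            area, x0, y0, x1, y1 = 0, x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                # Visit all 8 neighbours that are in bounds, foreground and unvisited
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append((area, (x0, y0, x1, y1)))
    return comps


def classify(comps: List[Tuple[int, Box]], min_area: int = 2, image_area: int = 9) -> List[Tuple[str, Box]]:
    """Hypothetical rules: drop specks below min_area, label large components as images."""
    regions = []
    for area, box in comps:
        if area < min_area:
            continue  # treat tiny components as noise
        kind = "image" if area >= image_area else "text"
        regions.append((kind, box))
    return regions


# Toy binarized page: 1 = ink, 0 = background
page = [
    "1110000000",
    "1110001000",
    "1110000000",
    "0000000011",
    "0000000000",
]
grid = [[int(c) for c in row] for row in page]
print(classify(connected_components(grid)))
# [('image', (0, 0, 2, 2)), ('text', (8, 3, 9, 3))]
```

The isolated pixel at (6, 1) is discarded as noise. Because the rules only compare component sizes, a user can see why a region received its label and can adjust a threshold or fix the result by hand. The abstract credits the rule-based design with exactly this kind of comprehensibility.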