CV, NE · Mar 3, 2013

Genetic Programming for Document Segmentation and Region Classification Using Discipulus

arXiv:1303.0460v1 · 7 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the time-consuming, labor-intensive task of extracting information from documents for data-processing users, though it appears incremental, since it builds on existing methods such as run-length smearing.

The paper tackles the problem of automatic document segmentation and region classification by proposing a new approach using Genetic Programming with the Discipulus tool, achieving 97.5% classification accuracy.

Document segmentation is the process of dividing a document into distinct regions. A document is a collection of information and a standard means of conveying information to others. Retrieving data from documents manually requires a great deal of human effort and time, and can severely limit the use of data-processing systems. Automatic information retrieval from documents has therefore become an important problem, and document segmentation has been shown to help overcome it. This paper proposes a new approach that segments a document and classifies its regions as text, image, drawing, or table. The document image is divided into blocks using the run-length smearing algorithm, and features are extracted from each block. The Discipulus tool was used to build a genetic-programming-based classifier model, which achieved 97.5% classification accuracy.
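The block-division step can be sketched as follows. This is a minimal sketch in Python, assuming a binarized page image given as a list of rows with 1 for foreground (ink) and 0 for background. It smears short background runs horizontally and vertically, combines the two bitmaps with a logical AND, and takes the bounding boxes of 8-connected regions as blocks. The thresholds, the choice to smear only runs bounded by ink on both sides, and the feature set in the hypothetical `block_features` helper (height, width, aspect ratio, ink density) are illustrative assumptions, not the paper's parameters. Commonly described variants of the smearing algorithm also apply a final horizontal pass after the AND, which is omitted here, and the Discipulus classifier itself is not reproduced.

```python
from collections import deque


def smear_row(row, threshold):
    """Fill background (0) runs of length <= threshold that lie between two foreground (1) pixels."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, value in enumerate(row):
        if value == 1:
            gap = i - last_fg - 1 if last_fg is not None else 0
            if 0 < gap <= threshold:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def rlsa(img, h_thresh, v_thresh):
    """Smear rows and columns independently, then AND the two bitmaps."""
    horiz = [smear_row(row, h_thresh) for row in img]
    vert = transpose([smear_row(col, v_thresh) for col in transpose(img)])
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]


def find_blocks(mask):
    """Bounding boxes (top, left, bottom, right), inclusive, of 8-connected foreground regions."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def block_features(img, box):
    """Hypothetical per-block features, computed on the original (unsmeared) image."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    black = sum(img[y][x] for y in range(top, bottom + 1) for x in range(left, right + 1))
    return {
        "height": height,
        "width": width,
        "aspect": width / height,
        "density": black / (height * width),
    }


if __name__ == "__main__":
    page = [
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    ]
    # Illustrative thresholds for a toy 4x12 image, not values from the paper.
    blocks = find_blocks(rlsa(page, h_thresh=3, v_thresh=3))
    print(blocks)  # → [(0, 0, 2, 4), (2, 10, 3, 11)]
    for box in blocks:
        print(box, block_features(page, box))
```

In a full pipeline, each block's feature vector would be the input to the classifier that labels it as text, image, drawing, or table; the AND step keeps a gap filled only when both the row pass and the column pass agree, which limits merging of unrelated regions.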
