CV LG | Sep 11, 2025

Evaluation of Ensemble Learning Techniques for handwritten OCR Improvement

arXiv:2509.16221v1 | 3.6

Originality Synthesis-oriented

AI Analysis

This work addresses the need for high-accuracy digitization of medical records, but it is incremental as it applies known ensemble learning to a specific OCR task.

The study investigated ensemble learning techniques to improve optical character recognition (OCR) accuracy for digitizing handwritten historical patient records, finding that ensemble methods increased OCR accuracy and that training dataset size did not affect this improvement.

For the 2021 bachelor project of Professor Lippert's research group, handwritten entries in historical patient records needed to be digitized using Optical Character Recognition (OCR) methods. Because the data will be used in future work, a high degree of accuracy is required, and this matters all the more in the medical field. Ensemble learning combines several machine learning models and is claimed to increase the accuracy of existing methods. This work therefore investigates ensemble learning in combination with OCR in order to add value to the digitization of the patient records. The results show that ensemble learning can increase OCR accuracy, identify which methods achieved this, and indicate that the size of the training data set did not play a role.
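
To make the idea of combining several models concrete, here is a minimal sketch of one common ensemble scheme, plurality voting over the outputs of several OCR models. The abstract does not say which ensemble methods the study used, so this is only an illustration: the function names, the word-level alignment, and the sample transcriptions are all assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent prediction; ties go to the earliest model."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk in model order so that ties are broken deterministically.
    for candidate in predictions:
        if counts[candidate] == best:
            return candidate


def ensemble_transcribe(per_model_words):
    """Combine word lists from several OCR models by voting per position.

    Assumes (hypothetically) that every model returns the same number of
    words, so position i refers to the same word image in every list.
    """
    return [majority_vote(list(column)) for column in zip(*per_model_words)]


# Made-up outputs of three hypothetical OCR models for the same text line.
model_outputs = [
    ["Patient", "has", "fever"],
    ["Patient", "hos", "fever"],
    ["Patiemt", "has", "fever"],
]
combined = ensemble_transcribe(model_outputs)
```

Each model in this example makes at most one mistake, and no two models err on the same word, so the vote recovers the intended word at every position. Voting only helps when the models' errors are not strongly correlated.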

