2D Self-Organized ONN Model For Handwritten Text Recognition
This work addresses the incremental improvement of HTR accuracy for applications in document digitization and text analysis.
The paper tackles the problem of improving Handwritten Text Recognition (HTR) by proposing a 2D Self-Organized Operational Neural Network (Self-ONN) model with deformable convolutions. Compared with CNNs on benchmark datasets, the model reduces Character Error Rate (CER) by up to 1.2% and Word Error Rate (WER) by up to 3.4%.
Deep Convolutional Neural Networks (CNNs) have recently reached state-of-the-art performance in Handwritten Text Recognition (HTR). However, recent research has shown that the learning performance of conventional CNNs is limited because they are homogeneous networks built on a simple (linear) neuron model. Operational Neural Networks (ONNs), which have a heterogeneous network structure incorporating non-linear neurons, have recently been proposed to address this drawback. Self-organized ONNs (Self-ONNs) are variants of ONNs whose generative neuron model can generate any non-linear function using the Taylor approximation. In this study, 2D Self-ONNs are proposed at the core of a novel network model in order to improve the state-of-the-art performance level in HTR. Moreover, deformable convolutions, which have recently been shown to handle variations in writing styles better, are utilized. Results on the IAM English dataset and the HADARA80P Arabic dataset show that the proposed model with the operational layers of Self-ONNs significantly improves Character Error Rate (CER) and Word Error Rate (WER). Compared with their CNN counterparts, Self-ONNs reduce CER and WER by 1.2% and 3.4%, respectively, on HADARA80P and by 0.199% and 1.244% on IAM. The results on the IAM benchmark demonstrate that the proposed model with the operational layers of Self-ONNs outperforms recent deep CNN models by a significant margin, while the use of Self-ONNs with deformable convolutions demonstrates exceptional results.
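The generative neuron described above can be illustrated with a minimal NumPy sketch. It assumes the formulation commonly used for Self-ONNs, in which a Q-th order truncated Taylor (Maclaurin) series is realized by convolving each power of the input with its own kernel and summing the results. The function names `correlate2d_valid` and `generative_neuron_2d`, the single-channel setting, and the "valid" padding are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def correlate2d_valid(x, w):
    """Plain 2D cross-correlation of a single-channel input with one kernel (no padding)."""
    kh, kw = w.shape
    oh = x.shape[0] - kh + 1
    ow = x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=float)
    # Accumulate one shifted, weighted copy of the input per kernel tap.
    for i in range(kh):
        for j in range(kw):
            out += w[i, j] * x[i:i + oh, j:j + ow]
    return out


def generative_neuron_2d(x, weights, bias=0.0):
    """Hypothetical single-channel generative neuron.

    weights has shape (Q, kh, kw): one kernel per power of the input.
    Output: bias + sum over q = 1..Q of correlate(x ** q, weights[q - 1]).
    With Q = 1 this reduces to an ordinary (linear) convolutional neuron.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    _, kh, kw = weights.shape
    out = np.full((x.shape[0] - kh + 1, x.shape[1] - kw + 1), bias, dtype=float)
    for q, w_q in enumerate(weights, start=1):
        out += correlate2d_valid(x ** q, w_q)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.uniform(-1.0, 1.0, size=(8, 8))       # toy input patch
    kernels = rng.standard_normal((3, 3, 3)) * 0.1    # Q = 3, 3x3 kernels
    print(generative_neuron_2d(image, kernels).shape)
```

Setting Q = 1 recovers the linear neuron of a standard CNN layer, which is one way to see why a Self-ONN layer can be treated as a drop-in generalization of a convolutional layer. A practical implementation would also handle multiple channels, bound the input range before taking powers, and learn the kernels by backpropagation.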