An Intelligent Hybrid Model for Identity Document Classification
This work addresses the challenge of document classification for businesses and government agencies, although it is incremental because it builds on existing methods.
The study tackles the problem of accurately classifying identity documents from scanned images. It proposes a hybrid model that fuses visual and text features with a fuzzy-mean fusion method, and evaluates it using ROC curve analysis on synthetic and real-world datasets.
Digitization, i.e., the process of converting information into a digital format, may offer businesses various opportunities (e.g., increased productivity, disaster recovery, and environmentally friendly solutions) but also poses challenges. One of the main challenges is accurately classifying the many scanned documents that customers upload every day as part of routine business processes. For example, banking processes (e.g., applying for a loan) or applications to the Government Registry of BDM (Births, Deaths, and Marriages) may require uploading several documents, such as a driver's license and a passport. Few studies have treated this challenge as an image classification problem, and although some studies have applied various methods, a more accurate model is still needed. This study proposes a robust fusion model that accurately determines the type of an identity document. The approach combines two methods: one classifies images by their visual features and the other by their text features. A novel model based on statistics and regression is proposed to calculate the confidence level of the feature-based classifier, and a fuzzy-mean fusion model combines the classifiers' results according to their confidence scores. The approach has been implemented in Python and experimentally validated on synthetic and real-world datasets, and its performance is evaluated using Receiver Operating Characteristic (ROC) curve analysis.
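The idea of combining two classifiers according to their confidence scores can be illustrated with a minimal sketch. The abstract does not specify the exact fuzzy-mean operator or how the regression-based confidence is computed, so this example substitutes a plain confidence-weighted arithmetic mean of per-class scores. The function names `fuse_predictions` and `predict_label`, the class labels, and all numeric values are hypothetical and used for illustration only.

```python
from typing import Dict


def fuse_predictions(
    visual_probs: Dict[str, float],
    visual_conf: float,
    text_probs: Dict[str, float],
    text_conf: float,
) -> Dict[str, float]:
    """Combine two classifiers' per-class scores as a confidence-weighted mean.

    Assumption: this is a simple weighted average standing in for the paper's
    fuzzy-mean fusion, whose exact formulation is not given in the abstract.
    """
    total = visual_conf + text_conf
    if total <= 0:
        raise ValueError("at least one classifier must have positive confidence")
    # Normalize the two confidence scores into weights that sum to 1.
    w_visual = visual_conf / total
    w_text = text_conf / total
    # A class missing from one classifier's output is treated as score 0.
    classes = set(visual_probs) | set(text_probs)
    return {
        c: w_visual * visual_probs.get(c, 0.0) + w_text * text_probs.get(c, 0.0)
        for c in classes
    }


def predict_label(fused: Dict[str, float]) -> str:
    """Return the document type with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical outputs: the visual classifier is more confident than the text one.
fused = fuse_predictions(
    visual_probs={"passport": 0.8, "driver_license": 0.2},
    visual_conf=0.75,
    text_probs={"passport": 0.3, "driver_license": 0.7},
    text_conf=0.25,
)
print(predict_label(fused))  # passport
```

Here the visual classifier receives weight 0.75 and the text classifier 0.25, so the fused score for `passport` is 0.75 × 0.8 + 0.25 × 0.3 = 0.675. The more confident classifier therefore dominates the decision when the two disagree.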