CV · DC · LG · Jun 16, 2020

Improving accuracy and speeding up Document Image Classification through parallel systems

arXiv:2006.09141v1 · 34 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of efficient and accurate document classification for institutions undergoing digitalization, though it is incremental as it applies existing models and techniques to this domain.

The paper tackles document image classification by demonstrating that EfficientNet models outperform heavier CNNs on the RVL-CDIP dataset, with improved accuracy and a smaller model size. It also shows that an ensemble combining image predictions with text (BERT) predictions boosts performance, and that parallel training with a larger batch size speeds up computation.

This paper presents a study showing the benefits of EfficientNet models compared with heavier Convolutional Neural Networks (CNNs) in the Document Classification task, an essential problem in the digitalization process of institutions. We show on the RVL-CDIP dataset that we can improve on previous results with a much lighter model, and we present its transfer learning capabilities on a smaller in-domain dataset, Tobacco3482. Moreover, we present an ensemble pipeline that improves on image-only results by combining the image model's predictions with those generated by a BERT model on text extracted by OCR. We also show that the batch size can be effectively increased without hindering accuracy, so that the training process can be sped up by parallelizing across multiple GPUs, decreasing the computational time needed. Lastly, we expose the training performance differences between the PyTorch and TensorFlow Deep Learning frameworks.
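The ensemble idea in the abstract can be illustrated with a minimal late-fusion sketch. The abstract does not say how the image and text predictions are combined, so the weighted average of class probabilities used here is an assumption, as are the names `softmax`, `ensemble_predict`, and `image_weight`. The logits are made-up numbers for three hypothetical classes, not results from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(image_logits, text_logits, image_weight=0.5):
    """Fuse two classifiers by a weighted average of their class probabilities.

    ASSUMPTION: weighted averaging is one common late-fusion rule; the
    paper's actual combination method may differ.
    """
    if len(image_logits) != len(text_logits):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")

    p_image = softmax(image_logits)
    p_text = softmax(text_logits)
    fused = [
        image_weight * pi + (1.0 - image_weight) * pt
        for pi, pt in zip(p_image, p_text)
    ]
    # Predicted class is the index with the highest fused probability.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Illustrative scores only: the image model favors class 0,
# the text model strongly favors class 1.
image_logits = [2.0, 1.0, 0.0]
text_logits = [0.0, 3.0, 0.0]
label, fused = ensemble_predict(image_logits, text_logits, image_weight=0.5)
```

With equal weights, the more confident text model decides the fused prediction in this toy case; setting `image_weight=1.0` recovers the image-only prediction. In practice the weight would be tuned on a validation set.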
