Modular Multimodal Architecture for Document Classification
Accurate page classification improves document analysis systems by enabling branching control flows tailored to different document components.
The paper tackles document page classification by using both visual and textual content, achieving a state-of-the-art result of 93.03% test accuracy on the RVL-CDIP benchmark.
Page classification is a crucial component of any document analysis system, allowing for complex branching control flows for different components of a given document. Utilizing both the visual and textual content of a page, the proposed method exceeds the previous state-of-the-art performance on the RVL-CDIP benchmark, reaching 93.03% test accuracy.
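
To make the idea of combining modalities and then branching on the predicted class concrete, the following is a minimal sketch and not the paper's architecture. It assumes that a visual model and a text model each emit per-class logits for a page, and it fuses them with a weighted average of softmax probabilities (simple late fusion, chosen here only for illustration). The class subset, the function names (fuse, classify_page, route_page), the visual_weight parameter, and the routing table are all hypothetical.

```python
from math import exp

# Illustrative subset of page classes; RVL-CDIP defines 16 classes in total.
CLASSES = ["letter", "form", "invoice", "resume"]

# Hypothetical mapping from page class to a downstream processing step.
ROUTES = {
    "form": "extract_fields",
    "invoice": "extract_line_items",
}


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(visual_logits, text_logits, visual_weight=0.5):
    """Late fusion (an assumption): weighted average of class probabilities."""
    p_visual = softmax(visual_logits)
    p_text = softmax(text_logits)
    return [
        visual_weight * v + (1.0 - visual_weight) * t
        for v, t in zip(p_visual, p_text)
    ]


def classify_page(visual_logits, text_logits, visual_weight=0.5):
    """Return the most probable class label and the fused probabilities."""
    probs = fuse(visual_logits, text_logits, visual_weight)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


def route_page(label):
    """Pick the downstream step for a label, with a generic fallback."""
    return ROUTES.get(label, "generic_ocr")


if __name__ == "__main__":
    # Made-up logits: the visual model favors "invoice", the text model "form".
    label, probs = classify_page([0.1, 0.2, 2.0, 0.0], [0.0, 1.5, 1.0, 0.2])
    print(label, route_page(label))
```

The routing function is where the branching control flow mentioned above would live: each predicted page class selects a different processing step, and classes without a dedicated step fall back to a generic one.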