CV | AI | Nov 9, 2023

DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency

arXiv:2311.05778v1 | 2 citations | h-index: 12
Originality: Incremental advance
AI Analysis

DONUT-hole enables more efficient deployment in production and on edge devices for logistics companies, though the advance is incremental because it builds directly on DONUT.

The paper tackles the high memory and computational demands of the DONUT visual document understanding model by introducing DONUT-hole, which reduces model density by 54% while preserving performance. The global representational similarity between DONUT and DONUT-hole, measured by centered kernel alignment (CKA), is 0.79.

This paper introduces DONUT-hole, a sparse OCR-free visual document understanding (VDU) model that addresses the limitations of its predecessor model, dubbed DONUT. The DONUT model leverages a transformer architecture to overcome the challenges of separate optical character recognition (OCR) and visual semantic understanding (VSU) components. However, its deployment in production environments and on edge devices is hindered by high memory and computational demands, particularly in large-scale request services. To overcome these challenges, we propose an optimization strategy based on knowledge distillation and model pruning. Our paradigm for producing DONUT-hole reduces the model density by 54% while preserving performance. We also achieve a global representational similarity index between DONUT and DONUT-hole of 0.79, based on the centered kernel alignment (CKA) metric. Moreover, we evaluate the effectiveness of DONUT-hole in the document image key information extraction (KIE) task, highlighting its potential for developing more efficient VDU systems for logistics companies.
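The two headline numbers in the abstract, model density and CKA similarity, can be illustrated with a small sketch. The abstract does not say which CKA variant or which density definition the authors use, so the code below assumes the linear form of CKA and defines density as the fraction of nonzero weights. The function names `linear_cka` and `density` are hypothetical and are not taken from the paper.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices.

    x and y have shape (n_examples, n_features); the feature
    dimensions may differ, but the examples must be the same.
    Assumption: the linear variant, which may not match the paper.
    """
    # Center each feature column so the Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def density(weights: np.ndarray) -> float:
    """Fraction of nonzero entries in a weight tensor (assumed definition)."""
    return float(np.count_nonzero(weights) / weights.size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for activations of a dense and a pruned model on the same inputs.
    dense_acts = rng.normal(size=(64, 16))
    pruned_acts = dense_acts + 0.5 * rng.normal(size=(64, 16))
    print("CKA:", linear_cka(dense_acts, pruned_acts))

    # Zero out the smallest-magnitude half of a toy weight matrix.
    w = rng.normal(size=(32, 32))
    w[np.abs(w) < np.median(np.abs(w))] = 0.0
    print("density:", density(w))
```

Linear CKA equals 1 for identical representations and is unchanged by rotating or uniformly rescaling the features, which is why it is convenient for comparing a pruned model against its dense original layer by layer.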

