CV | Apr 30, 2025

Comparison of Different Deep Neural Network Models in the Cultural Heritage Domain

arXiv:2504.21387v1 | 2 citations | h-index: 13
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for effective computer vision models in cultural heritage documentation and visitor experiences, but it is incremental as it compares existing methods on new data.

The study compared deep neural network models, including VGG, ResNet, DenseNet, Visual Transformer, Swin Transformer, and PoolFormer, for transferring knowledge from ImageNet to cultural heritage tasks, finding that DenseNet achieved the best efficiency-computability ratio.

The integration of computer vision and deep learning is an essential part of documenting and preserving cultural heritage, as well as improving visitor experiences. In recent years, two deep learning paradigms have been established in the field of computer vision: convolutional neural networks and transformer architectures. The present study compares representatives of these two paradigms in terms of their ability to transfer knowledge from a generic dataset, such as ImageNet, to cultural-heritage-specific tasks. Tests of the VGG, ResNet, DenseNet, Visual Transformer, Swin Transformer, and PoolFormer architectures showed that DenseNet is the best in terms of efficiency-computability ratio.
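
The abstract does not define the "efficiency-computability ratio" it reports. One plausible reading, offered here only as an assumption, is task accuracy per unit of compute. The sketch below ranks models under that reading; `ModelResult`, `accuracy_per_gflop`, and `rank_by_tradeoff` are hypothetical names, the choice of GFLOPs as the cost measure is also an assumption, and no values from the paper are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """One evaluated architecture (all fields supplied by the user)."""
    name: str        # architecture label, e.g. "DenseNet"
    accuracy: float  # accuracy on the target task, in [0, 1]
    gflops: float    # compute cost of one forward pass, in GFLOPs


def accuracy_per_gflop(result: ModelResult) -> float:
    """Hypothetical trade-off score: accuracy divided by compute cost.

    This is an assumed definition, not the metric used in the paper.
    """
    if result.gflops <= 0:
        raise ValueError("gflops must be positive")
    return result.accuracy / result.gflops


def rank_by_tradeoff(results: list[ModelResult]) -> list[ModelResult]:
    """Return the results sorted from best to worst trade-off score."""
    return sorted(results, key=accuracy_per_gflop, reverse=True)
```

Under this assumed score, a slightly less accurate model can outrank a more accurate one if it is much cheaper to run, which is the kind of trade-off the abstract's conclusion points at.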

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
