Harnessing Self-Supervised Features for Art Classification
The paper shows that self-supervised backbones consistently improve artwork classification over supervised ones, and it offers practical insights into using classification and retrieval modules for VR museum navigation.
This paper compares supervised and self-supervised backbones as feature extractors for artwork classification and retrieval, using the DINO family and CLIP models, and finds that a self-supervised backbone consistently improves classification performance.
Classifying artworks presents a significant challenge due to the complex interplay of fine-grained details and abstract features that condition the style or genre of an artwork. This paper presents a systematic investigation of the effectiveness of supervised and self-supervised backbones as feature extractors for both artwork classification and retrieval, with a particular focus on paintings. We conduct an extensive experimental evaluation using the DINO family and CLIP models, assessing multiple classification strategies and feature representations. Our results demonstrate that employing a self-supervised backbone leads to consistent improvements in artwork classification performance. Moreover, our work provides insights into the applicability of classification and retrieval modules in real-world settings, such as virtual reality (VR) applications that support museum navigation.
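To make the feature-extractor setup concrete, the following is a minimal sketch of how frozen backbone features could feed both a retrieval module and a simple classifier. It is an illustration under stated assumptions, not the paper's pipeline: the features are synthetic stand-ins for embeddings that would come from a backbone such as DINO or CLIP, k-nearest-neighbour voting is only one possible classification strategy, and the function names `l2_normalize`, `retrieve`, and `knn_classify` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def retrieve(query_feats, gallery_feats, top_k=5):
    """Return indices and cosine similarities of the top_k gallery items per query."""
    sims = l2_normalize(query_feats) @ l2_normalize(gallery_feats).T
    order = np.argsort(-sims, axis=1)[:, :top_k]  # most similar first
    return order, np.take_along_axis(sims, order, axis=1)


def knn_classify(query_feats, gallery_feats, gallery_labels, k=5):
    """Predict a label per query by majority vote among its k nearest gallery items."""
    idx, _ = retrieve(query_feats, gallery_feats, top_k=k)
    neighbor_labels = gallery_labels[idx]  # shape: (n_queries, k)
    n_classes = int(gallery_labels.max()) + 1
    votes = np.stack(
        [np.bincount(row, minlength=n_classes) for row in neighbor_labels]
    )
    return votes.argmax(axis=1)


# Assumption: synthetic features replace real backbone embeddings.
# Each "style" class is a Gaussian cluster around a random centre.
rng = np.random.default_rng(0)
n_classes, dim = 3, 16
centers = rng.normal(size=(n_classes, dim)) * 5.0

gallery_labels = np.repeat(np.arange(n_classes), 20)
gallery_feats = centers[gallery_labels] + rng.normal(size=(gallery_labels.size, dim))

query_labels = np.repeat(np.arange(n_classes), 5)
query_feats = centers[query_labels] + rng.normal(size=(query_labels.size, dim))

top_idx, top_sims = retrieve(query_feats, gallery_feats, top_k=5)
pred = knn_classify(query_feats, gallery_feats, gallery_labels, k=5)
accuracy = float((pred == query_labels).mean())
print(f"k-NN accuracy on synthetic features: {accuracy:.2f}")
```

In a real setting, `gallery_feats` and `query_feats` would be replaced by embeddings extracted once from a frozen backbone, so the same feature matrix could serve both the retrieval and the classification module.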