CV · Apr 26, 2024

SAGHOG: Self-Supervised Autoencoder for Generating HOG Features for Writer Retrieval

arXiv:2404.17221v1 · 2 citations · h-index: 31 · ICDAR
Originality: Incremental advance
AI Analysis

This work addresses writer retrieval for historical document analysis and reports incremental performance gains on historical benchmark datasets.

The paper tackles writer retrieval from historical documents by introducing SAGHOG, a self-supervised pretraining strategy based on HOG features. SAGHOG achieves a mAP of 57.2% on HisFrag20, 11.6% above the previous state of the art, and a Top-1 accuracy of 58.0% on GRK-Papyri.
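The two headline metrics can be made concrete with a short sketch of a common leave-one-out writer-retrieval protocol: every document serves once as the query, all other documents are ranked by cosine similarity of their global descriptors, and a hit is any document by the same writer. mAP averages the average precision over all queries; Top-1 is the fraction of queries whose nearest neighbour shares the query's writer. This is an assumed, generic protocol and the function name `retrieval_metrics` is hypothetical; the exact evaluation used for each dataset (for example, how fragments or writers with a single document are handled) may differ.

```python
import numpy as np


def retrieval_metrics(desc: np.ndarray, writers) -> tuple[float, float]:
    """Hypothetical leave-one-out retrieval evaluation: returns (mAP, Top-1)."""
    x = desc / np.linalg.norm(desc, axis=1, keepdims=True)  # L2-normalize rows
    sim = x @ x.T  # cosine similarity between all document pairs
    writers = np.asarray(writers)
    aps, top1 = [], []
    for q in range(len(writers)):
        order = np.argsort(-sim[q])   # most similar documents first
        order = order[order != q]     # never retrieve the query itself
        rel = writers[order] == writers[q]
        top1.append(bool(rel[0]))     # is the nearest neighbour by the same writer?
        if not rel.any():
            continue  # no other document by this writer: AP undefined, skip it
        precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        aps.append(float(np.sum(precision_at_k * rel) / rel.sum()))
    return float(np.mean(aps)), float(np.mean(top1))


desc = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(retrieval_metrics(desc, [0, 0, 1, 1]))  # (1.0, 1.0)
print(retrieval_metrics(desc, [0, 1, 0, 1]))  # nearest neighbour is always another writer, so Top-1 is 0.0
```

The toy descriptors are made up for illustration; with real data they would be the aggregated page-level embeddings produced by the finetuned network.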

This paper introduces SAGHOG, a self-supervised pretraining strategy for writer retrieval that uses HOG features of the binarized input image. Our preprocessing applies Segment Anything to extract handwriting from various datasets, yielding about 24k documents; a vision transformer is then trained to reconstruct masked patches of the handwriting. SAGHOG is subsequently finetuned by appending NetRVLAD as an encoding layer to the pretrained encoder. Evaluation on three historical datasets, Historical-WI, HisFrag20, and GRK-Papyri, demonstrates the effectiveness of SAGHOG for writer retrieval. Additionally, we provide ablation studies on our architecture and evaluate both unsupervised and supervised finetuning. Notably, on HisFrag20, SAGHOG outperforms related work with a mAP of 57.2%, a margin of 11.6% over the current state of the art, showcasing its robustness on challenging data. It is also competitive even on small datasets such as GRK-Papyri, where we achieve a Top-1 accuracy of 58.0%.
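The pretraining objective can be illustrated with a minimal NumPy sketch, assuming a masked-autoencoder setup with HOG reconstruction targets: the binarized page is split into non-overlapping patches, each patch gets a HOG descriptor as its target, a random subset of patches is masked, and the loss is the mean squared error over masked patches only. The patch size (16), 9 orientation bins, mask ratio 0.75, the single-cell HOG without block normalization, and all function names are illustrative assumptions, not the paper's configuration; the vision transformer itself is replaced by an all-zeros placeholder prediction.

```python
import numpy as np


def patch_hog(patch: np.ndarray, n_bins: int = 9) -> np.ndarray:
    """Simplified HOG for one patch (assumption): a single cell, unsigned
    orientations, magnitude-weighted histogram, L2-normalized."""
    p = patch.astype(np.float64)
    gy, gx = np.gradient(p)  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned angle in [0, 180)
    bins = np.minimum((ang // (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist  # blank patches keep a zero target


def hog_targets(img: np.ndarray, patch: int = 16, n_bins: int = 9) -> np.ndarray:
    """One HOG vector per non-overlapping patch, in row-major order
    (the order a ViT typically uses when flattening patches into tokens)."""
    rows, cols = img.shape[0] // patch, img.shape[1] // patch
    feats = np.zeros((rows * cols, n_bins))
    for i in range(rows):
        for j in range(cols):
            tile = img[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            feats[i * cols + j] = patch_hog(tile, n_bins)
    return feats


def random_mask(n_patches: int, ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask with round(n_patches * ratio) randomly chosen patches set to True."""
    n_masked = int(round(n_patches * ratio))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_masked, replace=False)] = True
    return mask


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Reconstruction loss computed on masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[30:34, 8:56] = 1.0                       # hypothetical binarized stroke: a horizontal bar
targets = hog_targets(img)                   # shape (16, 9)
mask = random_mask(len(targets), 0.75, rng)  # 12 of 16 patches hidden
pred = np.zeros_like(targets)                # placeholder for the transformer's prediction
loss = masked_mse(pred, targets, mask)
print(targets.shape, int(mask.sum()), loss)
```

In an actual training loop, `pred` would come from the transformer decoder and only the masked rows would contribute to the gradient. Computing HOG on the binarized image means the target encodes stroke edge orientations rather than ink intensity or background texture.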

Code Implementations: 1 repo
