CV | May 25, 2017

Unsupervised Feature Learning for Writer Identification and Writer Retrieval

arXiv:1705.09369v3 | 100 citations
Originality: Incremental advance
AI Analysis

This paper addresses the shortage of annotated data in document analysis, a problem relevant to both researchers and practitioners. The advance is incremental, since it adapts existing unsupervised techniques to a specific domain.

The paper tackles writer identification and retrieval in historical documents by proposing an unsupervised method for learning CNN activation features. The learned features outperform the descriptors of state-of-the-art writer identification methods on the ICDAR17 dataset and achieve comparable results on the CLaMM16 dataset.

Deep Convolutional Neural Networks (CNNs) have shown great success in supervised classification tasks such as character classification or dating. Deep learning methods typically need a lot of annotated training data, which is not available in many scenarios. In these cases, traditional methods are often better than or equivalent to deep learning methods. In this paper, we propose a simple, yet effective, way to learn CNN activation features in an unsupervised manner. To this end, we train a deep residual network using surrogate classes. The surrogate classes are created by clustering the training dataset, where each cluster index represents one surrogate class. The activations from the penultimate CNN layer serve as features for subsequent classification tasks. We evaluate the feature representations on two publicly available datasets. The focus lies on the ICDAR17 competition dataset on historical document writer identification (Historical-WI). We show that the activation features trained without supervision are superior to descriptors of state-of-the-art writer identification methods. Additionally, we achieve comparable results in the case of handwriting classification using the ICFHR16 competition dataset on historical Latin script types (CLaMM16).
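
The surrogate-class idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes k-means as the clustering step, random vectors in place of real training data, and a tiny one-hidden-layer network in place of the deep residual network. All function names (`kmeans_labels`, `train_surrogate_net`, `extract_features`) and all hyperparameters are hypothetical.

```python
import numpy as np

def kmeans_labels(x, k, iters=20, seed=0):
    """Plain k-means (assumed clustering step); returns one cluster index per row."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every sample to every center, shape (n, k).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def train_surrogate_net(x, labels, k, hidden=16, epochs=200, lr=0.5, seed=0):
    """Train a small MLP (stand-in for the residual network) on surrogate classes."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(0, 0.1, (hidden, k)); b2 = np.zeros(k)
    onehot = np.eye(k)[labels]
    for _ in range(epochs):
        h = np.maximum(0, x @ w1 + b1)               # penultimate activations
        logits = h @ w2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)            # softmax probabilities
        g_logits = (p - onehot) / n                  # cross-entropy gradient
        g_h = g_logits @ w2.T
        g_h[h <= 0] = 0                              # ReLU gradient
        w2 -= lr * h.T @ g_logits; b2 -= lr * g_logits.sum(axis=0)
        w1 -= lr * x.T @ g_h;      b1 -= lr * g_h.sum(axis=0)
    return w1, b1

def extract_features(x, w1, b1):
    """Discard the classifier head; use penultimate activations as features."""
    return np.maximum(0, x @ w1 + b1)

# Random data stands in for unlabeled training samples (an assumption).
rng = np.random.default_rng(0)
samples = rng.normal(size=(300, 8))
k = 5
surrogate = kmeans_labels(samples, k)            # cluster index = surrogate class
w1, b1 = train_surrogate_net(samples, surrogate, k)
features = extract_features(samples, w1, b1)     # input to a later classifier
```

No writer labels appear anywhere in the sketch: the only training targets are the cluster indices, and the classification layer is dropped once training ends.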

